Amandeep, I don't think that is true. See the explanation in the docs:
"Splits are created in number equal to the smallest between numSplits and the number of HRegion<http://hadoop.apache.org/hbase/docs/r0.20.3/api/org/apache/hadoop/hbase/regionserver/HRegion.html>s in the table. If the number of splits is smaller than the number of HRegion<http://hadoop.apache.org/hbase/docs/r0.20.3/api/org/apache/hadoop/hbase/regionserver/HRegion.html>s then splits are spanned across multiple HRegion<http://hadoop.apache.org/hbase/docs/r0.20.3/api/org/apache/hadoop/hbase/regionserver/HRegion.html>s and are grouped the most evenly possible. In the case splits are uneven the bigger splits are placed first in the InputSplit array. " depending on whether numSplits < (or >) num of regions .. it choses real number of splits and the same is done in the code // Code int realNumSplits = numSplits > startKeys.length? startKeys.length: numSplits; Here startKeys.length is the number of regions... Am I true? Thanks j.S On Sun, Apr 11, 2010 at 1:33 PM, Amandeep Khurana <ama...@gmail.com> wrote: > The number of splits is equal to the number of regions... > > > > On Sun, Apr 11, 2010 at 12:54 AM, john smith <js1987.sm...@gmail.com> > wrote: > > > Hi , > > > > In the method "public org.apache.hadoop.mapred.InputSplit[] *getSplits* > > (org.apache.hadoop.mapred.JobConf job, > > > > int numSplits) " > > > > how is the "numSplits" decided ? I've seen differnt values of > > numSplits for different MR jobs . Any reason for this ? > > > > Also what if I ignore numsplits and always split at region > > boundaries.I guess that , splitting at region boundaries makes more > > sense and improves some what data locality. > > > > Any comments on the above statement? > > > > Thanks > > > > j.S > > >