The number of splits is equal to the number of regions...
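A minimal sketch of the one-split-per-region idea, assuming region boundaries are available as a sorted array of start keys (Strings here for readability). The class, method and field names are hypothetical and are not HBase or Hadoop APIs; numSplits is accepted only to mirror the getSplits(JobConf, int) signature from the question and is deliberately ignored.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: one input split per region, ignoring the numSplits hint.
 * The first region starts at "" and an empty end key means "end of table".
 * None of these names are HBase or Hadoop APIs.
 */
public class RegionSplits {

    /** A split covering the key range [startKey, endKey); "" as endKey means unbounded. */
    public static final class Split {
        public final String startKey;
        public final String endKey;

        Split(String startKey, String endKey) {
            this.startKey = startKey;
            this.endKey = endKey;
        }

        @Override
        public String toString() {
            return "[" + startKey + ", " + (endKey.isEmpty() ? "<end>" : endKey) + ")";
        }
    }

    /** Returns exactly one split per region; numSplits is intentionally unused. */
    public static List<Split> getSplits(String[] regionStartKeys, int numSplits) {
        List<Split> splits = new ArrayList<>();
        for (int i = 0; i < regionStartKeys.length; i++) {
            // Each region ends where the next one starts; the last region is open-ended.
            String end = (i + 1 < regionStartKeys.length) ? regionStartKeys[i + 1] : "";
            splits.add(new Split(regionStartKeys[i], end));
        }
        return splits;
    }

    public static void main(String[] args) {
        // Three hypothetical regions; the hint of 1 split is ignored.
        String[] startKeys = {"", "g", "p"};
        for (Split s : getSplits(startKeys, 1)) {
            System.out.println(s);
        }
    }
}
```

In a real table input format each split would also record the region server hosting that region, which is what lets the framework try to schedule the map task on or near that host.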
On Sun, Apr 11, 2010 at 12:54 AM, john smith <js1987.sm...@gmail.com> wrote:
> Hi,
>
> In the method "public org.apache.hadoop.mapred.InputSplit[] *getSplits*
> (org.apache.hadoop.mapred.JobConf job, int numSplits)",
> how is "numSplits" decided? I've seen different values of numSplits
> for different MR jobs. Any reason for this?
>
> Also, what if I ignore numSplits and always split at region
> boundaries? I guess that splitting at region boundaries makes more
> sense and somewhat improves data locality.
>
> Any comments on the above statement?
>
> Thanks
>
> j.S