Hi,

In the method "public org.apache.hadoop.mapred.InputSplit[] getSplits(org.apache.hadoop.mapred.JobConf job, int numSplits)",
how is "numSplits" decided? I've seen different values of
numSplits for different MR jobs. Is there a reason for this?
Also, what if I ignore numSplits and always split at region
boundaries? My guess is that splitting at region boundaries makes more
sense and somewhat improves data locality.
Any comments on the above statement?
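
To make the idea concrete, here is a rough sketch of what I mean by splitting at region boundaries. It is plain Java with no Hadoop or HBase classes: Region and Split are hypothetical stand-ins I made up for illustration (not the real InputSplit or region classes), and the host names are invented. It only shows the shape of the logic, one split per region, with numSplits ignored.

```java
import java.util.ArrayList;
import java.util.List;

public class RegionBoundarySplits {

    // Hypothetical stand-in for a table region: its key range and the host serving it.
    static final class Region {
        final String startKey;
        final String endKey;
        final String host;

        Region(String startKey, String endKey, String host) {
            this.startKey = startKey;
            this.endKey = endKey;
            this.host = host;
        }
    }

    // Hypothetical stand-in for an input split covering exactly one region.
    static final class Split {
        final String startKey;
        final String endKey;
        final String preferredHost;

        Split(String startKey, String endKey, String preferredHost) {
            this.startKey = startKey;
            this.endKey = endKey;
            this.preferredHost = preferredHost;
        }
    }

    // Returns one split per region. numSplits is deliberately ignored,
    // so the number of splits always equals the number of regions.
    static List<Split> getSplits(List<Region> regions, int numSplits) {
        List<Split> splits = new ArrayList<>();
        for (Region r : regions) {
            // Each split carries the region's host so a scheduler could
            // try to place the map task close to the data.
            splits.add(new Split(r.startKey, r.endKey, r.host));
        }
        return splits;
    }

    public static void main(String[] args) {
        List<Region> regions = new ArrayList<>();
        regions.add(new Region("", "g", "host-a"));
        regions.add(new Region("g", "p", "host-b"));
        regions.add(new Region("p", "", "host-c"));

        // The numSplits value passed in (10 here) has no effect on the result.
        for (Split s : getSplits(regions, 10)) {
            System.out.println("[" + s.startKey + ", " + s.endKey + ") on " + s.preferredHost);
        }
    }
}
```

With this approach each map task would read exactly one region, which is why I expect locality to be better than when a split spans several regions.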
Thanks
j.S
