Robert,

On Jul 7, 2012, at 6:37 PM, Grandl Robert wrote:
> Hi,
>
> I have some questions related to basic functionality in Hadoop.
>
> 1. When a Mapper process the intermediate output data, how it knows how many
> partitions to do(how many reducers will be) and how much data to go in each
> partition for each reducer ?
>
> 2. A JobTracker when assigns a task to a reducer, it will also specify the
> locations of intermediate output data where it should retrieve it right ? But
> how a reducer will know from each remote location with intermediate output
> what portion it has to retrieve only ?

To add to Harsh's comment: essentially, the TT (TaskTracker) *knows* where the output of a given map-id/reduce-id pair is present via an output-file/index-file combination.

Arun

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/
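
For illustration only, here is a rough Python sketch of the output-file/index-file idea described above. It is not Hadoop code: the function names, the in-memory "file", and the (offset, length) index layout are all assumptions made for the example, and the real on-disk format differs. The one detail taken from Hadoop is the shape of the default HashPartitioner, which in Java computes (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks; here zlib.crc32 stands in for hashCode(). The number of partitions is simply the number of reducers configured for the job, and the size of each partition is not chosen in advance but falls out of where the keys hash.

```python
import zlib


def partition_for(key: str, num_reducers: int) -> int:
    # Stand-in for Hadoop's default HashPartitioner. crc32 replaces Java's
    # hashCode() so the result is deterministic across Python runs.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def write_map_output(records, num_reducers):
    """Hypothetical map-side writer.

    Returns (output_bytes, index), where index[r] is the (start_offset,
    length) of the segment destined for reducer r inside output_bytes.
    """
    buckets = [[] for _ in range(num_reducers)]
    for key, value in records:
        buckets[partition_for(key, num_reducers)].append(f"{key}\t{value}\n")

    output = bytearray()
    index = []
    for bucket in buckets:
        # Each partition is written contiguously, sorted by key.
        segment = "".join(sorted(bucket)).encode("utf-8")
        index.append((len(output), len(segment)))
        output += segment
    return bytes(output), index


def serve_segment(output: bytes, index, reduce_id: int) -> bytes:
    # What the serving side does conceptually: look up the reducer's entry
    # in the index and return only that slice of the map output.
    start, length = index[reduce_id]
    return output[start:start + length]
```

A reducer therefore only needs to present its own reduce id (plus the map id) when fetching; the side holding the map output uses the index to return just that reducer's portion.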