Hi,

It would be helpful if you could give more details on the questions below.
1. How does the partitioner know which reducer needs to be called?
2. When we use more than one reducer, the output gets split into separate files. In what scenario should we go for multiple reducers?

Cheers!
Manoj.

On Mon, Jul 9, 2012 at 6:54 PM, Arun C Murthy <a...@hortonworks.com> wrote:
> Robert,
>
> On Jul 7, 2012, at 6:37 PM, Grandl Robert wrote:
>
> Hi,
>
> I have some questions related to basic functionality in Hadoop.
>
> 1. When a Mapper processes the intermediate output data, how does it know
> how many partitions to create (how many reducers there will be) and how
> much data goes into each partition for each reducer?
>
> 2. When the JobTracker assigns a task to a reducer, it will also specify the
> locations of the intermediate output data that it should retrieve, right?
> But how will a reducer know which portion it has to retrieve from each
> remote location holding intermediate output?
>
>
> To add to Harsh's comment. Essentially the TT *knows* where the output of
> a given map-id/reduce-id pair is present via an output-file/index-file
> combination.
>
> Arun
>
> --
> Arun C. Murthy
> Hortonworks Inc.
> http://hortonworks.com/
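
On question 1 above (how a key ends up at a particular reducer), here is a minimal sketch, assuming the job uses Hadoop's default hash partitioning and has no custom Partitioner configured. The class name HashPartitionSketch, the method name partitionFor, the sample keys, and the reducer count of 4 are all made up for illustration. The code is plain Java with no Hadoop dependency and only mirrors the formula that the default HashPartitioner applies to each map output key.

```java
// Sketch only: mirrors the formula used by Hadoop's default HashPartitioner,
// written without Hadoop dependencies so it can run standalone.
final class HashPartitionSketch {

    // Returns the partition (reducer) index in [0, numReduceTasks) for a key.
    // Masking with Integer.MAX_VALUE clears the sign bit, so a negative
    // hashCode can never produce a negative index.
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        // Hypothetical reducer count; in a real job it comes from the job configuration.
        int numReduceTasks = 4;
        String[] keys = {"apple", "banana", "cherry", "apple"};
        for (String key : keys) {
            System.out.println(key + " -> reducer " + partitionFor(key, numReduceTasks));
        }
    }
}
```

Every map task applies the same function with the same reducer count, so equal keys always land in the same partition no matter which mapper emitted them. In a real job the reducer count is set on the job itself (for example with Job.setNumReduceTasks), and each reducer writes its own output file, which is why the output appears separated.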