Hi Manoj,

As Harsh said, we almost always need multiple reducers. Each reduce task can potentially run on a different core (on the same machine or a different one), so in most cases we want at least as many reduce tasks as there are cores, for maximum parallelism and performance.
Karthik

On Mon, Jul 9, 2012 at 11:07 AM, Manoj Babu <manoj...@gmail.com> wrote:

> Hi Harsh,
>
> Thanks for clarifying. I had thought earlier that the Partitioner
> picks the reducer.
>
> My cluster setup provides options for multiple reducers, so I want to know
> when and in which scenarios we have to go for multiple reducers?
>
> Cheers!
> Manoj.
>
> On Mon, Jul 9, 2012 at 11:27 PM, Harsh J <ha...@cloudera.com> wrote:
>
>> Manoj,
>>
>> Think of it this way, and you shouldn't be confused: A reducer == a
>> partition.
>>
>> For (1) - Partitioners do not 'call' a reduce, they just write the data
>> with a proper partition ID. The reducer whose ID matches the partition
>> ID picks it up for itself later. We have already explained this
>> earlier.
>>
>> For (2) - For what scenario do you _not_ want multiple reducers
>> handling each partition uniquely, when it is possible to scale that
>> way?
>>
>> On Mon, Jul 9, 2012 at 11:22 PM, Manoj Babu <manoj...@gmail.com> wrote:
>> > Hi,
>> >
>> > It would be more helpful if you could give more details for the
>> > doubts below.
>> >
>> > 1. How does the partitioner know which reducer needs to be called?
>> > 2. When we use more than one reducer, the output gets separated.
>> > Actually, for what scenario do we have to go for multiple reducers?
>> >
>> > Cheers!
>> > Manoj.
>> >
>> > On Mon, Jul 9, 2012 at 6:54 PM, Arun C Murthy <a...@hortonworks.com> wrote:
>> >>
>> >> Robert,
>> >>
>> >> On Jul 7, 2012, at 6:37 PM, Grandl Robert wrote:
>> >>
>> >> Hi,
>> >>
>> >> I have some questions related to basic functionality in Hadoop.
>> >>
>> >> 1. When a Mapper processes the intermediate output data, how does it know
>> >> how many partitions to make (how many reducers there will be) and how much
>> >> data goes into each partition for each reducer?
>> >>
>> >> 2. When the JobTracker assigns a task to a reducer, it will also specify the
>> >> locations of the intermediate output data it should retrieve, right?
>> >> But how will a reducer know which portion to retrieve from each remote
>> >> location holding intermediate output?
>> >>
>> >>
>> >> To add to Harsh's comment: essentially the TT (TaskTracker) *knows* where
>> >> the output of a given map-id/reduce-id pair is present via an
>> >> output-file/index-file combination.
>> >>
>> >> Arun
>> >>
>> >> --
>> >> Arun C. Murthy
>> >> Hortonworks Inc.
>> >> http://hortonworks.com/
>> >>
>>
>> --
>> Harsh J
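
The "a reducer == a partition" point from the thread can be illustrated with a small toy model. This is only a sketch in plain Python, not Hadoop code: the function names (`partition_for`, `run_map_task`, `fetch_for_reducer`) are hypothetical, the CRC32 hash is a stand-in assumption for whatever hash a real partitioner uses, and the in-memory dictionaries stand in for the map output and index files that Arun mentions.

```python
import zlib
from collections import defaultdict


def partition_for(key: str, num_reducers: int) -> int:
    """Hypothetical stand-in for a hash partitioner.

    Assumption: a stable hash of the key, modulo the reducer count,
    gives the partition ID. CRC32 is used only because it is
    deterministic across runs.
    """
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def run_map_task(records, num_reducers):
    """Tag each (key, value) with a partition ID and bucket it.

    The map side never 'calls' a reducer; it only writes data
    grouped by partition ID.
    """
    buckets = defaultdict(list)
    for key, value in records:
        buckets[partition_for(key, num_reducers)].append((key, value))
    return dict(buckets)


def fetch_for_reducer(reducer_id, map_outputs):
    """A reducer pulls only the partition matching its own ID
    from every map task's output."""
    fetched = []
    for output in map_outputs:
        fetched.extend(output.get(reducer_id, []))
    return fetched


num_reducers = 3
map_outputs = [
    run_map_task([("apple", 1), ("banana", 1), ("cherry", 1)], num_reducers),
    run_map_task([("apple", 1), ("cherry", 1), ("date", 1)], num_reducers),
]

# Each reducer sees every record for the keys in its partition, and no others.
for reducer_id in range(num_reducers):
    print(reducer_id, sorted(fetch_for_reducer(reducer_id, map_outputs)))
```

Because the partition ID depends only on the key and the reducer count, all values for a given key land at the same reducer no matter which map task produced them. That is why adding reducers spreads the keys across more tasks without any reducer having to coordinate with another.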