A reducer gets all the values for a particular key, so I think the extra reducers will just be wasted. They won't get any input records.
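A rough sketch of why the extra reducers sit idle, assuming the job uses Hadoop's default HashPartitioner, which picks a reducer as (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks. The class name PartitionSketch, the helper names, and the sample keys are made up for illustration. Plain String keys stand in for whatever key type the job really emits (Text or ImmutableBytesWritable compute their hash codes differently), so treat the exact assignment as illustrative only.

```java
import java.util.Arrays;
import java.util.List;

public class PartitionSketch {
    // Same arithmetic as Hadoop's default HashPartitioner: mask off the sign
    // bit, then take the remainder by the number of reduce tasks.
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Counts how many distinct keys are routed to each reducer.
    static int[] keysPerReducer(List<String> keys, int numReduceTasks) {
        int[] counts = new int[numReduceTasks];
        for (String key : keys) {
            counts[partitionFor(key, numReduceTasks)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical job: 3 distinct map output keys, 8 reduce tasks.
        List<String> keys = Arrays.asList("row-a", "row-b", "row-c");
        int[] counts = keysPerReducer(keys, 8);
        // At most 3 slots can be non-zero, so at least 5 reducers get no input.
        System.out.println(Arrays.toString(counts));
    }
}
```

Since a key always hashes to the same partition, all of its values end up at one reducer. Note that two keys can also collide on the same reducer, so some reducers may stay empty even when there are fewer reducers than distinct keys.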
On 8/20/09, john smith <[email protected]> wrote:
> Thanks for all your replies, guys. As Bharath said, what happens when the
> number of reducers is greater than the number of distinct map output keys?
>
> On Fri, Aug 21, 2009 at 9:39 AM, bharath vissapragada
> <[email protected]> wrote:
>
>> Amandeep, Gray and Purtell, thanks for your replies. I have found them
>> very useful.
>>
>> You said to increase the number of reduce tasks. Suppose the number of
>> reduce tasks is more than the number of distinct map output keys: would
>> some of the reduce processes go to waste? Is that the case?
>>
>> Also, I have one more doubt. I have 5 values for a given key on one
>> region and 2 other values on 2 different region servers. Does Hadoop
>> MapReduce take care of moving these 2 values to the region with the 5
>> values, instead of moving those 5 values to another system, to minimize
>> the data flow? Is this what is happening inside?
>>
>> On Fri, Aug 21, 2009 at 9:03 AM, Andrew Purtell <[email protected]>
>> wrote:
>>
>> > The behavior of TableInputFormat is to schedule one mapper for every
>> > table region.
>> >
>> > In addition to what others have said already, if your reducer is doing
>> > little more than storing data back into HBase (via TableOutputFormat),
>> > then you can consider writing results back to HBase directly from the
>> > mapper to avoid incurring the overhead of sort/shuffle/merge which
>> > happens within the Hadoop job framework as map outputs are input into
>> > reducers. For that type of use case -- using the Hadoop mapreduce
>> > subsystem as essentially a grid scheduler -- something like
>> > job.setNumReducers(0) will do the trick.
>> >
>> > Best regards,
>> >
>> > - Andy
>> >
>> > ________________________________
>> > From: john smith <[email protected]>
>> > To: [email protected]
>> > Sent: Friday, August 21, 2009 12:42:36 AM
>> > Subject: Doubt in HBase
>> >
>> > Hi all,
>> >
>> > I have one small doubt. Kindly answer it even if it sounds silly.
>> >
>> > I am using MapReduce in HBase in distributed mode. I have a table
>> > which spans 5 region servers. I am using TableInputFormat to read
>> > the data from the tables in the map. When I run the program, by
>> > default how many map regions are created? Is it one per region server
>> > or more?
>> >
>> > Also, after the map task is over, the reduce task takes a bit more
>> > time. Is it due to moving the map output across the region servers,
>> > i.e., moving the values of the same key to a particular reduce phase
>> > to start the reducer? Is there any way I can optimize the code (e.g.
>> > by storing data for the same reducer nearby)?
>> >
>> > Thanks :)

--
Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz
