Ah, well, my bad. See instead the description for mapred.reduce.tasks in mapred-default.xml, which states this: "Typically set to 99% of the cluster's reduce capacity, so that if a node fails the reduces can still be executed in a single wave."
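If you want to set it from a driver class, a minimal sketch (using the
Hadoop 1.x mapreduce API; the job name and the count of 8 below are just
placeholders) would look like this -- equivalently, you can pass
-D mapred.reduce.tasks=N on the command line if your driver goes through
ToolRunner/GenericOptionsParser:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class ReducerCountSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // The framework never auto-picks the reduce count; left unset, it is 1.
        Job job = new Job(conf, "reducer-count-sketch"); // placeholder name
        job.setNumReduceTasks(8); // arbitrary; match your required parallelism
        // ... set mapper/reducer classes and input/output paths here,
        // then submit with job.waitForCompletion(true).
      }
    }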
FWIW, I set it manually to the level of parallelism I require (given
my partitioned data, etc.).

On Tue, Aug 28, 2012 at 8:43 PM, abhiTowson cal
<abhishek.dod...@gmail.com> wrote:
> Hi Harsh,
>
> Thanks for the reply. I get your first and second points; coming to
> the third point, how is it specific to a job?
> My question was specific to the job.
>
> Regards
> Abhishek
>
>
> On Mon, Aug 27, 2012 at 11:29 PM, Harsh J <ha...@cloudera.com> wrote:
>> Hi,
>>
>> On Tue, Aug 28, 2012 at 8:32 AM, Abhishek <abhishek.dod...@gmail.com> wrote:
>>> Hi all,
>>>
>>> I just want to know, based on what factor the map reduce framework
>>> decides the number of reducers to launch for a job.
>>
>> The framework does not auto-determine the number of reducers for a
>> job. That is purely user- or client-program-supplied at present.
>>
>>> By default only one reducer will be launched for a given job, is this
>>> right? That is, if we do not explicitly mention a number via the
>>> command line or the driver class.
>>
>> Yes, by default the number of reduce tasks is configured to be one.
>>
>>> If I choose to specify the number of reducers explicitly, what should
>>> I consider? Because choosing an inappropriate number of reducers
>>> hampers performance.
>>
>> See http://wiki.apache.org/hadoop/HowManyMapsAndReduces
>>
>> --
>> Harsh J

--
Harsh J