Hi,

*mapred.reduce.tasks* sets the number of reduce tasks your job will have. How many of those can run at once depends on the *mapred.tasktracker.reduce.tasks.maximum* property. It sets how many reduce slots each tasktracker has, and so it determines the total number of reduce slots in the cluster. The recommendation to set slightly fewer reducers than total slots is there so the job can tolerate a few reduce failures without extending its execution time. Suppose you allocate as many reducers as there are available slots (or more) and a reduce task fails. The jobtracker then has to wait before it can resubmit the failed task to some other node, because all reduce slots may be in use at that moment. In that case the job's execution time is extended until the late-resubmitted task completes.
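To make the failure argument concrete, here is a toy Python model. The function `reduce_phase_finish` is hypothetical and not part of Hadoop. It makes several simplifying assumptions: everything happens in a single wave where every reducer starts at t=0, reducers are spread round-robin across nodes, every attempt takes the same time, and exactly one reducer fails once. It also assumes the retry must go to a different node and ignores heartbeat and scheduling delays. It only illustrates how one idle slot elsewhere changes when the reduce phase ends.

```python
def reduce_phase_finish(num_reducers, num_nodes, slots_per_node, duration, fail_at):
    """Toy model: finish time of a single-wave reduce phase where one reducer fails once.

    Hypothetical assumptions, for illustration only:
    - num_reducers <= num_nodes * slots_per_node, so all reducers start at t=0;
    - reducers are spread round-robin across nodes (reducer 0 lands on node 0);
    - every attempt takes exactly `duration`; reducer 0 fails at `fail_at`;
    - the retry must run on a different node than the one it failed on;
    - heartbeat and scheduling delays are ignored.
    """
    if num_nodes < 2:
        raise ValueError("need at least two nodes to retry elsewhere")
    if not 0 < num_reducers <= num_nodes * slots_per_node:
        raise ValueError("model only covers a single wave")
    if not 0 <= fail_at < duration:
        raise ValueError("failure must happen while the attempt is running")

    # busy[n] = reduce slots occupied on node n at t=0
    busy = [0] * num_nodes
    for task in range(num_reducers):
        busy[task % num_nodes] += 1

    failed_node = 0
    # Is there an idle reduce slot on any other node when the failure happens?
    idle_elsewhere = any(
        busy[n] < slots_per_node for n in range(num_nodes) if n != failed_node
    )

    # With headroom the retry starts right away; otherwise it waits until the
    # first reducer on another node finishes and frees its slot (at `duration`).
    retry_start = fail_at if idle_elsewhere else duration
    return max(duration, retry_start + duration)


if __name__ == "__main__":
    # 10 nodes x 2 reduce slots = 20 slots; each attempt takes 10 time units
    print(reduce_phase_finish(20, 10, 2, duration=10, fail_at=6))  # 20: all slots used
    print(reduce_phase_finish(19, 10, 2, duration=10, fail_at=6))  # 16: one spare slot
```

With 20 reducers on 20 slots, the retry has to wait for a slot on another node, so the phase ends at t=20. With 19 reducers, the spare slot takes the retry immediately and the phase ends at t=16.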
On Mon, Apr 22, 2013 at 11:33 PM, Karthik Kambatla <[email protected]> wrote:

> I wonder how accurate that is.
>
> However, by setting the number of reducers slightly lesser than the reduce
> slots, the difference acts as headroom for speculative reduce tasks. And,
> the goal of a single wave is also preserved.
>
>
> On Mon, Apr 22, 2013 at 11:10 PM, Darpan R <[email protected]> wrote:
>
> > Hi guys,
> > I read somewhere that for better performance
> >
> > For maximum performance, the number of reducers should be slightly less than
> > the number of reduce slots in the cluster. This allows the reducers to finish in
> > one wave and fully utilizes the cluster during the reduce phase.
> >
> > I don't quite understand this, Can you please help me understand?
> >
> > Thank you.
> >

--
Regards,
.....
Sudhakara.st
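The slot and wave arithmetic behind "finish in one wave" with some headroom can be sketched in Python. The helper names are hypothetical. The sketch assumes each reduce slot runs one reducer at a time and treats "headroom" as a fixed percentage of slots left unused. The default of 5% loosely mirrors the 0.95 factor in the Hadoop MapReduce tutorial's rule of thumb for the number of reduces, but the right value depends on the cluster.

```python
def total_reduce_slots(num_tasktrackers, reduce_slots_per_tracker):
    """Cluster-wide reduce capacity: tasktrackers x reduce slots per tracker."""
    return num_tasktrackers * reduce_slots_per_tracker


def reduce_waves(num_reducers, total_slots):
    """How many rounds the reducers need if each slot runs one reducer at a time."""
    if total_slots <= 0:
        raise ValueError("total_slots must be positive")
    # integer ceiling division: ceil(num_reducers / total_slots)
    return (num_reducers + total_slots - 1) // total_slots


def suggested_reducers(total_slots, headroom_percent=5):
    """Reducer count that leaves roughly headroom_percent of the slots free.

    The spare slots can absorb a retried or speculative reduce attempt
    without pushing the job into a second wave.
    """
    # integer arithmetic avoids float rounding surprises; always at least 1
    return max(1, total_slots * (100 - headroom_percent) // 100)


if __name__ == "__main__":
    slots = total_reduce_slots(10, 2)  # 20 slots
    n = suggested_reducers(slots)      # 19 reducers
    print(slots, n, reduce_waves(n, slots), reduce_waves(21, slots))
```

Take 10 tasktrackers with *mapred.tasktracker.reduce.tasks.maximum* = 2, which gives 20 slots. Then 19 reducers fit in one wave with one slot spare, while 21 reducers need two waves.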
