By the way, I created https://issues.apache.org/jira/browse/HADOOP-2568 some time back. The proposal is basically to have one shuffle task per job per node and to assign reduces with consecutive task IDs to a particular node. The shuffle task would fetch multiple consecutive map outputs in one go from any map task node. This reduces the number of seeks into the map output files by a factor of #maps * #consecutive-reduces for any mapnode-reducenode pair, and should generally improve the use of system resources (e.g., fewer socket connections for transferring files and better disk usage).
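To make the fetch-consolidation idea concrete, here is a minimal standalone sketch of the access pattern. It is not Hadoop code and all names in it are hypothetical; it only assumes what the proposal describes: partitions for consecutive reduces are laid out contiguously in a map output file, with an offset index, so a per-node shuffle task can read the whole run of partitions it needs with a single seek instead of one seek per reduce.

    import java.io.IOException;
    import java.io.RandomAccessFile;

    /**
     * Illustrative sketch only (hypothetical names, not Hadoop APIs):
     * a map output file stores partitions contiguously in partition
     * order, with offsets[r] giving the start of partition r and
     * offsets[numPartitions] the file length.
     */
    public class ConsecutivePartitionFetch {

        private final long[] offsets;
        private final RandomAccessFile mapOutput;

        public ConsecutivePartitionFetch(RandomAccessFile mapOutput, long[] offsets) {
            this.mapOutput = mapOutput;
            this.offsets = offsets;
        }

        /** Current pattern: each reduce fetches only its own partition -> one seek per reduce. */
        public byte[] fetchOne(int reduce) throws IOException {
            return readRange(offsets[reduce], offsets[reduce + 1]);
        }

        /**
         * Proposed pattern: the per-node shuffle task asks for all partitions
         * destined for its node in one call. Because those reduces have
         * consecutive task IDs, their partitions are adjacent on disk and a
         * single contiguous read covers all of them.
         */
        public byte[] fetchRange(int firstReduce, int lastReduceExclusive) throws IOException {
            return readRange(offsets[firstReduce], offsets[lastReduceExclusive]);
        }

        private byte[] readRange(long start, long end) throws IOException {
            byte[] buf = new byte[(int) (end - start)];
            mapOutput.seek(start);      // exactly one seek for the whole range
            mapOutput.readFully(buf);
            return buf;
        }
    }

Under that layout assumption, each map output file is touched once per destination reduce node rather than once per reduce, which is where the seek savings and the smaller number of transfer connections come from.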
> -----Original Message-----
> From: Joydeep Sen Sarma [mailto:[EMAIL PROTECTED]
> Sent: Thursday, January 10, 2008 11:16 PM
> To: hadoop-user@lucene.apache.org
> Subject: is a monolithic reduce task the right model?
>
> In thinking about Aaron's use case and our own problems with fair sharing
> of a hadoop cluster, one thing that was obvious is that reduces are a
> stumbling block for fair sharing. It's easy to imagine a fair scheduling
> algorithm doing a good job of scheduling small map tasks, but the reduces
> are a problem: they are too big, and once scheduled they last forever.
>
> Another obvious thing is that reduce failures are expensive. All the map
> outputs need to be refetched and merged again, whereas, in many cases,
> the failure is in the reduction logic. Putting two and two together:
>
> - what if current reduce tasks were broken into separate copy, sort and
>   reduce tasks?
>
> We would get much smaller units of recovery and scheduling.
>
> Thoughts?
>
> Joydeep