Joydeep Sen Sarma wrote:
- what if current reduce tasks were broken into separate copy, sort and reduce tasks?

we would get much smaller units of recovery and scheduling.

thoughts?

If copy, sort and reduce are not scheduled together, it would be very hard to ensure that they run on the same node. If they do not all run on the same node, we would have to move their data between nodes, which would substantially reduce throughput, not to mention adding another copy phase.
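
To make the data-movement argument concrete, here is a rough back-of-the-envelope sketch in Python. It is not Hadoop code: the function name network_bytes_moved, the 1 GiB partition size, and the worst-case assumption that each separately scheduled phase lands on a different node are all hypothetical, chosen only to illustrate the point.

```python
def network_bytes_moved(partition_bytes: int, colocated: bool) -> int:
    """Estimate bytes sent over the network for one reduce partition.

    Hypothetical model: the copy phase always fetches the partition's
    map output over the network once. If sort and reduce run on other
    nodes (assumed worst case), the data is shipped again before each
    of those phases.
    """
    shuffle = partition_bytes  # map outputs fetched by the copy phase
    if colocated:
        # Sort and reduce read the copied data from local disk.
        return shuffle
    to_sort = partition_bytes    # copy node -> sort node
    to_reduce = partition_bytes  # sort node -> reduce node
    return shuffle + to_sort + to_reduce


one_gib = 1024 ** 3  # assumed partition size, for illustration only
together = network_bytes_moved(one_gib, colocated=True)
separate = network_bytes_moved(one_gib, colocated=False)
print(f"co-located: {together} bytes, separate nodes: {separate} bytes")
```

Under these assumptions the separately scheduled case moves three times as much data per partition. A scheduler that managed to place some phases on the same node would land somewhere in between.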

Please see https://issues.apache.org/jira/browse/HADOOP-2573 for another proposed solution to this.

Doug
