Joydeep Sen Sarma wrote:
> - what if current reduce tasks were broken into separate copy, sort and reduce
> tasks?
> we would get much smaller units of recovery and scheduling.
> thoughts?

If copy, sort and reduce are not scheduled together, it would be
very hard to ensure that they run on the same node. If they do not all
run on the same node, we'd have to move their data between nodes, which
would substantially reduce throughput, not to mention adding another
copy phase.
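
To make the cost concrete, here is a rough sketch in Python. It is not Hadoop code: the function name, the node names and the 1 GB partition size are made up for illustration, and it assumes that every hand-off between two phases placed on different nodes moves the whole partition across the network once.

```python
def extra_network_bytes(placement, partition_bytes):
    """Bytes moved between phases when consecutive phases sit on different nodes.

    placement maps each phase name to the (hypothetical) node it runs on.
    """
    phases = ["copy", "sort", "reduce"]
    moved = 0
    # Walk the hand-offs copy -> sort and sort -> reduce.
    for prev, nxt in zip(phases, phases[1:]):
        if placement[prev] != placement[nxt]:
            # Assumption: the full partition is shipped to the next node.
            moved += partition_bytes
    return moved


size = 1_000_000_000  # hypothetical 1 GB reduce partition

colocated = {"copy": "node-a", "sort": "node-a", "reduce": "node-a"}
scattered = {"copy": "node-a", "sort": "node-b", "reduce": "node-c"}

print(extra_network_bytes(colocated, size))  # 0
print(extra_network_bytes(scattered, size))  # 2000000000
```

In this toy model the co-located placement adds no traffic, while the scattered one ships the partition twice on top of the original map-output copy.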
Please see https://issues.apache.org/jira/browse/HADOOP-2573 for another
proposed solution to this.
Doug