Actually, in all of my jobs one of these phases tends to dominate the run
time. It isn't always the same phase that dominates, though, so the
consideration isn't simple.

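The fact (if it is a fact) that one phase or another dominates means,
however, that splitting copy, sort and reduce into separate tasks won't
help much: the dominant phase would still be nearly as large a unit of
recovery and scheduling as the whole reduce task is today.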
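
To put rough numbers on that, here is a minimal sketch. It assumes the only
benefit being measured is the size of the largest unit that has to be rerun
after a failure, and the function name and the timings are made up for
illustration, not taken from any real job or from Hadoop itself.

```python
def worst_case_rerun_fraction(copy_s, sort_s, reduce_s):
    """Largest recovery unit after splitting, as a fraction of the unsplit task.

    Assumption: an unsplit reduce task reruns copy + sort + reduce on failure,
    while a split task reruns only the phase that failed.
    """
    total = copy_s + sort_s + reduce_s
    if total <= 0:
        raise ValueError("phase times must sum to a positive value")
    return max(copy_s, sort_s, reduce_s) / total


# Hypothetical timings in seconds, not measurements.
dominated = worst_case_rerun_fraction(copy_s=540, sort_s=30, reduce_s=30)
balanced = worst_case_rerun_fraction(copy_s=200, sort_s=200, reduce_s=200)

print(f"one phase dominates: {dominated:.2f}")
print(f"phases balanced:     {balanced:.2f}")
```

With a dominant phase the worst-case rerun is still about 90% of the
original task, so the split buys little; with balanced phases it drops to
about a third. This ignores the extra data movement Doug describes below,
which would only make the split look worse.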


On 1/10/08 9:55 AM, "Doug Cutting" <[EMAIL PROTECTED]> wrote:

> Joydeep Sen Sarma wrote:
>> - what if current reduce tasks were broken into separate copy, sort and
>> reduce tasks?
>> 
>> we would get much smaller units of recovery and scheduling.
>> 
>> thoughts?
> 
> If copy, sort and reduce are not scheduled together then it would be
> very hard to ensure they run on the same node, and if they do not all
> run on the same node then we'd have to move their data around, which
> would substantially affect throughput, not to mention adding another
> copy phase...
> 
> Please see https://issues.apache.org/jira/browse/HADOOP-2573 for another
> proposed solution to this.
> 
> Doug
