Pedro,

Simply put: Speculative execution.

When a task enters that state, it means that it has completed the M/R 
execution and is waiting for the tracker to let it commit, so that it can run 
the OutputCommitter process and finalize its outputs. Until the commit, the 
outputs lie in temporary directories (see FileOutputCommitter, the default 
OutputCommitter in Hadoop MR).
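
To make the temporary-directory idea concrete, here is a rough sketch in plain 
Java. It is not Hadoop's FileOutputCommitter code: the class, the method names 
and the simplified <output>/_temporary/<attempt> layout are assumptions made 
for illustration only.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Illustrative only: "write to a per-attempt temporary directory, promote on
// commit". All names here are hypothetical, not Hadoop APIs.
public class TempDirCommitSketch {

    // Assumed layout: <outputDir>/_temporary/<attemptId>/
    public static Path attemptDir(Path outputDir, String attemptId) {
        return outputDir.resolve("_temporary").resolve(attemptId);
    }

    // While the attempt runs, it only ever writes under its own directory.
    public static void writeOutput(Path outputDir, String attemptId,
                                   String fileName, String data) throws IOException {
        Path dir = attemptDir(outputDir, attemptId);
        Files.createDirectories(dir);
        Files.write(dir.resolve(fileName), data.getBytes(StandardCharsets.UTF_8));
    }

    // Snapshot the directory listing so we can safely modify it afterwards.
    private static List<Path> listFiles(Path dir) throws IOException {
        try (Stream<Path> s = Files.list(dir)) {
            return s.collect(Collectors.toList());
        }
    }

    // Run only after the attempt has been granted the commit:
    // move its files into the final output directory.
    public static void commit(Path outputDir, String attemptId) throws IOException {
        Path dir = attemptDir(outputDir, attemptId);
        for (Path f : listFiles(dir)) {
            Files.move(f, outputDir.resolve(f.getFileName()));
        }
        Files.delete(dir);
    }

    // Run for an attempt that lost: throw its outputs away.
    public static void abort(Path outputDir, String attemptId) throws IOException {
        Path dir = attemptDir(outputDir, attemptId);
        for (Path f : listFiles(dir)) {
            Files.delete(f);
        }
        Files.delete(dir);
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempDirectory("job-output");
        writeOutput(out, "attempt_0", "part-00000", "key\tvalue\n");
        System.out.println(Files.exists(out.resolve("part-00000"))); // false
        commit(out, "attempt_0");
        System.out.println(Files.exists(out.resolve("part-00000"))); // true
    }
}
```

The point is only that nothing shows up in the final output directory until 
commit() runs, which is why a finished task has to wait for the go-ahead first.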

This is to avoid conflicting outputs when you have speculative execution 
turned on. Two attempts of the same task can complete at about the same time, 
and you do not want both to be committed. So the TT will commit the first one 
that reports back, and kill the other one that is waiting in COMMIT_PENDING.
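
The "first one to report back wins" rule can be sketched as below. Again, this 
is a toy model under my own assumptions, not the tracker's actual code: the 
class name, the method and the string ids are all made up for the example.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Illustrative only: grants the commit to the first attempt of a task that
// asks for it, and refuses every other attempt of the same task.
public class CommitArbiterSketch {

    // task id -> id of the attempt that was granted the commit
    private final ConcurrentMap<String, String> granted = new ConcurrentHashMap<>();

    // Hypothetical check an attempt would make while in COMMIT_PENDING.
    // putIfAbsent is atomic, so two attempts asking at once cannot both win.
    public boolean mayCommit(String taskId, String attemptId) {
        String winner = granted.putIfAbsent(taskId, attemptId);
        return winner == null || winner.equals(attemptId);
    }

    public static void main(String[] args) {
        CommitArbiterSketch arbiter = new CommitArbiterSketch();
        // Two speculative attempts of the same map task finish together.
        System.out.println(arbiter.mayCommit("task_m_000003", "attempt_0")); // true
        System.out.println(arbiter.mayCommit("task_m_000003", "attempt_1")); // false
        // The winner can ask again and still gets the same answer.
        System.out.println(arbiter.mayCommit("task_m_000003", "attempt_0")); // true
    }
}
```

The attempt that gets "false" is the one that would be killed and have its 
temporary outputs discarded.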

You might be noticing (3) because speculative execution mainly affects the 
tail of a job run.

On 15-Nov-2011, at 10:35 PM, Pedro Costa wrote:

> Hi,
> 
> Hadoop MR tasks can have the state COMMIT_PENDING.
> 
> 1- What's the purpose of that state?
> 2- What's the reason for a task being in this state?
> 3- It's only the last task before finishing a job that enters this state?
> 
> -- 
> Thanks,
