I found the following section at the end of chapter 6 of the book <Hadoop: The Definitive Guide>:
--------------------
'Task side-effect files':
"Care needs to be taken to ensure that multiple instances of the same task don't try to write to the same file. There are two problems to avoid: if a task failed and was retried, then the old partial output would still be present when the second task ran, and it would have to delete the old file first. Second, with speculative execution enabled, two instances of the same task could try to write to the same file simultaneously."
-----------------------
In this description, "two instances of the same task could try to write to the same file simultaneously" is a case that should be avoided.
Can anyone confirm this for me and, if possible, explain the reason behind it?
Thanks.

Steven Wu



