[ https://issues.apache.org/jira/browse/HADOOP-15782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16630524#comment-16630524 ]

Gera Shegalov commented on HADOOP-15782:
----------------------------------------

Thanks for the comments [[email protected]]. I was looking at committers.md in 
trunk. I mostly agree: v2 was designed with atomic rename semantics in mind, 
and any FileSystem implementation that does not provide them will be 
vulnerable to inconsistencies. However, the section on Hadoop committers is 
called "Background", which describes the good old atomic-rename world on HDFS 
as executed by MapReduce.
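
For context, the algorithm is selected with the 
mapreduce.fileoutputcommitter.algorithm.version property. A minimal sketch of 
enabling v2 at job setup time; the class name and output path are placeholders 
of mine, not anything from Hadoop:
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class V2JobSetup {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Select the "v2" FileOutputCommitter algorithm: task commit renames
    // task-attempt output directly into the final job output directory.
    conf.setInt("mapreduce.fileoutputcommitter.algorithm.version", 2);

    Job job = Job.getInstance(conf, "v2-committer-example");
    FileOutputFormat.setOutputPath(job, new Path("/user/gera/out")); // placeholder
  }
}
{code}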
{quote}a rerun task may generate files with different names. If this holds, 
those files from the first attempt which are copied into place will still be 
there. Outcome: output of two attempts may be in the destination.
{quote}
Correct, but it is a user error to have any kind of non-determinism between 
task attempts. I always remind my coworkers to seed Random with a hash of 
something repeatable, such as the task ID excluding the attempt ID, and not to 
use anything like the default currentTimeMillis, let alone generate files with 
different names. Rereading this a few times, I realize that you might also 
mean the case of multi-file outputs, which makes even the task commit 
non-atomic.
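
To illustrate the seeding advice, a minimal sketch; the class and method names 
are mine, but TaskAttemptID.getTaskID() is the real Hadoop call that strips 
the attempt number:
{code:java}
import java.util.Random;
import org.apache.hadoop.mapreduce.TaskAttemptID;
import org.apache.hadoop.mapreduce.TaskID;

public class DeterministicSeed {
  // Seed from the task ID only: getTaskID() drops the attempt number,
  // so attempt 0 and a rerun attempt 1 of the same task get the same
  // seed and therefore produce identical pseudo-random output.
  public static Random forAttempt(TaskAttemptID attemptId) {
    TaskID taskId = attemptId.getTaskID();
    return new Random(taskId.toString().hashCode());
  }
}
{code}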
{quote}if the created filenames are the same, if the first attempt hasn't 
actually failed, but instead paused for some time, but then resumes (GC pauses, 
VM hangs etc), then the first attempt will continue its rename, and potentially 
then overwriting 1+ file of the previous attempts output. Outcome: the data may 
be a mix of two attempts.
{quote}
Only one attempt is allowed to commit, via canCommit, and with deterministic 
output plus atomic rename this works out anyway.
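
To spell out the gate: before committing, a task attempt polls the AM through 
TaskUmbilicalProtocol.canCommit and only renames once permission is granted. 
A simplified sketch of that flow; the class and wiring are illustrative, not 
the actual Task internals:
{code:java}
import java.io.IOException;
import org.apache.hadoop.mapred.TaskAttemptID;
import org.apache.hadoop.mapred.TaskUmbilicalProtocol;
import org.apache.hadoop.mapreduce.OutputCommitter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;

public class CommitGate {
  // Roughly what Task does before committing: poll the AM until this
  // attempt is granted the right to commit. The AM grants at most one
  // attempt per task, so two attempts never rename concurrently.
  static void commitIfAllowed(TaskUmbilicalProtocol umbilical,
                              TaskAttemptID attemptId,
                              OutputCommitter committer,
                              TaskAttemptContext context)
      throws IOException, InterruptedException {
    while (!umbilical.canCommit(attemptId)) {
      Thread.sleep(1000);  // AM has not decided yet; keep polling
    }
    committer.commitTask(context);  // the rename(s) happen here
  }
}
{code}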
{quote}Right now I don't trust v2. it's worse on object stores as 
time-to-rename is potentially much longer, so probability of task failure 
during rename is higher.
{quote}
I understand, but for the previously typical cases (HDFS, a single output file 
per attempt) it is robust. Once you realize that even on HDFS v1 was 
noticeably non-atomic for large jobs, and that you need to check for _SUCCESS 
or have another service record completion, v2 was a big improvement for 
Twitter.
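
For completeness, the check I mean is for the _SUCCESS marker that 
FileOutputCommitter writes on successful job commit. A consumer-side sketch; 
the class name and output path are placeholders:
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SuccessCheck {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path output = new Path("/user/gera/out");  // placeholder output dir
    FileSystem fs = output.getFileSystem(conf);
    // FileOutputCommitter creates an empty _SUCCESS file only after the
    // whole job commits; treat its absence as "output incomplete".
    if (!fs.exists(new Path(output, "_SUCCESS"))) {
      throw new IllegalStateException("Job output not committed: " + output);
    }
  }
}
{code}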

I am just about to familiarize myself with Spark's use of FOC.

> Clarify committers.md around v2 failure handling
> ------------------------------------------------
>
>                 Key: HADOOP-15782
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15782
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: documentation
>    Affects Versions: 3.1.0, 3.1.1
>            Reporter: Gera Shegalov
>            Priority: Major
>
> The doc file 
> {{hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/committers.md}} 
> refers to the default file output committer (v2) as not supporting job and 
> task recovery throughout the doc:
> {quote}or just by rerunning everything (The "v2" algorithm and Spark).
> {quote}
> This is incorrect.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
