[
https://issues.apache.org/jira/browse/HADOOP-15782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16630524#comment-16630524
]
Gera Shegalov commented on HADOOP-15782:
----------------------------------------
Thanks for the comments [[email protected]]. I was looking at committers.md in
trunk. I mostly agree. v2 was designed with atomic rename semantics in mind,
and any FileSystem implementation that does not provide it will be vulnerable
to inconsistencies. However, the section on the Hadoop committers is titled
"Background", which describes the good old atomic-rename world: MapReduce
executing on HDFS.
{quote}a rerun task may generate files with different names. If this holds,
those files from the first attempt which are copied into place will still be
there. Outcome: output of two attempts may be in the destination.
{quote}
Correct, but any kind of non-determinism between task attempts is a user error.
I always remind my coworkers to seed Random with a hash of something
repeatable, such as the task id excluding the attempt id, rather than a default
like currentTimeMillis, let alone generate files with different names.
Rereading this a few times, I realize you might also be including the case of
multi-file output, which makes even the task commit non-atomic.
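The deterministic-seeding advice above can be sketched as follows. This is a minimal illustration, not Hadoop code: the id format mirrors MapReduce task attempt ids, but the class and method names (`DeterministicSeed`, `seedFor`) are hypothetical.

```java
import java.util.Random;

public class DeterministicSeed {
    // Derive a stable seed from an attempt id such as
    // "attempt_1538000000000_0001_m_000042_3" by stripping the trailing
    // attempt counter, so every re-run of the same task gets the same seed.
    static long seedFor(String taskAttemptId) {
        String taskId = taskAttemptId.substring(0, taskAttemptId.lastIndexOf('_'));
        return taskId.hashCode();
    }

    public static void main(String[] args) {
        long first = seedFor("attempt_1538000000000_0001_m_000042_0");
        long retry = seedFor("attempt_1538000000000_0001_m_000042_3");
        System.out.println(first == retry); // same task, different attempts
        Random rng = new Random(first);     // instead of new Random() / currentTimeMillis
    }
}
```

Seeding this way makes a re-attempt reproduce the first attempt's output byte for byte, which is exactly the determinism the v2 failure analysis depends on.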
{quote}if the created filenames are the same, if the first attempt hasn't
actually failed, but instead paused for some time, but then resumes (GC pauses,
VM hangs etc), then the first attempt will continue its rename, and potentially
then overwriting 1+ file of the previous attempts output. Outcome: the data may
be a mix of two attempts.
{quote}
Only one attempt is allowed to commit (enforced via canCommit), and with
deterministic output plus atomic rename this scenario is handled anyway.
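The canCommit arbitration can be modeled as a first-writer-wins compare-and-set: a simplified sketch of the idea, not Hadoop's actual AM umbilical protocol (the `CommitArbiter` class is hypothetical).

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class CommitArbiter {
    // Simplified model of the AM-side check: the first attempt of a task
    // to request commit permission wins; every other attempt is refused.
    private final ConcurrentMap<String, String> committers = new ConcurrentHashMap<>();

    boolean canCommit(String taskId, String attemptId) {
        String winner = committers.putIfAbsent(taskId, attemptId);
        return winner == null || winner.equals(attemptId);
    }
}
```

Because only the winning attempt ever reaches the rename step, a paused-then-resumed loser cannot overwrite committed output as long as it asks for permission before committing.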
{quote}Right now I don't trust v2. it's worse on object stores as
time-to-rename is potentially much longer, so probability of task failure
during rename is higher.
{quote}
I understand, but for the previously typical case (HDFS, a single output file
per attempt) it is robust. Once you realize that even on HDFS v1 was noticeably
non-atomic for large jobs, and that you need to check for _SUCCESS or have
another service record completion, v2 was a big improvement for Twitter.
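The _SUCCESS check mentioned above can be sketched with plain java.io as a local-filesystem stand-in; a real consumer would use Hadoop's FileSystem API against the job output path instead (the class name `SuccessMarkerCheck` is hypothetical).

```java
import java.io.File;
import java.io.IOException;

public class SuccessMarkerCheck {
    // A job's output directory is only considered complete once job commit
    // has written the zero-length _SUCCESS marker into it.
    static boolean isJobOutputComplete(File outputDir) {
        return new File(outputDir, "_SUCCESS").isFile();
    }

    public static void main(String[] args) throws IOException {
        File out = new File(System.getProperty("java.io.tmpdir"), "job-output");
        out.mkdirs();
        System.out.println(isJobOutputComplete(out)); // false: commit not finished
        new File(out, "_SUCCESS").createNewFile();    // simulate job commit
        System.out.println(isJobOutputComplete(out)); // true: safe to consume
    }
}
```

Gating readers on the marker is what makes v1's slow, non-atomic job commit tolerable, and it is equally the safeguard that makes v2's incremental task commits consumable.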
I am just about to familiarize myself with Spark's use of the
FileOutputCommitter (FOC).
> Clarify committers.md around v2 failure handling
> ------------------------------------------------
>
> Key: HADOOP-15782
> URL: https://issues.apache.org/jira/browse/HADOOP-15782
> Project: Hadoop Common
> Issue Type: Bug
> Components: documentation
> Affects Versions: 3.1.0, 3.1.1
> Reporter: Gera Shegalov
> Priority: Major
>
> The doc file
> {{hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/committers.md}}
> refers to the default file output committer (v2) as not supporting job and
> task recovery throughout the doc:
> {quote}or just by rerunning everything (The "v2" algorithm and Spark).
> {quote}
> This is incorrect.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)