[
https://issues.apache.org/jira/browse/HADOOP-15107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16285401#comment-16285401
]
Steve Loughran commented on HADOOP-15107:
-----------------------------------------
Specifically, the failure mode to worry about is:
# task attempt 1 is instructed to commit its output
# task attempt 1 does so (loads the .pending files, saves a single .pendingset
file). As job commit only loads .pendingset files, it only finds the output
lists of committed tasks.
# task attempt 1 fails before reporting its success to the job manager
# job manager creates task attempt 2, which it also instructs to commit; this
attempt generates its own .pendingset file
# job commit loads all .pendingset files under the task attempts
# therefore it will load those of both task attempts, and commit them.
# as the commits are done in parallel, there's a risk that the final output
contains either the output of both attempts or, if they use the same
filenames, a mix of both.
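The sequence above can be sketched as a toy model. This is only an illustration under stated assumptions: the "filesystem" is a plain dict, and the path layout and the names commit_task and load_job_pendingsets are hypothetical, not the actual committer code.

```python
# Toy model: the "filesystem" is a dict mapping path -> list of pending uploads.
# Paths and function names are hypothetical, not the real committer layout.

def commit_task(fs, job_attempt, task, attempt, pending_files):
    """Task commit: save one .pendingset file under the task attempt's own dir."""
    path = f"{job_attempt}/{task}/{attempt}/{task}.pendingset"
    fs[path] = list(pending_files)
    return path

def load_job_pendingsets(fs, job_attempt):
    """Job commit: load every .pendingset file found under the job attempt."""
    uploads = []
    for path in sorted(fs):
        if path.startswith(job_attempt + "/") and path.endswith(".pendingset"):
            uploads.extend(fs[path])
    return uploads

fs = {}
# Attempt 1 commits, then fails before reporting success to the job manager.
commit_task(fs, "job_attempt_0", "task_0001", "attempt_1", ["part-0001 (attempt 1)"])
# The job manager starts attempt 2, which also commits.
commit_task(fs, "job_attempt_0", "task_0001", "attempt_2", ["part-0001 (attempt 2)"])

# Job commit now sees the pending uploads of both attempts of the same task.
to_commit = load_job_pendingsets(fs, "job_attempt_0")
print(to_commit)
```

In this model job commit has no way to tell which of the two .pendingset files belongs to the attempt the job manager considers successful.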
Proposed solution:
# task commit saves the .pendingset file in a destination dir of the job
attempt, with the filename $task.pendingset.
# if a second task attempt is executed, it will save to the same file, and so
overwrite the list from the first attempt.
# the first attempt's pending uploads will then not be committed (and will
need a list + abort to clean up)
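A minimal sketch of the proposal, again as a toy model. The dict-based store, the names commit_task and commit_job, and the choice to do the list + abort inside job commit are all assumptions made for illustration; the comment above only says that a list + abort will be needed.

```python
# Toy model of the proposed fix; all names and paths are hypothetical.

def commit_task(pendingsets, job_attempt, task, pending_uploads):
    """Save to <job_attempt>/<task>.pendingset; a later attempt overwrites it."""
    pendingsets[f"{job_attempt}/{task}.pendingset"] = list(pending_uploads)

def commit_job(pendingsets, job_attempt, outstanding):
    """Commit the uploads listed in the job attempt's .pendingset files, then
    list whatever is still outstanding and abort it (assumed cleanup step)."""
    committed = []
    for path in sorted(pendingsets):
        if path.startswith(job_attempt + "/") and path.endswith(".pendingset"):
            committed.extend(pendingsets[path])
    aborted = sorted(set(outstanding) - set(committed))
    return committed, aborted

pendingsets = {}
outstanding = []  # every upload started against the store and not yet completed

# Attempt 1 commits its task, then fails before reporting success.
outstanding.append("upload-A1")
commit_task(pendingsets, "job_attempt_0", "task_0001", ["upload-A1"])
# Attempt 2 commits the same task and overwrites task_0001.pendingset.
outstanding.append("upload-A2")
commit_task(pendingsets, "job_attempt_0", "task_0001", ["upload-A2"])

committed, aborted = commit_job(pendingsets, "job_attempt_0", outstanding)
print(committed, aborted)
```

Because the filename depends only on the task, at most one attempt's list survives per task, and the uploads of the overwritten attempt are only discoverable by listing.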
> Prove the correctness of the new committers, or fix where they are not correct
> ------------------------------------------------------------------------------
>
> Key: HADOOP-15107
> URL: https://issues.apache.org/jira/browse/HADOOP-15107
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 3.1.0
> Reporter: Steve Loughran
> Assignee: Steve Loughran
>
> I'm writing a paper on the committers, one which, being a proper
> paper, requires me to show that the committers work.
> # define the requirements of a "Correct" committed job (this applies to the
> FileOutputCommitter too)
> # show that the Staging committer meets these requirements (most of this is
> implicit in that it uses the V1 FileOutputCommitter to marshal .pendingset
> lists from committed tasks to the final destination, where they are read and
> committed).
> # show that the magic committer also works.
> I'm now not sure that the magic committer works.