[ 
https://issues.apache.org/jira/browse/HADOOP-15107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16285401#comment-16285401
 ] 

Steve Loughran commented on HADOOP-15107:
-----------------------------------------

Specifically, the failure mode to worry about is:

# Task attempt 1 is instructed to commit its output.
# Task attempt 1 does so: it loads the .pending files and saves a single 
.pendingset file. As job commit only loads .pendingset files, it only finds 
the lists of output from committed tasks.
# Task attempt 1 fails before reporting its success to the job manager.
# The job manager creates task attempt 2, which it also instructs to commit, 
and which also generates a .pendingset file.
# Job commit loads all .pendingset files under the task attempts.
# It therefore loads those of both attempts, and commits them.
# As things are done in parallel, there's a risk that the final output 
contains either the output of both attempts or, if they have the same 
filenames, a mix of both.
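
To make the sequence above concrete, here is a minimal in-memory sketch of it. 
This is not the committer code: the dict standing in for the filesystem, the 
path layout and the function names are all assumptions made up for 
illustration.

```python
# Hypothetical model: a dict maps .pendingset paths to lists of pending uploads.
# Paths and names are illustrative only, not the committers' real layout.

def task_commit_per_attempt(store, task, attempt, uploads):
    """Save one .pendingset per task attempt (the layout with the problem)."""
    store[f"job/{task}/{attempt}/data.pendingset"] = list(uploads)


def job_commit(store):
    """Load every .pendingset found and commit all uploads listed in them."""
    committed = []
    for path in sorted(store):
        if path.endswith(".pendingset"):
            committed.extend(store[path])
    return committed


store = {}
# Attempt 1 commits, then fails before reporting success to the job manager.
task_commit_per_attempt(store, "task_0001", "attempt_1", ["part-0001 (upload A)"])
# The job manager reruns the task; attempt 2 commits as well.
task_commit_per_attempt(store, "task_0001", "attempt_2", ["part-0001 (upload B)"])

# Job commit sees both files, so both uploads for part-0001 get committed.
committed = job_commit(store)
print(committed)
```

In this model job commit has no way to tell which attempt "won", which is the 
risk described in the last step.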

Proposed solution:
# Task commit saves the .pendingset file in a destination dir of the job 
attempt, with the filename $task.pendingset.
# If a second task attempt is executed, it saves to the same file, so 
overwrites the list from the first attempt.
# The first attempt's uploads will then not be committed (and will need 
list+abort).
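
A rough sketch of the proposal, again as an in-memory model rather than real 
committer code. The names (task_commit, job_commit, the "outstanding" list 
standing in for a listing of pending uploads) are hypothetical.

```python
# Hypothetical model: one .pendingset per task, keyed by task ID only, so a
# second attempt overwrites the first. Names are illustrative assumptions.

def task_commit(pendingsets, job_attempt_dir, task, uploads):
    """Save the task's upload list as <job attempt dir>/<task>.pendingset."""
    pendingsets[f"{job_attempt_dir}/{task}.pendingset"] = list(uploads)


def job_commit(pendingsets, outstanding):
    """Commit what the .pendingset files list; everything else needs aborting."""
    committed = [u for path in sorted(pendingsets) for u in pendingsets[path]]
    # "list+abort": any outstanding upload not referenced by a .pendingset
    to_abort = sorted(set(outstanding) - set(committed))
    return committed, to_abort


pendingsets = {}
outstanding = ["upload-A", "upload-B"]  # uploads started by attempts 1 and 2

task_commit(pendingsets, "job_attempt_1", "task_0001", ["upload-A"])  # attempt 1
task_commit(pendingsets, "job_attempt_1", "task_0001", ["upload-B"])  # attempt 2

committed, to_abort = job_commit(pendingsets, outstanding)
print(committed, to_abort)
```

Only the last attempt's list survives, so only its uploads are committed; the 
first attempt's uploads are left pending and have to be found by listing and 
then aborted.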



> Prove the correctness of the new committers, or fix where they are not correct
> ------------------------------------------------------------------------------
>
>                 Key: HADOOP-15107
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15107
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.1.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>
> I'm writing a paper on the committers, one which, being a proper 
> paper, requires me to show the committers work.
> # define the requirements of a "Correct" committed job (this applies to the 
> FileOutputCommitter too)
> # show that the Staging committer meets these requirements (most of this is 
> implicit in that it uses the V1 FileOutputCommitter to marshal .pendingset 
> lists from committed tasks to the final destination, where they are read and 
> committed).
> # Show the magic committer also works.
> I'm now not sure that the magic committer works.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
