[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15000291#comment-15000291
 ] 

Bikas Saha commented on MAPREDUCE-5485:
---------------------------------------

bq. The cleanupInterruptedCommit() already checks whether the previous job 
commit succeeded or failed. Am I missing anything here?
This duplicates the commit-status check in two places, which can introduce a 
bug if the logic changes in only one of them. It also makes extra RPC calls to 
HDFS to check file status, which are avoidable. Moving the code to the place 
where we previously failed on an in-progress commit lets this method do 
exactly what its name suggests: clean up in-progress commit markers. 
Does that clarify?
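
A rough sketch of the single-check arrangement, not the patch itself. The names 
CommitRecoverySketch, CommitState, RecoveryAction and decide are hypothetical. 
The assumption is that the caller reads the commit markers once and passes the 
result in, so the cleanup step only has to delete markers and never re-checks 
status on HDFS.

```java
final class CommitRecoverySketch {
    /** Hypothetical snapshot of the previous attempt's commit markers, read once by the caller. */
    enum CommitState { NOT_STARTED, IN_PROGRESS, SUCCEEDED, FAILED }

    /** Hypothetical outcomes for the restarted AM. */
    enum RecoveryAction {
        RUN_JOB, REPORT_SUCCESS, REPORT_FAILURE, CLEAN_MARKERS_AND_RECOMMIT, FAIL_JOB
    }

    /** Decides what a retried AM does, based on one status check and the repeatable flag. */
    static RecoveryAction decide(CommitState previous, boolean repeatable) {
        switch (previous) {
            case SUCCEEDED:
                return RecoveryAction.REPORT_SUCCESS;
            case FAILED:
                return RecoveryAction.REPORT_FAILURE;
            case IN_PROGRESS:
                // Only this branch needs marker cleanup, and only for repeatable commits.
                return repeatable
                        ? RecoveryAction.CLEAN_MARKERS_AND_RECOMMIT
                        : RecoveryAction.FAIL_JOB;
            default:
                // No commit was started by the previous attempt.
                return RecoveryAction.RUN_JOB;
        }
    }

    public static void main(String[] args) {
        System.out.println(decide(CommitState.IN_PROGRESS, true));
        System.out.println(decide(CommitState.IN_PROGRESS, false));
    }
}
```

With this shape, a change to how commit status is interpreted touches only 
decide(), and the cleanup routine stays a plain marker delete.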

Should we say "previous AM failures" to be precise?
{code}
+   * If repeatable job commit is supported, job restart can tolerate previous
+   * failures during job commit.
{code}

To be clear, we should look at adding 2 more tests: 
1) Test the new MRAppMaster functionality that allows commit to proceed in a 
retried AM if the commit is repeatable. 
2) Test in FileOutputCommitter that, for a repeatable commit, a 
FileNotFoundException is not counted as an error (new behavior).
Maybe the patch is missing some new or changed files? Sorry if I missed 
something and the tests already exist.
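
To illustrate the behavior test 2) would pin down, here is a self-contained 
sketch that does not use the Hadoop classes. CommitStep and runStep are 
hypothetical stand-ins for one filesystem operation inside job commit and for 
the code that wraps it; the real FileOutputCommitter logic may differ.

```java
import java.io.FileNotFoundException;
import java.io.IOException;

final class RepeatableCommitSketch {
    /** Hypothetical stand-in for one filesystem operation performed during job commit. */
    interface CommitStep {
        void run() throws IOException;
    }

    /**
     * Runs one commit step. Returns true if the step ran, false if it was skipped
     * because its input was already gone and the commit is repeatable.
     */
    static boolean runStep(CommitStep step, boolean repeatable) throws IOException {
        try {
            step.run();
            return true;
        } catch (FileNotFoundException e) {
            if (repeatable) {
                // Assumed: a previous AM attempt already handled this path.
                return false;
            }
            throw e; // Non-repeatable commit: a missing file is still an error.
        }
    }

    public static void main(String[] args) throws IOException {
        CommitStep missingInput = () -> {
            throw new FileNotFoundException("pending dir already removed");
        };
        System.out.println("repeatable, step skipped: " + !runStep(missingInput, true));
        try {
            runStep(missingInput, false);
            System.out.println("unexpected: no error");
        } catch (FileNotFoundException expected) {
            System.out.println("non-repeatable, error surfaced: " + expected.getMessage());
        }
    }
}
```

A real test would exercise both branches the same way: once with the 
repeatable flag set and once without, against a path that no longer exists.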


> Allow repeating job commit by extending OutputCommitter API
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-5485
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5485
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 2.1.0-beta
>            Reporter: Nemon Lou
>            Assignee: Junping Du
>            Priority: Critical
>         Attachments: MAPREDUCE-5485-demo-2.patch, MAPREDUCE-5485-demo.patch, 
> MAPREDUCE-5485-v1.patch, MAPREDUCE-5485-v2.patch, MAPREDUCE-5485-v3.1.patch, 
> MAPREDUCE-5485-v3.patch, MAPREDUCE-5485-v4.1.patch, MAPREDUCE-5485-v4.patch
>
>
> MRAppMaster may crash during job commit, or a NodeManager restart may cause 
> the committing AM to exit because its container expired. In these cases, 
> the job fails.
> However, some jobs can redo the commit, so failing the job is unnecessary.
> Letting clients tell the AM whether redoing the commit is allowed is a 
> better choice.
> This idea comes from Jason Lowe's comments in MAPREDUCE-4819.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
