[ https://issues.apache.org/jira/browse/MAPREDUCE-5485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14982375#comment-14982375 ]
Junping Du commented on MAPREDUCE-5485:
---------------------------------------

Thanks [~bikassaha] for the comments! I agree it makes more sense to move the retry logic into committer.commitJob() if the committer supports repeatable commit. My original thinking was to combine this retry for committer.commitJob() with the handling of other AM exceptions in handleJobCommit (outside of the committer), such as failing to write endCommitSuccessFile. But now I think we should separate committer retry from AM-specific handling, for the reason you mentioned above.

For this case, I would prefer that we just let the AM exit directly instead of failing the job (if job commit is repeatable). This is mostly the same as what [~nemon] proposed above, with one slight difference: we should apply AM failure (not job failure) even when committer.commitJob() still fails after retry, to handle some corner cases, i.e. something goes wrong with the committer in this AM, but the commit still has a chance to succeed in another AM if we support repeatable job commit.

I will upload an updated patch soon.

> Allow repeating job commit by extending OutputCommitter API
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-5485
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5485
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 2.1.0-beta
>            Reporter: Nemon Lou
>            Assignee: Junping Du
>         Attachments: MAPREDUCE-5485-demo-2.patch, MAPREDUCE-5485-demo.patch
>
>
> There is a chance that MRAppMaster crashes during job commit, or that a
> NodeManager restart causes the committing AM to exit due to container expiry.
> In these cases, the job will fail.
> However, some jobs can redo the commit, so failing the job is unnecessary.
> Letting clients tell the AM whether to allow redoing the commit is a better choice.
> This idea comes from Jason Lowe's comments in MAPREDUCE-4819.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
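
The split discussed in the comment above (retry around committer.commitJob(), then AM exit rather than job failure when the commit is repeatable) can be sketched roughly as below. This is only an illustration under stated assumptions: RepeatableCommitter, isCommitRepeatable(), FlakyCommitter, Outcome, and the maxAttempts parameter are hypothetical names made up for this sketch. They are not the Hadoop OutputCommitter API and not the contents of the attached patches.

```java
import java.io.IOException;

public class CommitRetrySketch {

    /** Hypothetical stand-in for a committer that can say whether commit is repeatable. */
    public interface RepeatableCommitter {
        boolean isCommitRepeatable();
        void commitJob() throws IOException;
    }

    /** What the AM-side handler decides after trying to commit. */
    public enum Outcome { COMMITTED, FAIL_JOB, EXIT_AM }

    /** Hypothetical test double: fails a fixed number of times, then succeeds. */
    public static class FlakyCommitter implements RepeatableCommitter {
        private final int failuresBeforeSuccess;
        private final boolean repeatable;
        public int calls = 0;

        public FlakyCommitter(int failuresBeforeSuccess, boolean repeatable) {
            this.failuresBeforeSuccess = failuresBeforeSuccess;
            this.repeatable = repeatable;
        }

        @Override
        public boolean isCommitRepeatable() { return repeatable; }

        @Override
        public void commitJob() throws IOException {
            calls++;
            if (calls <= failuresBeforeSuccess) {
                throw new IOException("simulated commit failure #" + calls);
            }
        }
    }

    /** Committer-level retry: only retries when the committer says commit is repeatable. */
    public static void commitWithRetry(RepeatableCommitter committer, int maxAttempts)
            throws IOException {
        int attempts = committer.isCommitRepeatable() ? Math.max(1, maxAttempts) : 1;
        IOException last = null;
        for (int i = 0; i < attempts; i++) {
            try {
                committer.commitJob();
                return; // commit succeeded
            } catch (IOException e) {
                last = e; // remember the failure and try again if attempts remain
            }
        }
        throw last; // non-null here: at least one attempt ran and failed
    }

    /**
     * AM-level handling, kept separate from the retry above. If the commit still
     * fails and is repeatable, the AM exits so another AM attempt can redo it;
     * otherwise the job is failed.
     */
    public static Outcome handleJobCommit(RepeatableCommitter committer, int maxAttempts) {
        try {
            commitWithRetry(committer, maxAttempts);
            return Outcome.COMMITTED;
        } catch (IOException e) {
            return committer.isCommitRepeatable() ? Outcome.EXIT_AM : Outcome.FAIL_JOB;
        }
    }

    public static void main(String[] args) {
        // Two simulated failures, three allowed attempts: the third attempt commits.
        System.out.println(handleJobCommit(new FlakyCommitter(2, true), 3)); // prints COMMITTED
    }
}
```

In this sketch a non-repeatable committer gets exactly one attempt and a failure fails the job, while a repeatable committer that exhausts its retries yields EXIT_AM, which mirrors the "AM fail, not job fail" corner case described in the comment.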