[
https://issues.apache.org/jira/browse/MAPREDUCE-5547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916309#comment-13916309
]
Robert Joseph Evans commented on MAPREDUCE-5547:
------------------------------------------------
The question comes down to where the single source of truth is. When we wrote
this code, we decided that, to avoid any split-brain problems, the single source
of truth should be the state of the app stored in HDFS, primarily because we
did not want to change any of the RM APIs to allow for extra application state
where the job succeeded, but the application is not done yet. We did this
because the MR client APIs only used the RM to determine if they should
talk to the AM or the History server, and we assumed that everyone would use
the MR APIs, or the _SUCCESS file. And, as Jason has pointed out, it keeps most
cleanup operations happening in a state where they can be retried on an error.
If you feel that you need to switch the source of truth to the RM, that is
fine, but we still need a way to keep that state. Perhaps let the Application
be in a "non-critical cleanup" state. The application has succeeded or failed,
and that information is stored in the RM, but the application is still
performing cleanup operations. This would allow the app to be relaunched, if
needed, but if it crashes and runs out of attempts, the RM can give up, and
still store that the app succeeded, but with some errors during cleanup.
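
To make the proposed "non-critical cleanup" state concrete, here is a rough
sketch of the transitions I have in mind. Every name in it (CleanupStateSketch,
AppState, Outcome, relaunchesLeft, and the methods) is made up for illustration;
none of this is an existing RM or MR API, and a real change would have to go
through the RM state machine and state store.

```java
/** Hypothetical sketch only; these are not real YARN or MapReduce classes. */
class CleanupStateSketch {

    /** Result of the job itself, recorded before cleanup starts. */
    enum Outcome { SUCCEEDED, FAILED }

    /** Hypothetical application states as the RM would track them. */
    enum AppState { RUNNING, NON_CRITICAL_CLEANUP, FINISHED, FINISHED_WITH_CLEANUP_ERRORS }

    private AppState state = AppState.RUNNING;
    private Outcome outcome;      // null until the job itself completes
    private int relaunchesLeft;   // how many more AM attempts the RM will start

    CleanupStateSketch(int relaunchesLeft) {
        this.relaunchesLeft = relaunchesLeft;
    }

    /** The AM reports the job result; the RM stores it and cleanup begins. */
    void jobCompleted(Outcome result) {
        if (state != AppState.RUNNING) {
            throw new IllegalStateException("job result already recorded");
        }
        outcome = result;
        state = AppState.NON_CRITICAL_CLEANUP;
    }

    /** The AM finished cleanup and unregistered normally. */
    void cleanupDone() {
        if (state != AppState.NON_CRITICAL_CLEANUP) {
            throw new IllegalStateException("not in cleanup");
        }
        state = AppState.FINISHED;
    }

    /** An AM attempt crashed. Returns true if the RM should relaunch it. */
    boolean attemptCrashed() {
        if (state == AppState.FINISHED || state == AppState.FINISHED_WITH_CLEANUP_ERRORS) {
            return false; // already terminal, nothing to do
        }
        if (relaunchesLeft > 0) {
            relaunchesLeft--;
            return true;  // cleanup (or the job) can be retried
        }
        if (state == AppState.NON_CRITICAL_CLEANUP) {
            // Out of attempts during cleanup: keep the recorded outcome.
            state = AppState.FINISHED_WITH_CLEANUP_ERRORS;
        } else {
            // Out of attempts before the job completed: the job failed.
            outcome = Outcome.FAILED;
            state = AppState.FINISHED;
        }
        return false;
    }

    AppState getState() { return state; }

    Outcome getOutcome() { return outcome; }
}
```

The point of the sketch is that the outcome is written once, when the job
completes, and a later crash during cleanup can only change the state, never the
recorded outcome.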
> Job history should not be flushed to JHS until AM gets unregistered
> -------------------------------------------------------------------
>
> Key: MAPREDUCE-5547
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5547
> Project: Hadoop Map/Reduce
> Issue Type: Sub-task
> Reporter: Zhijie Shen
> Assignee: Zhijie Shen
> Attachments: MAPREDUCE-5547.1.patch
>
>