[
https://issues.apache.org/jira/browse/MAPREDUCE-5547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14314937#comment-14314937
]
Craig Welch commented on MAPREDUCE-5547:
----------------------------------------
So, I think it will be very problematic to move the unregistration of the job
ahead of the upload of the job history logs. As far as I know, the grace period
is just the application master waiting and still accepting requests; the
resource manager begins forwarding clients immediately at unregistration. That
means that if we unregister and then upload the job history file, we will
definitely have a time period where clients are sent to the job history
server and fail. Also, resource manager restarts are far less frequent
than job completion/check occurrences, and we don't want to cause problems with
the latter to improve the situation with the former (I think [~zjshen] & [~jlowe]
made this point above, I concur...). I think the solution needs to be
something like a rollback, where a job can change its state back to one which
causes the client to go to the AM again. Clients looking directly at the
job history server may still get different results, but clients going through
the RM to get at state will again be directed to the AM, which has newly
restarted. We could also provide a mechanism to allow the AM to purge its state
from the job history server if this was a significant concern, to achieve full
"state correctness" for this case.
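
To make the failure window and the rollback idea concrete, here is a rough
sketch in Java. It is a toy model under my own assumptions, not Hadoop code:
RedirectSketch, ResourceManagerModel, JobHistoryServerModel, and clientSucceeds
are hypothetical names, and the real RM/AM/JHS interaction is more involved
than a single flag.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model only; none of these names are real Hadoop APIs. */
public class RedirectSketch {
    enum Target { APPLICATION_MASTER, JOB_HISTORY_SERVER }

    /** Hypothetical stand-in for the RM's view of a single job. */
    static class ResourceManagerModel {
        private boolean amRegistered = true;

        void unregister() { amRegistered = false; }

        /** The proposed rollback: send clients back to the (restarted) AM. */
        void rollback() { amRegistered = true; }

        Target redirect() {
            return amRegistered ? Target.APPLICATION_MASTER : Target.JOB_HISTORY_SERVER;
        }
    }

    /** Hypothetical stand-in for the job history server's stored jobs. */
    static class JobHistoryServerModel {
        private final Set<String> jobs = new HashSet<>();

        void upload(String jobId) { jobs.add(jobId); }

        /** The optional purge mentioned above. */
        void purge(String jobId) { jobs.remove(jobId); }

        boolean has(String jobId) { return jobs.contains(jobId); }
    }

    /** A client going through the RM succeeds if it reaches the AM or a JHS that has the job. */
    static boolean clientSucceeds(ResourceManagerModel rm, JobHistoryServerModel jhs, String jobId) {
        return rm.redirect() == Target.APPLICATION_MASTER || jhs.has(jobId);
    }

    public static void main(String[] args) {
        String jobId = "job_1"; // placeholder id for illustration
        ResourceManagerModel rm = new ResourceManagerModel();
        JobHistoryServerModel jhs = new JobHistoryServerModel();

        // Unregister first, upload later: clients are redirected to an empty JHS.
        rm.unregister();
        System.out.println("after unregister, before upload: " + clientSucceeds(rm, jhs, jobId));

        // Rollback: clients going through the RM reach the AM again.
        rm.rollback();
        System.out.println("after rollback: " + clientSucceeds(rm, jhs, jobId));

        // Upload first, then unregister: no failure window in this model.
        jhs.upload(jobId);
        rm.unregister();
        System.out.println("upload then unregister: " + clientSucceeds(rm, jhs, jobId));
    }
}
```

The model only captures the ordering argument: with unregister-then-upload
there is a state in which the redirect target has nothing to serve, while a
rollback restores the AM as the target without requiring any change on the
job history server side.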
> Job history should not be flushed to JHS until AM gets unregistered
> -------------------------------------------------------------------
>
> Key: MAPREDUCE-5547
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5547
> Project: Hadoop Map/Reduce
> Issue Type: Sub-task
> Reporter: Zhijie Shen
> Assignee: Zhijie Shen
> Attachments: MAPREDUCE-5547.1.patch
>
>