[ https://issues.apache.org/jira/browse/MAPREDUCE-5476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13746899#comment-13746899 ]
Jian He commented on MAPREDUCE-5476:
------------------------------------
I ran a manual test on a single-node cluster:
1. Put a long sleep before the unregister call in RMCommunicator.
2. Kill the AM while it is sleeping, then restart the RM.
Without the patch, the subsequently restarted AMs (up to max-num-am-retry)
fail because the staging dir has already been removed.
With the patch, the restarted AM is able to continue and succeed.
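
The failure window described above can be illustrated with a toy model. This is only a sketch: the class, enum, and method names are hypothetical and are not Hadoop APIs. The rule that a retry is launched only while the app is not unregistered is a simplifying assumption of the model. The second ordering (unregister first) is shown for contrast and is assumed for illustration; it is not necessarily what the attached patch does.

```java
import java.util.Arrays;
import java.util.List;

/** Toy model of the shutdown race; none of these names are Hadoop APIs. */
final class StagingRaceSketch {

    /** Hypothetical stand-in for state that outlives a single AM attempt. */
    static final class ClusterState {
        boolean stagingDirExists = true;  // job files a restarted AM needs
        boolean appUnregistered = false;  // whether the RM knows the app finished
    }

    /** The two shutdown steps of the final AM attempt. */
    enum Step { CLEAN_STAGING, UNREGISTER }

    /**
     * Runs the shutdown steps in the given order, but stops (simulating
     * a killed AM) once stepsBeforeKill steps have completed.
     */
    static ClusterState shutdown(List<Step> order, int stepsBeforeKill) {
        ClusterState state = new ClusterState();
        for (int i = 0; i < order.size() && i < stepsBeforeKill; i++) {
            if (order.get(i) == Step.CLEAN_STAGING) {
                state.stagingDirExists = false;
            } else {
                state.appUnregistered = true;
            }
        }
        return state;
    }

    /** Model assumption: a retry is launched only if the app never unregistered. */
    static boolean retryIsLaunched(ClusterState s) {
        return !s.appUnregistered;
    }

    /** A launched retry can only proceed if the staging dir still exists. */
    static boolean retryCanProceed(ClusterState s) {
        return s.stagingDirExists;
    }

    public static void main(String[] args) {
        List<Step> cleanFirst = Arrays.asList(Step.CLEAN_STAGING, Step.UNREGISTER);
        List<Step> unregisterFirst = Arrays.asList(Step.UNREGISTER, Step.CLEAN_STAGING);
        for (List<Step> order : Arrays.asList(cleanFirst, unregisterFirst)) {
            ClusterState s = shutdown(order, 1); // AM killed after the first step
            System.out.println(order + ": retry launched=" + retryIsLaunched(s)
                    + ", retry can proceed=" + retryCanProceed(s));
        }
    }
}
```

In the clean-first ordering, the model launches a retry that cannot proceed, which matches the failure seen without the patch. In the unregister-first ordering, the model launches no retry at all, so the missing staging dir is never consulted.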
> Job can fail when RM restarts after staging dir is cleaned but before MR
> successfully unregister with RM
> --------------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-5476
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5476
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Reporter: Jian He
> Assignee: Jian He
> Attachments: MAPREDUCE-5476.patch, YARN-917.patch
>
>