[jira] [Commented] (MAPREDUCE-5476) Job can fail when RM restarts after staging dir is cleaned but before MR successfully unregister with RM

Jian He (JIRA) Wed, 21 Aug 2013 15:02:34 -0700

    [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13746899#comment-13746899
 ]


Jian He commented on MAPREDUCE-5476:
------------------------------------

Ran a manual test on a single-node cluster:
1. Put a long sleep before the unregister call in RMCommunicator.
2. Killed the AM while it was sleeping, then restarted the RM.

Without the patch, every subsequent AM attempt (up to max-num-am-retry) 
fails because the staging dir has already been removed.
With the patch, the restarted AM is able to continue and the job succeeds.
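
The window exercised by this test can be sketched as a toy model. This is not
MRAppMaster code and not the attached patch: the class, method, and field names
below are hypothetical, and treating the fix as "keep the staging dir until
unregister has succeeded" is an assumption made only to illustrate the two
outcomes.

```java
/** Toy model of the failure window; none of these names are Hadoop APIs. */
public class StagingDirRace {

    /** State that outlives a single AM attempt. */
    static final class Cluster {
        boolean stagingDirExists = true;  // job files needed by any restarted AM
        boolean unregistered = false;     // true once the RM has seen the unregister
    }

    /** Models the AM being killed during the injected sleep. */
    static final class AmKilled extends RuntimeException {}

    /** AM shutdown path, with the kill injected just before unregister. */
    static void shutdown(Cluster c, boolean cleanBeforeUnregister, boolean killBeforeUnregister) {
        if (cleanBeforeUnregister) {
            c.stagingDirExists = false;   // old ordering: dir removed first
        }
        if (killBeforeUnregister) {
            throw new AmKilled();         // the "long sleep", then kill
        }
        c.unregistered = true;
        if (!cleanBeforeUnregister) {
            c.stagingDirExists = false;   // assumed ordering: dir removed last
        }
    }

    /** Returns true if some attempt finishes and unregisters within maxAttempts. */
    static boolean runJob(boolean cleanBeforeUnregister, int maxAttempts) {
        Cluster c = new Cluster();
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (!c.stagingDirExists) {
                continue;                 // restarted AM fails: job files are gone
            }
            try {
                // Only the first attempt is killed in this model.
                shutdown(c, cleanBeforeUnregister, attempt == 1);
                return c.unregistered;
            } catch (AmKilled e) {
                // The RM never saw an unregister, so it launches another attempt.
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("clean before unregister: " + runJob(true, 3));
        System.out.println("clean after unregister:  " + runJob(false, 3));
    }
}
```

In this model, runJob(true, 3) exhausts all attempts because each restarted
attempt finds the staging dir missing, while runJob(false, 3) succeeds on the
second attempt.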
                
> Job can fail when RM restarts after staging dir is cleaned but before MR 
> successfully unregister with RM
> --------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5476
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5476
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Jian He
>            Assignee: Jian He
>         Attachments: MAPREDUCE-5476.patch, YARN-917.patch
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
