[ https://issues.apache.org/jira/browse/MAPREDUCE-5562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13786513#comment-13786513 ]
Jason Lowe commented on MAPREDUCE-5562:
---------------------------------------

bq. Since we are using RMProxy, connection exceptions are handled in RMProxy and retried automatically, and we can also define other types of exceptions in RMProxy with a different retry policy if needed. For work-preserving restart, the AM will hang while the RM is down, and after the RM comes up it should be able to unregister successfully.

OK, so if we're having connection-level issues with the RM, it sounds like we will get some retries at a lower level, which is good. I don't want AMs to simply give up just because of an isolated, temporary network connectivity issue.

So it sounds like we're left with the staging directory issue. Can't we clean up the staging directory before leaving if it's the last attempt?

> MR AM should exit when unregister() throws exception
> ----------------------------------------------------
>
>                 Key: MAPREDUCE-5562
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5562
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Zhijie Shen
>            Assignee: Zhijie Shen
>         Attachments: MAPREDUCE-5562.1.patch, MAPREDUCE-5562.2.patch, MAPREDUCE-5562.3.patch
>

--
This message was sent by Atlassian JIRA
(v6.1#6144)
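The behavior discussed in this comment can be sketched as follows. The AM exits instead of hanging when unregister() fails. It removes the staging directory before exiting only when this is the last attempt, because no later attempt will exist to reuse or clean it up. This is a hypothetical, self-contained illustration and not the MR AM or RMProxy code. `RmClient`, `StagingCleaner`, `unregisterWithRetries`, `shutdown`, the retry count (3) and the sleep (10 ms) are all assumptions. Retrying only `java.net.ConnectException` stands in for the connection-level retries that the quoted text attributes to RMProxy.

```java
import java.io.IOException;
import java.net.ConnectException;

/**
 * Hypothetical sketch only: RmClient and StagingCleaner are illustrative
 * stand-ins, not YARN or MapReduce APIs.
 */
public class UnregisterSketch {

  /** Hypothetical stand-in for the AM's client connection to the RM. */
  public interface RmClient {
    void unregister() throws IOException;
  }

  /** Hypothetical stand-in for deleting the job's staging directory. */
  public interface StagingCleaner {
    void cleanup() throws IOException;
  }

  /**
   * Retries only connection-level failures (ConnectException), mimicking the
   * lower-level retries attributed to RMProxy; any other IOException escapes at once.
   */
  static void unregisterWithRetries(RmClient rm, int maxRetries, long sleepMs)
      throws IOException {
    for (int attempt = 0; ; attempt++) {
      try {
        rm.unregister();
        return;
      } catch (ConnectException e) {
        if (attempt >= maxRetries) {
          throw e; // retries exhausted: surface the connection failure
        }
        try {
          Thread.sleep(sleepMs);
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt(); // preserve the interrupt status
          throw new IOException("interrupted while retrying unregister", ie);
        }
      }
    }
  }

  /**
   * Returns an exit code instead of hanging. The staging directory is removed
   * after a successful unregister, or after a failed one only on the last attempt.
   */
  public static int shutdown(RmClient rm, StagingCleaner cleaner, boolean isLastAttempt) {
    try {
      unregisterWithRetries(rm, 3, 10);
    } catch (IOException e) {
      if (isLastAttempt) {
        cleanupQuietly(cleaner); // no later attempt will clean it up
      }
      return 1;
    }
    cleanupQuietly(cleaner);
    return 0;
  }

  private static void cleanupQuietly(StagingCleaner cleaner) {
    try {
      cleaner.cleanup();
    } catch (IOException e) {
      System.err.println("staging cleanup failed: " + e.getMessage());
    }
  }

  public static void main(String[] args) {
    int[] calls = {0};
    // RM unreachable for two calls, then reachable: retried transparently.
    int ok = shutdown(() -> {
      if (++calls[0] < 3) throw new ConnectException("RM down");
    }, () -> System.out.println("staging dir removed"), false);
    System.out.println("exit code " + ok + " after " + calls[0] + " calls");

    // Non-connection failure on the last attempt: clean up, then exit non-zero.
    int failed = shutdown(() -> {
      throw new IOException("unregister rejected");
    }, () -> System.out.println("staging dir removed"), true);
    System.out.println("exit code " + failed);
  }
}
```

In this sketch, an isolated network blip does not make the AM give up, because only the narrow connection case is retried. Any other failure ends the attempt promptly. Keeping the staging directory on non-final attempts leaves it available for the next attempt.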