[
https://issues.apache.org/jira/browse/MAPREDUCE-3347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144582#comment-13144582
]
Ramgopal N commented on MAPREDUCE-3347:
---------------------------------------
Hi Vinod,
Setting yarn.resourcemanager.am.max-retries in yarn-site.xml makes the RM retry
the ApplicationMaster the specified number of times before failing the job. Thanks
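
For reference, a minimal sketch of what such an entry in yarn-site.xml might look like. The property name is the one mentioned above; the value 4 is only an illustrative assumption, not a recommendation from this issue, and should be chosen to suit the cluster.

```xml
<!-- yarn-site.xml (fragment) -->
<configuration>
  <property>
    <!-- Property named in the comment above: how many times the RM
         retries the ApplicationMaster before failing the job. -->
    <name>yarn.resourcemanager.am.max-retries</name>
    <!-- Illustrative value only (assumption). -->
    <value>4</value>
  </property>
</configuration>
```

The ResourceManager reads yarn-site.xml at startup, so a change like this would typically need an RM restart to take effect.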
> Resource manager is not respawning MRAppMaster process if it goes down in the
> middle of job execution and the job is getting failed.
> ------------------------------------------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-3347
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3347
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mrv2
> Affects Versions: 0.23.0
> Reporter: Ramgopal N
>
> ApplicationMaster service should recover the job if the MRAppMaster process goes
> down in the middle of job execution. If not, the MRAppMaster process becomes the
> single point of failure for the job and loses the advantage of the MRV1
> framework.