[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13143919#comment-13143919
 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-3347:
----------------------------------------------------

Ramgopal, do you have retries enabled? Bump yarn.resourcemanager.am.max-retries 
to, say, 3 or 4 and retry on a fresh cluster. The default value is 1, so the 
AM is not retried by default.
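
For illustration, a minimal sketch of that setting. It assumes the property is 
placed in yarn-site.xml on the ResourceManager host and that the 
ResourceManager is restarted afterwards; the value 4 is only an example.

```xml
<!-- yarn-site.xml (assumed location); example value only -->
<configuration>
  <property>
    <!-- Maximum number of times the RM will launch the ApplicationMaster -->
    <name>yarn.resourcemanager.am.max-retries</name>
    <value>4</value>
  </property>
</configuration>
```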
                
> Resource manager is not respawning MRAppMaster process if it goes down in the 
> middle of job execution and the job is getting failed.
> ------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3347
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3347
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.0
>            Reporter: Ramgopal N
>
> ApplicationMaster service should recover the job if the MRAppMaster process 
> goes down in the middle of job execution. If not, the MRAppMaster process 
> becomes the single point of failure for the job and loses the advantage of 
> the MRV1 framework.


        
