[jira] [Commented] (MAPREDUCE-3347) Resource manager is not respawning MRAppMaster process if it goes down in the middle of job execution and the job is getting failed.

Vinod Kumar Vavilapalli (Commented) (JIRA) Fri, 04 Nov 2011 04:15:28 -0700

    [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13143919#comment-13143919
 ]


Vinod Kumar Vavilapalli commented on MAPREDUCE-3347:
----------------------------------------------------

Ramgopal, do you have retries enabled? Set yarn.resourcemanager.am.max-retries 
to, say, 3 or 4 and try again on a fresh cluster. The default value is 1, which 
allows only a single attempt, so AM retry is effectively off by default.
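
For illustration, a minimal sketch of how the setting might look, assuming it 
goes in yarn-site.xml on the ResourceManager node and that the ResourceManager 
is restarted afterwards. The value 4 is only an example, not a recommendation:

```xml
<!-- yarn-site.xml (assumed location; adjust to your deployment) -->
<configuration>
  <property>
    <!-- Maximum number of ApplicationMaster attempts; the default of 1 means no retry -->
    <name>yarn.resourcemanager.am.max-retries</name>
    <value>4</value>
  </property>
</configuration>
```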
                
> Resource manager is not respawning MRAppMaster process if it goes down in the 
> middle of job execution and the job is getting failed.
> ------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3347
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3347
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.0
>            Reporter: Ramgopal N
>
> ApplicationMaster service should recover the job if the MRAppMaster process 
> goes down in the middle of job execution. If not, the MRAppMaster process 
> becomes the single point of failure for the job and loses the advantage of 
> MRV1 framework.

