[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-3186:
-----------------------------------------------

    Priority: Major  (was: Blocker)

bq. If the resource manager is restarted while the job execution is in 
progress, the job hangs. The UI shows the job as running.
This must be the AM UI. Today the RM completely forgets all applications when 
it restarts.

bq. It seems that the "right" thing is for the communication mechanism between 
the RM and the AM to recognize that the AM is no longer valid and throw the 
appropriate exception so that the AM can exit cleanly.
Yes, until we support RM restart, this should alleviate the issue. +1.
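
As a rough illustration of the behavior proposed above, here is a minimal, 
self-contained Java sketch. It is not the actual YARN code: every class and 
method name in it (RestartAwareHeartbeat, ResourceManagerStub, AppMasterStub, 
UnknownAttemptException) is hypothetical, and the RM's attempt cache is 
modeled as a plain in-memory set that is emptied on restart. It only shows 
the shape of the idea: the RM side throws when it does not know the calling 
attempt, and the AM side treats that exception as a signal to stop.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch only; none of these types exist in Hadoop/YARN.
class RestartAwareHeartbeat {

    /** Thrown by the RM stub when it has no record of the calling attempt. */
    static class UnknownAttemptException extends Exception {
        UnknownAttemptException(String attemptId) {
            super("Unknown application attempt: " + attemptId);
        }
    }

    /** Stand-in for the RM, whose in-memory attempt cache is lost on restart. */
    static class ResourceManagerStub {
        private final Set<String> knownAttempts = new HashSet<>();

        void register(String attemptId) {
            knownAttempts.add(attemptId);
        }

        /** Simulates a restart: all previously known attempts are forgotten. */
        void restart() {
            knownAttempts.clear();
        }

        /** Rejects heartbeats from attempts the RM does not know about. */
        void heartbeat(String attemptId) throws UnknownAttemptException {
            if (!knownAttempts.contains(attemptId)) {
                throw new UnknownAttemptException(attemptId);
            }
        }
    }

    /** Stand-in for the AM heartbeat loop. */
    static class AppMasterStub {
        private final String attemptId;
        private boolean running = true;

        AppMasterStub(String attemptId) {
            this.attemptId = attemptId;
        }

        boolean isRunning() {
            return running;
        }

        /** Sends one heartbeat; returns true if the AM should keep running. */
        boolean heartbeatOnce(ResourceManagerStub rm) {
            try {
                rm.heartbeat(attemptId);
            } catch (UnknownAttemptException e) {
                // The RM no longer knows this attempt: stop instead of hanging.
                running = false;
            }
            return running;
        }
    }
}
```

In this sketch, a real AM would run its cleanup and exit at the point where 
the running flag is cleared, rather than continuing to heartbeat forever.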

I believe this is not a blocker, as we most likely are not going to support 
RM restart in 0.23.
                
> User jobs are getting hanged if the Resource manager process goes down and 
> comes up while job is getting executed.
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3186
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3186
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.0
>         Environment: linux
>            Reporter: Ramgopal N
>            Assignee: Eric Payne
>              Labels: test
>
> If the resource manager is restarted while the job execution is in progress, 
> the job hangs.
> The UI shows the job as running.
> The RM log shows the error "ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: 
> AppAttemptId doesnt exist in cache appattempt_1318579738195_0004_000001"
> On the console, the MRAppMaster and RunJar processes are not killed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
