[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13462185#comment-13462185
 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-4672:
----------------------------------------------------

Chris, yes, I understand that your NM process is down. But as of now, when an 
NM goes down, it does not kill its containers. So I am sure your AM container 
process is still running: the call trace 
(ApplicationMasterService.java:allocate(247)) shows that something is still 
calling allocate on the RM.

You have to kill this AM process, either manually or by having the AM check 
the value of AMResponse.getReboot() in its code and shut down when it is set.
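
As a rough illustration of the second option, here is a minimal sketch of an AM heartbeat loop that stops once the reboot flag comes back. It does not use the real Hadoop classes: Response and Scheduler are hypothetical stand-ins for AMResponse and the AM-to-RM allocate call, and runHeartbeats and maxHeartbeats are names invented for this example. It also assumes the RM sets the reboot flag for an attempt it no longer knows about, as the advice above implies.

```java
import java.util.Arrays;
import java.util.Iterator;

public class AmHeartbeatSketch {

    /** Hypothetical stand-in for the part of AMResponse this sketch needs. */
    interface Response {
        boolean getReboot();
    }

    /** Hypothetical stand-in for the AM's allocate call to the RM. */
    interface Scheduler {
        Response allocate();
    }

    /**
     * Heartbeats until the RM sets the reboot flag or maxHeartbeats is reached.
     * Returns the number of allocate calls that were made.
     */
    static int runHeartbeats(Scheduler rm, int maxHeartbeats) {
        int calls = 0;
        while (calls < maxHeartbeats) {
            Response response = rm.allocate();
            calls++;
            if (response.getReboot()) {
                // The RM no longer tracks this attempt: stop heartbeating
                // instead of calling allocate forever.
                break;
            }
            // Normal handling of allocated/completed containers would go here.
        }
        return calls;
    }

    public static void main(String[] args) {
        // Fake RM: two normal responses, then the reboot flag.
        Iterator<Boolean> flags = Arrays.asList(false, false, true, false).iterator();
        Scheduler fakeRm = () -> {
            boolean reboot = flags.next();
            return () -> reboot;
        };
        int calls = runHeartbeats(fakeRm, 10);
        System.out.println("stopped after " + calls + " allocate calls");
    }
}
```

In a real AM, the break would be followed by whatever cleanup the application needs before the process exits, so that a stale AM does not keep hitting the RM.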

If this isn't your current job, it is probably a stale AM left over from an 
earlier run.
                
> RM with lost NMs results in massive log of AppAttemptId doesnt exist in cache
> -----------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4672
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4672
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 0.23.1
>            Reporter: Chris Riccomini
>            Assignee: Vinod Kumar Vavilapalli
>
> Hey Guys,
> I'm running a 9 node cluster with 8 NMs and a single RM node. If I run an app 
> master and have that app master start a container, then shut down all NMs, 
> but leave the RM up (to simulate a failure), the containers time out and fail, 
> as expected.
> What's unexpected is that my log then starts filling with:
> 2012-09-21 18:02:02,614 ERROR resourcemanager.ApplicationMasterService 
> (ApplicationMasterService.java:allocate(247)) - AppAttemptId doesnt exist in 
> cache appattempt_1348248013002_0001_000001
> 2012-09-21 18:02:03,617 ERROR resourcemanager.ApplicationMasterService 
> (ApplicationMasterService.java:allocate(247)) - AppAttemptId doesnt exist in 
> cache appattempt_1348248013002_0001_000001
> 2012-09-21 18:02:04,618 ERROR resourcemanager.ApplicationMasterService 
> (ApplicationMasterService.java:allocate(247)) - AppAttemptId doesnt exist in 
> cache appattempt_1348248013002_0001_000001
> 2012-09-21 18:02:05,620 ERROR resourcemanager.ApplicationMasterService 
> (ApplicationMasterService.java:allocate(247)) - AppAttemptId doesnt exist in 
> cache appattempt_1348248013002_0001_000001
> 2012-09-21 18:02:06,621 ERROR resourcemanager.ApplicationMasterService 
> (ApplicationMasterService.java:allocate(247)) - AppAttemptId doesnt exist in 
> cache appattempt_1348248013002_0001_000001
> 2012-09-21 18:02:07,623 ERROR resourcemanager.ApplicationMasterService 
> (ApplicationMasterService.java:allocate(247)) - AppAttemptId doesnt exist in 
> cache appattempt_1348248013002_0001_000001
> 2012-09-21 18:02:08,624 ERROR resourcemanager.ApplicationMasterService 
> (ApplicationMasterService.java:allocate(247)) - AppAttemptId doesnt exist in 
> cache appattempt_1348248013002_0001_000001
> Is there any way to shut this off/fix it? It just keeps going forever, until 
> I bounce the RM node.
> Thanks!
> Chris

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
