[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13169004#comment-13169004
 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-3530:
----------------------------------------------------

bq. The RM should probably exit if the scheduler thread sees exceptions, 
instead of the RM continuing to run without the scheduler thread.
Let's do that separately. We need this kind of checking for all components.
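
For illustration only, here is a minimal sketch of the fail-fast idea quoted above: an event-processing loop that hands any unexpected exception to a fatal-error callback instead of letting the thread die quietly. All names (FailFastEventProcessor, post, onFatal) are hypothetical and are not taken from the ResourceManager code or the attached patch.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

// Hypothetical sketch; not the actual ResourceManager event dispatcher.
class FailFastEventProcessor implements Runnable {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final Consumer<String> handler;     // processes one event
    private final Consumer<Throwable> onFatal;  // invoked when the handler throws

    FailFastEventProcessor(Consumer<String> handler, Consumer<Throwable> onFatal) {
        this.handler = handler;
        this.onFatal = onFatal;
    }

    // Enqueue an event for the processing thread.
    void post(String event) {
        queue.add(event);
    }

    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            String event;
            try {
                event = queue.take();
            } catch (InterruptedException e) {
                // Normal shutdown path: restore the flag and stop.
                Thread.currentThread().interrupt();
                return;
            }
            try {
                handler.accept(event);
            } catch (Throwable t) {
                // Escalate rather than leave the process running
                // without its event-processing thread.
                onFatal.accept(t);
                return;
            }
        }
    }
}
```

In a real daemon the onFatal callback might log the error and terminate the process (for example with System.exit), but that policy is an assumption here and is left to the caller.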
                
> Sometimes NODE_UPDATE to the scheduler throws an NPE causing the scheduling 
> to stop
> -----------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3530
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3530
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2, resourcemanager, scheduler
>    Affects Versions: 0.23.1
>            Reporter: Karam Singh
>            Assignee: Arun C Murthy
>            Priority: Blocker
>         Attachments: MAPREDUCE-3530.patch
>
>
> Sometimes NODE_UPDATE to the scheduler throws an NPE, which causes scheduling 
> to stop while the ResourceManager keeps on running.
> I have been observing this intermittently for the last 3 weeks.
> With the latest svn code, I tried to run sort twice, and both times the job got 
> stuck due to the NPE.
> {code}
> java.lang.NullPointerException
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApp.containerLaunchedOnNode(SchedulerApp.java:181)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.containerLaunchedOnNode(CapacityScheduler.java:596)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:539)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:617)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:77)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:294)
>         at java.lang.Thread.run(Thread.java:619)
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
