[jira] [Commented] (MAPREDUCE-3031) Job Client goes into infinite loop when we kill AM

Vinod Kumar Vavilapalli (JIRA) Mon, 19 Sep 2011 06:31:36 -0700

    [ https://issues.apache.org/jira/browse/MAPREDUCE-3031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107813#comment-13107813 ]


Vinod Kumar Vavilapalli commented on MAPREDUCE-3031:
----------------------------------------------------

This is a bug in the NM: just about any container that is killed this way 
(by running kill $pid on the node) will be stuck in the RUNNING state on the RM. 
I found this on the corresponding NM:

{code}
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: CONTAINER_KILLED_ON_REQUEST at RUNNING
        at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:297)
        at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:39)
        at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:439)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:685)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:69)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:356)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:349)
        at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:113)
        at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
        at java.lang.Thread.run(Thread.java:619)
{code}

This is because an exit code of 137 (128 + 9, SIGKILL) or 143 (128 + 15, 
SIGTERM) is treated as a kill request. In hindsight, this turns out to be a 
bad idea; we should fix it.
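
To make the failure mode concrete, here is a minimal sketch of the logic described above. It is not the NodeManager's actual code: the class, the helper methods, and every state and event name other than RUNNING and CONTAINER_KILLED_ON_REQUEST (which appear in the trace) are hypothetical, and the transition table is an assumption chosen only to reproduce the "Invalid event ... at RUNNING" situation.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only; names are hypothetical, not the real NM classes.
class ContainerExitSketch {
    // KILLING is an assumed state for "a kill was actually requested".
    enum State { RUNNING, KILLING }

    enum Event {
        KILL_REQUESTED,                 // hypothetical
        CONTAINER_KILLED_ON_REQUEST,    // event name taken from the stack trace
        CONTAINER_EXITED_WITH_SUCCESS,  // hypothetical
        CONTAINER_EXITED_WITH_FAILURE   // hypothetical
    }

    // Shell convention: a process terminated by signal N reports exit code 128 + N.
    static int signalFor(int exitCode) {
        return exitCode > 128 ? exitCode - 128 : 0;
    }

    // The heuristic from the comment: 137 (SIGKILL) and 143 (SIGTERM) are
    // assumed to mean "killed on request", no matter who sent the signal.
    static Event naiveEvent(int exitCode) {
        if (exitCode == 0) {
            return Event.CONTAINER_EXITED_WITH_SUCCESS;
        }
        int signal = signalFor(exitCode);
        if (signal == 9 || signal == 15) {
            return Event.CONTAINER_KILLED_ON_REQUEST;
        }
        return Event.CONTAINER_EXITED_WITH_FAILURE;
    }

    // Assumed transition table: "killed on request" is only expected once a
    // kill has been requested, i.e. in KILLING, never directly in RUNNING.
    static final Map<State, Set<Event>> LEGAL = new EnumMap<>(State.class);
    static {
        LEGAL.put(State.RUNNING, EnumSet.of(
                Event.KILL_REQUESTED,
                Event.CONTAINER_EXITED_WITH_SUCCESS,
                Event.CONTAINER_EXITED_WITH_FAILURE));
        LEGAL.put(State.KILLING, EnumSet.of(
                Event.CONTAINER_KILLED_ON_REQUEST,
                Event.CONTAINER_EXITED_WITH_SUCCESS,
                Event.CONTAINER_EXITED_WITH_FAILURE));
    }

    static boolean isLegal(State state, Event event) {
        return LEGAL.get(state).contains(event);
    }
}
```

Under these assumptions, a manual kill -9 on the node yields exit code 137, naiveEvent(137) returns CONTAINER_KILLED_ON_REQUEST, and isLegal(State.RUNNING, ...) is false for that event. The event is rejected, the container never leaves RUNNING, and nothing tells the RM that it has finished.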

> Job Client goes into infinite loop when we kill AM
> --------------------------------------------------
>
>                 Key: MAPREDUCE-3031
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3031
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.0
>            Reporter: Karam Singh
>             Fix For: 0.23.0
>
>
> Started a cluster. Submitted a sleep job with around 10000 maps and 1000 
> reduces.
> Killed the AM with kill -9, by which time 7000 maps had already 
> completed.
> On the RM web UI, the application is stuck in the Application.RUNNING state, 
> and JobClient goes into an infinite loop because the RM keeps telling the 
> client that the application is running.
