[jira] [Updated] (MAPREDUCE-3339) Job hangs indefinitely if the child processes are killed on the NM. KILL_CONTAINER event type is continuously sent to containers that do not exist

Siddharth Seth (Updated) (JIRA) Thu, 15 Dec 2011 15:36:01 -0800

     [ https://issues.apache.org/jira/browse/MAPREDUCE-3339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Siddharth Seth updated MAPREDUCE-3339:
--------------------------------------

    Attachment: MR3339_v2.txt

Modified how the AM determines knownNodes, based on feedback from Hitesh and
Vinod.

knownNodes are now reported by the RM on each allocate call.

One known issue with this approach: AM blacklisting is host-based, while the
known-node count is NodeManager-based. If there are multiple NMs on a node,
disabling blacklisting may not work. AM blacklisting needs to move to being
NM-based instead of host-based.
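To illustrate the host-versus-NM mismatch described above, here is a minimal
Python sketch. The function `blacklisting_disabled`, the NM identifiers, and
the 100% threshold are all hypothetical, not taken from MR3339_v2.txt. The
assumed rule is that the AM stops honouring its blacklist once the
blacklisted share of known nodes reaches a threshold, so a job cannot starve.

```python
# Hypothetical sketch: names, NM ids, and the threshold rule are illustrative
# assumptions, not the actual MapReduce AM code from the patch.

def blacklisting_disabled(blacklisted_count: int, known_node_count: int,
                          ignore_threshold_pct: float) -> bool:
    """Return True if the AM should stop honouring its blacklist.

    Assumed rule: once the blacklisted fraction of known nodes reaches
    the threshold, blacklisting is ignored so the job does not hang.
    """
    if known_node_count <= 0:
        return False
    pct = 100.0 * blacklisted_count / known_node_count
    return pct >= ignore_threshold_pct


# One physical host running two NodeManagers; both NMs keep failing tasks.
# The RM reports two known nodes (one per NM). The ids are hypothetical.
known_nms = ["host1:45454", "host1:45455"]
failing_nms = set(known_nms)

# Host-based blacklisting: both NMs collapse into one blacklisted host,
# so only 1 of 2 known nodes looks blacklisted (50%).
blacklisted_hosts = {nm.split(":")[0] for nm in failing_nms}
host_based = blacklisting_disabled(len(blacklisted_hosts), len(known_nms), 100.0)

# NM-based blacklisting: each failing NM is counted on its own (2 of 2).
nm_based = blacklisting_disabled(len(failing_nms), len(known_nms), 100.0)

print(host_based, nm_based)  # False True
```

With host-based counting the threshold is never reached, even though every
NM is unusable, so blacklisting stays enabled and the job can hang; counting
per NM makes both sides of the comparison use the same unit.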
                
> Job hangs indefinitely if the child processes are killed on the
> NM. KILL_CONTAINER event type is continuously sent to containers that
> do not exist
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3339
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3339
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.0
>            Reporter: Ramgopal N
>            Assignee: Siddharth Seth
>            Priority: Blocker
>         Attachments: MR3339_v1.txt, MR3339_v2.txt
>
>
> I have only one NM running.
> I submitted a job, and all the child processes on the NM were killed
> continuously. This made the job hang indefinitely.
> The NM logs show the following WARN message:
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
> Event EventType: KILL_CONTAINER sent to absent container
> container_1320301910500_0004_01_001359

