[
https://issues.apache.org/jira/browse/MAPREDUCE-3339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Siddharth Seth updated MAPREDUCE-3339:
--------------------------------------
Attachment: MR3339_v2.txt
Modified the way the AM finds out knownNodes based on feedback from Hitesh and
Vinod.
knownNodes are now reported by the RM on each allocate call.
One known issue with this approach: AM blacklisting is host based, while the
known node count is NodeManager based. If there are multiple NMs on a node,
disabling blacklisting may not work. AM blacklisting needs to move over to
being NM based instead of node based.
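
The mismatch above can be illustrated with a minimal, self-contained sketch. It is not taken from the patch: the class name, the method name, and the 50% threshold are all hypothetical, and it assumes that blacklisting is disabled once the blacklisted share of known nodes reaches some percentage.

```java
// Hypothetical sketch, not code from MR3339_v2.txt. All names and the
// threshold value are assumptions made for illustration only.
final class BlacklistSketch {

    // Returns true when blacklisting should be ignored, i.e. when the
    // blacklisted share of the known nodes reaches thresholdPercent.
    static boolean shouldIgnoreBlacklisting(int blacklisted, int knownNodes,
                                            int thresholdPercent) {
        if (knownNodes <= 0) {
            return false; // no node count reported yet, keep blacklisting on
        }
        return blacklisted * 100 / knownNodes >= thresholdPercent;
    }

    public static void main(String[] args) {
        int thresholdPercent = 50; // assumed value
        int knownNodes = 3;        // RM reports 3 NMs, all on the same host

        // Host-based count: the single host is blacklisted, so the AM
        // counts 1 of 3 (33%) and blacklisting stays enabled, even though
        // no usable NM remains.
        System.out.println(shouldIgnoreBlacklisting(1, knownNodes, thresholdPercent)); // false

        // NM-based count: all 3 NMs are blacklisted (100%), so
        // blacklisting is ignored and the job can make progress.
        System.out.println(shouldIgnoreBlacklisting(3, knownNodes, thresholdPercent)); // true
    }
}
```

With both counts expressed in the same unit (NMs), the comparison no longer under-reports how much of the cluster is blacklisted.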
> Job hangs indefinitely if the child processes are killed on the
> NM. KILL_CONTAINER eventtype is continuously sent to the containers that do
> not exist
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-3339
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3339
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mrv2
> Affects Versions: 0.23.0
> Reporter: Ramgopal N
> Assignee: Siddharth Seth
> Priority: Blocker
> Attachments: MR3339_v1.txt, MR3339_v2.txt
>
>
> I have only one NM running.
> I submitted a job, and all the child processes on the NM were killed
> continuously. This made the job hang indefinitely.
> The NM logs show this WARN message:
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
> Event EventType: KILL_CONTAINER sent to absent container
> container_1320301910500_0004_01_001359