Bilwa S T created MAPREDUCE-7314:
------------------------------------
Summary: Job will hang if NM is restarted while its running
Key: MAPREDUCE-7314
URL: https://issues.apache.org/jira/browse/MAPREDUCE-7314
Project: Hadoop Map/Reduce
Issue Type: Sub-task
Reporter: Bilwa S T
Assignee: Bilwa S T
This is due to three different reasons:
# Containers with PRIORITY_FAST_FAIL_MAP priority should be considered for reuse.
# Whenever CONTAINER_REMOTE_CLEANUP is fired for a task attempt, it won't kill
the current attempt that is assigned to the container. That is because the task
attempt is not updated in the ContainerLauncherImpl#Container class.
# A container gets assigned to a task attempt even when the container has stopped
running, i.e. after the container completed event has been processed. This is
because we add the reuse container map to the allocated list. makeRemoteRequest
gets the container in the allocationResponse, whereas the RM has sent the same
container in the finished container list. To avoid this, we need to make sure
the allocated list doesn't contain any containers that are finished.
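
The check described in the third reason can be sketched roughly as follows. This
is only an illustration under stated assumptions, not the actual patch: the class
AllocatedContainerFilter and the method dropFinished are hypothetical names, and
containers are represented by plain string IDs rather than Hadoop's own types.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Hadoop: it only illustrates the filtering step.
final class AllocatedContainerFilter {

    private AllocatedContainerFilter() {
    }

    /**
     * Returns the allocated container IDs that were not also reported as
     * finished in the same response, preserving the original order.
     */
    static List<String> dropFinished(List<String> allocatedIds,
                                     List<String> finishedIds) {
        // A set gives constant-time lookups for the finished IDs.
        Set<String> finished = new HashSet<>(finishedIds);
        List<String> usable = new ArrayList<>();
        for (String id : allocatedIds) {
            if (!finished.contains(id)) {
                usable.add(id);
            }
        }
        return usable;
    }
}
```

With an allocated list of c1, c2, c3 and a finished list containing c2, the call
returns c1 and c3, so no task attempt would be assigned to the finished container.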
--
This message was sent by Atlassian Jira
(v8.3.4#803005)