Thomas Graves created MAPREDUCE-4213:
----------------------------------------
Summary: nodemanager should cleanup running containers when
shutdown
Key: MAPREDUCE-4213
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4213
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: mrv2, nodemanager
Affects Versions: 0.23.3
Reporter: Thomas Graves
Currently the nodemanager doesn't clean up running containers when it gets
restarted. Containers can then be orphaned and keep running indefinitely.
We've seen this happen multiple times when the RM is restarted. When the RM is
brought back up, it doesn't know what was running on the cluster, so it tells
the NMs to reboot, and when an NM reboots it loses track of what it had running.
If any containers are misbehaving, nothing is left that knows about them and
can kill them.
We should try to kill any running containers when the nodemanager is shutting
down. We should also check for leftover containers when the nodemanager is
brought back up, but that will be a separate JIRA.
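As a rough illustration of the shutdown-time cleanup proposed above, here is a
minimal sketch. It is not the nodemanager's actual code: the class
ContainerCleanupSketch, the ContainerKiller interface, and all method names are
hypothetical, and how a container's process is really signalled is left behind
that interface as an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch only; these are not the nodemanager's real classes. */
public class ContainerCleanupSketch {

    /** Hypothetical stand-in for whatever actually kills a container's process. */
    public interface ContainerKiller {
        void kill(String containerId) throws Exception;
    }

    // IDs of containers this node currently believes are running.
    private final Map<String, Boolean> running = new ConcurrentHashMap<>();
    private final ContainerKiller killer;

    public ContainerCleanupSketch(ContainerKiller killer) {
        this.killer = killer;
    }

    public void containerStarted(String id) {
        running.put(id, Boolean.TRUE);
    }

    public void containerFinished(String id) {
        running.remove(id);
    }

    /**
     * Intended to be called while shutting down: attempts to kill every
     * container still tracked as running. Returns the IDs that could not be
     * killed, so the caller can log or retry them.
     */
    public List<String> cleanupOnShutdown() {
        List<String> failed = new ArrayList<>();
        // Iterate over a snapshot so removals do not disturb the loop.
        for (String id : new ArrayList<>(running.keySet())) {
            try {
                killer.kill(id);
                running.remove(id);
            } catch (Exception e) {
                // Best effort: keep going so one bad container does not
                // prevent the others from being cleaned up.
                failed.add(id);
            }
        }
        return failed;
    }

    public int runningCount() {
        return running.size();
    }
}
```

One way to wire this up would be to call cleanupOnShutdown() from the service's
stop path, or from a JVM hook registered with
Runtime.getRuntime().addShutdownHook(...). Neither helps if the process is
killed abruptly, which is why a check at startup is still worth having.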
This might change a bit once RM restart is implemented, if tasks can actually
survive an RM/NM reboot, but that can be addressed at that point.