[ https://issues.apache.org/jira/browse/MAPREDUCE-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265998#comment-13265998 ]
Thomas Graves commented on MAPREDUCE-4213:
------------------------------------------
It would terminate containers on startup in cases where the NM didn't shut
down gracefully or somehow missed something on shutdown (hardware issues, NM
crashes, etc.).
> nodemanager should cleanup running containers when shutdown
> -----------------------------------------------------------
>
> Key: MAPREDUCE-4213
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4213
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mrv2, nodemanager
> Affects Versions: 0.23.3
> Reporter: Thomas Graves
>
> Currently the nodemanager doesn't clean up running containers when it gets
> restarted. This can cause containers to get lost and stick around forever.
> We've seen this happen multiple times when the RM is restarted. When the RM
> is brought back up, it doesn't know what was running on the cluster, so it
> tells the NMs to reboot, and when an NM reboots it loses track of what it had
> running. If any containers are behaving badly, nothing is left that knows
> about them to kill them.
> We should try to kill any running containers when the node manager is
> shutting down. We should also check when the nodemanager is being brought
> back up - but that will be a separate jira.
> This might change a bit when RM restart is implemented if tasks can actually
> survive across RM/NM being rebooted, but that can be addressed at that point.
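
For illustration only, the shutdown-time cleanup proposed in the description could be sketched roughly as below. This is not the NodeManager code: the class ContainerCleanupSketch, the RunningContainer interface, and the methods register, completed, and cleanupOnShutdown are all hypothetical names made up for this sketch. The only real API used is the JDK's Runtime.addShutdownHook.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: none of these names come from the Hadoop code base.
class ContainerCleanupSketch {

    /** Stand-in for whatever handle the NM holds on a running container. */
    interface RunningContainer {
        void kill();
    }

    // Containers the (hypothetical) node manager currently believes are running.
    private final Map<String, RunningContainer> running = new ConcurrentHashMap<>();

    void register(String id, RunningContainer container) {
        running.put(id, container);
    }

    void completed(String id) {
        running.remove(id);
    }

    /** Kills every container still tracked and returns the ids that were killed. */
    List<String> cleanupOnShutdown() {
        List<String> killed = new ArrayList<>();
        // Iterate over a snapshot so the map can be modified while looping.
        for (String id : new ArrayList<>(running.keySet())) {
            RunningContainer container = running.remove(id);
            if (container == null) {
                continue; // finished or removed concurrently
            }
            try {
                container.kill();
                killed.add(id);
            } catch (RuntimeException e) {
                // Keep going: one failed kill should not strand the remaining containers.
                System.err.println("failed to kill " + id + ": " + e);
            }
        }
        return killed;
    }

    public static void main(String[] args) {
        ContainerCleanupSketch nm = new ContainerCleanupSketch();
        nm.register("container_1", () -> System.out.println("killing container_1"));
        nm.register("container_2", () -> System.out.println("killing container_2"));
        nm.completed("container_2");
        // The hook runs on normal JVM exit or SIGTERM, not on SIGKILL or a hard crash.
        Runtime.getRuntime().addShutdownHook(new Thread(nm::cleanupOnShutdown));
    }
}
```

Because a JVM shutdown hook does not run when the process is killed outright or the machine fails, a sketch like this only covers graceful shutdown. That is the gap the startup-time check mentioned in the comment above would cover.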