[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265998#comment-13265998
 ] 

Thomas Graves commented on MAPREDUCE-4213:
------------------------------------------

It would terminate containers on startup in cases where the NM didn't shut 
down gracefully or somehow missed something on shutdown: hardware issues, NM 
crashes, etc.


                
> nodemanager should cleanup running containers when shutdown
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-4213
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4213
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2, nodemanager
>    Affects Versions: 0.23.3
>            Reporter: Thomas Graves
>
> Currently the nodemanager doesn't clean up running containers when it gets 
> restarted.  This can cause containers to get lost and stick around forever.  
> We've seen this happen multiple times when the RM is restarted. When the RM 
> is brought back up, it doesn't know what was running on the cluster, so it 
> tells the NMs to reboot, and when an NM reboots it loses track of what it had 
> running. If any containers are behaving badly, there is no one left that 
> knows about them to kill them. 
> We should try to kill any running containers when the nodemanager is 
> shutting down.  We should also check when the nodemanager is being brought 
> back up, but that will be a separate JIRA.  
> This might change a bit when RM restart is implemented, if tasks can actually 
> survive the RM/NM being rebooted, but that can be addressed at that point.
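
The shutdown-time cleanup proposed in the description could be sketched roughly 
as below. This is only an illustration of the idea, not the nodemanager's actual 
code: the class ContainerCleanupSketch, the ContainerKiller interface, and all 
method names are hypothetical, and how a container process is really signalled 
is left behind the interface.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch; none of these names come from the Hadoop code base. */
public class ContainerCleanupSketch {

    /** Assumed stand-in for whatever mechanism actually kills a container. */
    public interface ContainerKiller {
        void kill(String containerId) throws Exception;
    }

    // Containers this node currently believes are running, in start order.
    private final Set<String> running = new LinkedHashSet<>();
    private final ContainerKiller killer;

    public ContainerCleanupSketch(ContainerKiller killer) {
        this.killer = killer;
    }

    public synchronized void containerStarted(String containerId) {
        running.add(containerId);
    }

    public synchronized void containerFinished(String containerId) {
        running.remove(containerId);
    }

    /**
     * Intended to be called on the shutdown path: try to kill every container
     * still tracked. Returns the ids that could not be killed, so the caller
     * can log them; those ids stay in the tracked set.
     */
    public synchronized List<String> cleanupOnShutdown() {
        List<String> failed = new ArrayList<>();
        // Iterate over a copy because the set is modified inside the loop.
        for (String id : new ArrayList<>(running)) {
            try {
                killer.kill(id);
                running.remove(id);
            } catch (Exception e) {
                failed.add(id);
            }
        }
        return failed;
    }

    public synchronized int runningCount() {
        return running.size();
    }
}
```

One way to wire this in would be to call cleanupOnShutdown() from the service's 
stop path or from a JVM shutdown hook (Runtime.getRuntime().addShutdownHook). 
Neither runs when the process is killed hard or the machine fails, which is why 
the startup check mentioned above would still be needed.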


        
