[ 
https://issues.apache.org/jira/browse/AURORA-1388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14621027#comment-14621027
 ] 

Zameer Manji commented on AURORA-1388:
--------------------------------------

The containerizer should have killed those processes. I suggest filing a MESOS 
ticket about that behaviour; orphaned processes/containers are a pretty serious 
issue.

As for why the lifecycle protocol is not followed, it seems that the slave kills 
the executor after attempting to shut it down. Since the slave is forcefully 
killing the executor, it isn't possible for the lifecycle protocol to be 
completed. I am not sure why the slave ends up killing the executor instead of 
waiting for it to shut down cleanly.
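
To illustrate the ordering that a clean shutdown depends on, here is a minimal 
Python sketch of "ask first, wait, then force". It is not Mesos or Thermos 
code: the function name stop_with_escalation and the grace_secs parameter are 
hypothetical, and it assumes a POSIX host where terminate() sends SIGTERM and 
kill() sends SIGKILL.

```python
import subprocess


def stop_with_escalation(proc, grace_secs=5.0):
    """Ask `proc` to exit and force-kill it only if the grace period expires.

    `proc` is a subprocess.Popen object. Returns "clean" if the process
    exited within the grace period, "forced" if it had to be killed.
    Hypothetical helper for illustration; not part of Mesos or Aurora.
    """
    proc.terminate()  # SIGTERM on POSIX: a request the process can handle
    try:
        # The grace period is what gives a shutdown protocol time to finish.
        proc.wait(timeout=grace_secs)
        return "clean"
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX: cannot be caught or ignored
        proc.wait()  # reap the child so it does not linger as a zombie
        return "forced"
```

If the kill is issued without the wait, as the slave appears to do here, the 
process being stopped never gets the chance to run its own shutdown steps.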

> If mesos_slave gets a SIGUSR1, thermos doesn't shutdown cleanly
> ---------------------------------------------------------------
>
>                 Key: AURORA-1388
>                 URL: https://issues.apache.org/jira/browse/AURORA-1388
>             Project: Aurora
>          Issue Type: Bug
>            Reporter: Brian Brazil
>
> https://issues.apache.org/jira/browse/MESOS-1475 allows a SIGUSR1 to be 
> sent to a mesos slave in order to cleanly shut down the slave and any 
> processes, which is useful for changing slave attributes.
> I tried this with my aurora setup, and via tcpdump found that it sent the 
> first {{/shutdown}} HTTP request to the task, but nothing after it. The 
> process also kept on running, holding onto a static port in my case, which 
> prevented things from working when a task was scheduled on that slave after 
> it came back up.
> We should ensure that thermos behaves correctly when the mesos slave gets a 
> SIGUSR1, following the lifecycle policy and ultimately killing the processes 
> if needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
