[
https://issues.apache.org/jira/browse/AURORA-1388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14620884#comment-14620884
]
Zameer Manji commented on AURORA-1388:
--------------------------------------
Maybe I'm not understanding the reviews associated with MESOS-1475, but I don't
think there is anything to fix in Thermos. First, if the process kept running
after SIGUSR1 was sent to the slave, then Mesos did not actually shut down all
tasks and clean up all cgroups. That sounds like a Mesos bug. Second, I don't
see the slave sending a killTask message to the executors when SIGUSR1 is
received. If no killTask is sent to the executor but the slave goes on to
terminate the executor, then the lifecycle policy cannot be followed.
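To make the ordering argument concrete, here is a toy Python model, assuming a POSIX host where SIGUSR1 exists. `HypotheticalSlave`, `kill_task`, and `shutdown_executors` are invented names, not Mesos APIs. The model only shows the sequence the comment expects: a killTask for every running task first, so the executor gets a chance to apply its lifecycle policy, and executor teardown only afterwards.

```python
import os
import signal


class HypotheticalSlave:
    """Toy model of a slave; all names are illustrative, not real Mesos APIs."""

    def __init__(self, task_ids):
        self.task_ids = list(task_ids)
        self.events = []  # ordered log of the actions taken

    def kill_task(self, task_id):
        # In a real setup the executor would start its lifecycle policy here.
        self.events.append(("killTask", task_id))

    def shutdown_executors(self):
        # Tearing executors down without a prior killTask skips the lifecycle policy.
        self.events.append(("shutdownExecutors", None))

    def on_sigusr1(self, signum, frame):
        # Expected ordering: killTask for every task, then executor teardown.
        for task_id in self.task_ids:
            self.kill_task(task_id)
        self.shutdown_executors()


if __name__ == "__main__":
    slave = HypotheticalSlave(["task-a", "task-b"])
    signal.signal(signal.SIGUSR1, slave.on_sigusr1)
    os.kill(os.getpid(), signal.SIGUSR1)  # simulate `kill -USR1 <slave pid>`
    print(slave.events)
```

The behaviour the comment describes corresponds to `on_sigusr1` calling only `shutdown_executors()`, in which case no killTask event ever appears in the log.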
> If mesos_slave gets a SIGUSR1, thermos doesn't shutdown cleanly
> ---------------------------------------------------------------
>
> Key: AURORA-1388
> URL: https://issues.apache.org/jira/browse/AURORA-1388
> Project: Aurora
> Issue Type: Bug
> Reporter: Brian Brazil
>
> https://issues.apache.org/jira/browse/MESOS-1475 allows a SIGUSR1 to be
> sent to a Mesos slave to shut down the slave and all of its processes
> cleanly, which is useful for changing slave attributes.
> I tried this with my Aurora setup, and via tcpdump found that it sent the
> first {{/shutdown}} HTTP request to the task, but nothing after it. The
> process also kept running, holding onto a static port (in my case) that
> prevented things from working when a task was scheduled on that slave after
> it came back up.
> We should ensure that Thermos behaves correctly when the Mesos slave gets a
> SIGUSR1, following the lifecycle policy and ultimately killing the processes
> if needed.
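One way to picture "following the lifecycle policy and ultimately killing the processes if needed" is the escalation below: a minimal Python sketch assuming a POSIX host. `escalate_kill` and `grace_secs` are hypothetical names, and SIGTERM stands in for the graceful HTTP request (such as the {{/shutdown}} call seen in tcpdump); this is not Thermos's actual implementation.

```python
import signal
import subprocess
import sys


def escalate_kill(proc: subprocess.Popen, grace_secs: float = 5.0) -> int:
    """Hypothetical escalation: polite request first, SIGKILL as a last resort.

    Returns the exit status (a negative signal number if a signal killed it).
    """
    if proc.poll() is not None:
        return proc.returncode  # already exited, nothing to escalate
    proc.send_signal(signal.SIGTERM)  # step 1: ask it to exit (stand-in for the HTTP request)
    try:
        return proc.wait(timeout=grace_secs)  # step 2: allow a clean exit within the grace period
    except subprocess.TimeoutExpired:
        proc.kill()  # step 3: grace period expired, force it with SIGKILL
        return proc.wait()


if __name__ == "__main__":
    # A child that ignores SIGTERM, like a task that never honours the
    # shutdown request and keeps holding its port.
    child = subprocess.Popen(
        [sys.executable, "-c",
         "import signal, time; signal.signal(signal.SIGTERM, signal.SIG_IGN);"
         " print('ready', flush=True); time.sleep(60)"],
        stdout=subprocess.PIPE,
    )
    child.stdout.readline()  # wait until the child has installed SIG_IGN
    print(escalate_kill(child, grace_secs=0.5))  # expect -9: killed by SIGKILL
    child.stdout.close()
```

Using `wait(timeout=...)` rather than a sleep-and-poll loop means a well-behaved process is reaped as soon as it exits, and only a process that outlives the grace period is force-killed.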
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)