[
https://issues.apache.org/jira/browse/MESOS-3028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14623125#comment-14623125
]
Benjamin Mahler commented on MESOS-3028:
----------------------------------------
I'm not too familiar with Thermos, but it looks as though it uses
[setsid(2)|http://man7.org/linux/man-pages/man2/setsid.2.html] \[1\]. This
is problematic when cgroups isolation is not used to jail the processes,
because a process that calls setsid leaves its parent's session and process
group. Since you're not providing an {{--isolation}} flag, we default to a
POSIX-compatible process launcher, which uses a best-effort
[killtree|https://github.com/apache/mesos/blob/3073bd4e6fc119875fef22b364872056ef97efd3/3rdparty/libprocess/3rdparty/stout/include/stout/os/killtree.hpp#L58].
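To illustrate the setsid(2) point, here is a minimal Python sketch (not taken from Thermos or Mesos; the helper names {{spawn_detached_child}} and {{left_parent_group_and_session}} are made up for this example). It forks a child that calls {{os.setsid()}} and then checks that the child no longer shares the parent's process group or session, so a cleanup that only signals the parent's group or session would presumably not reach it.

```python
import os
import signal
import time


def spawn_detached_child():
    """Fork a child that calls setsid(2) and then idles; return its pid.

    Hypothetical helper for illustration only.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: start a new session, which also starts a new process group.
        try:
            os.close(read_fd)
            os.setsid()
            os.write(write_fd, b"x")  # tell the parent that setsid() is done
            os.close(write_fd)
            time.sleep(30)  # stand-in for real work
        finally:
            os._exit(0)
    # Parent: block until the child reports that it has called setsid().
    os.close(write_fd)
    os.read(read_fd, 1)
    os.close(read_fd)
    return pid


def left_parent_group_and_session(pid):
    """Return True if pid shares neither our process group nor our session."""
    return (os.getpgid(pid) != os.getpgid(0)
            and os.getsid(pid) != os.getsid(0))


if __name__ == "__main__":
    child = spawn_detached_child()
    print("child left our group and session:",
          left_parent_group_and_session(child))
    # Clean up explicitly, since a group-wide signal would not reach the child.
    os.kill(child, signal.SIGKILL)
    os.waitpid(child, 0)
```

With cgroups isolation the child would still be a member of the container's cgroup regardless of its session, which is why the launcher choice matters here.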
[~idownes] [~jieyu] I found it surprising that on Linux, we default to a
PosixLauncher rather than a LinuxLauncher \[2\]. Any reason for this? Or is
this a bug?
\[1\]
https://github.com/apache/aurora/blob/827b9abea48babe53ad5b2c521757c60f04c6dfc/src/main/python/apache/thermos/core/process.py#L327
\[2\]
https://github.com/apache/mesos/blob/3073bd4e6fc119875fef22b364872056ef97efd3/src/slave/containerizer/mesos/containerizer.cpp#L149
> If mesos_slave gets a SIGUSR1, frameworks aren't completely shutdown
> --------------------------------------------------------------------
>
> Key: MESOS-3028
> URL: https://issues.apache.org/jira/browse/MESOS-3028
> Project: Mesos
> Issue Type: Bug
> Components: framework, slave
> Reporter: Brian Brazil
>
> See AURORA-1388 for full details.
> I sent a SIGUSR1 to a mesos_slave and the executor running on it was given a
> little bit of time to do things; however, it then appears that the executor
> was killed, but not any of its children.
> This is a problem as it means executors don't have enough time to shut down
> gracefully when a mesos_slave is being drained for maintenance, and that
> processes are left lying around using untracked resources.