[
https://issues.apache.org/jira/browse/MESOS-1871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14163803#comment-14163803
]
Ian Downes commented on MESOS-1871:
-----------------------------------
I looked at the code: os::killtree()'s behavior is incorrect.
1. The posix launcher puts the executor into its own session with setsid().
2. The posix launcher calls os::killtree(pid, SIGKILL, true, true), where the
two true arguments request killing all processes in the group and the session.
3. os::killtree() *returns early* if it can't find the *process* with pid
(which is the scenario you're describing) so it doesn't actually continue to
kill everything in the process group/session.
I modified the code early this year and perpetuated the existing bug. I'll file
a ticket on this.
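
The early return described in item 3 can be reproduced outside Mesos with plain POSIX calls. The following is only a minimal sketch and not the actual os::killtree() code: the helper names (spawnOrphanedSession, descendantAlive, killSessionEarlyReturn, killSessionAlways) are hypothetical, and it signals only the process group whose id equals the leader's pid rather than walking the whole session. It assumes a Linux-like system where kill() with a negative pid targets a process group.

```cpp
#include <cassert>
#include <poll.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: forks a child that becomes a session leader via
// setsid(), leaves one descendant running in that session, and exits.
// Returns the (now dead) leader's pid; *fd receives the read end of a pipe
// that reports EOF once the descendant is gone.
pid_t spawnOrphanedSession(int* fd) {
  int p[2];
  if (pipe(p) != 0) return -1;
  pid_t leader = fork();
  if (leader < 0) return -1;
  if (leader == 0) {
    setsid();                    // New session and process group (item 1).
    close(p[0]);
    if (fork() == 0) {           // Descendant keeps the write end open.
      for (;;) pause();
    }
    _exit(0);                    // The leader exits; the descendant lives on.
  }
  close(p[1]);
  waitpid(leader, nullptr, 0);   // Reap the leader so its pid no longer resolves.
  *fd = p[0];
  return leader;
}

// True if the descendant still holds the pipe open, i.e. is still alive.
bool descendantAlive(int fd, int timeoutMs) {
  struct pollfd pfd = {fd, POLLIN, 0};
  return poll(&pfd, 1, timeoutMs) == 0;  // Timeout means no EOF/hangup yet.
}

// Models the early return from item 3: give up if `pid` itself is gone.
bool killSessionEarlyReturn(pid_t pid, int sig) {
  if (kill(pid, 0) != 0) return false;   // Leader missing: nothing is signalled.
  return kill(-pid, sig) == 0;
}

// Alternative: signal the group whether or not the leader still exists.
bool killSessionAlways(pid_t pid, int sig) {
  return kill(-pid, sig) == 0;           // Negative pid targets the process group.
}
```

Calling killSessionEarlyReturn() on the pid returned by spawnOrphanedSession() signals nothing, because the leader has already exited, and the descendant keeps running. killSessionAlways() on the same pid still reaches the descendant, since the process group outlives its leader.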
> Sending SIGTERM to a task command may render it orphaned
> --------------------------------------------------------
>
> Key: MESOS-1871
> URL: https://issues.apache.org/jira/browse/MESOS-1871
> Project: Mesos
> Issue Type: Bug
> Components: slave
> Reporter: Alexander Rukletsov
> Assignee: Alexander Rukletsov
>
> {{CommandExecutor}} launches tasks wrapping them into {{sh -c}}. That means
> signals are sent to the top process—that is {{sh -c}}—and not to the task
> directly. Though {{SIGTERM}} is propagated by {{sh -c}} down the process
> tree, if the task is unresponsive to {{SIGTERM}}, {{sh -c}} terminates
> reporting success to the {{CommandExecutor}}, rendering the task detached
> from the parent process and still running. Because the {{CommandExecutor}}
> thinks the command terminated normally, its OS process exits normally and may
> not trigger the containerizer's escalation, which destroys the cgroups.
> Here is the test related to the first part:
> [https://gist.github.com/rukletsov/68259dfb02421813f9e6].
> Here is the test related to the second part:
> [https://gist.github.com/rukletsov/3f19ecc7389fa51e65c0].