[
https://issues.apache.org/jira/browse/MESOS-3352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14743635#comment-14743635
]
Joris Van Remoortere commented on MESOS-3352:
---------------------------------------------
To keep Systemd from migrating cgroup pids, we can use the
{{delegate=true}} flag. This prevents Systemd from migrating the pids that are
descendants of the process launched by a Systemd unit.
For this strategy to work, the installed Systemd version must support the
{{delegate}} flag. Support was introduced in Systemd v218; however, it has
also been backported to v208 for RHEL7 and CentOS7
[here|http://centoserrata.nagater.net/item/CEBA-2015-0037-CentOS-7.i386.x86_64.html]
with the package
[systemd-208-20|https://rhn.redhat.com/errata/RHBA-2015-1155.html].
Upgrading to this package is highly recommended when running those operating
systems.
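As a rough illustration of the version requirement above, the check could be
sketched as follows. This is not Mesos code: the function names are
hypothetical, the thresholds are simply the v218 and systemd-208-20 figures
quoted above, and the parser assumes the first line of {{systemctl --version}}
has the form {{systemd <number>}}. The caller is assumed to know that the
208 backport only applies on RHEL7 and CentOS7.

```python
import re

# Thresholds taken from the comment above: upstream support from v218, and a
# RHEL7/CentOS7 backport in the systemd-208-20 package.
UPSTREAM_MIN_VERSION = 218
BACKPORT_VERSION = 208
BACKPORT_MIN_RELEASE = 20


def parse_systemd_version(version_output):
    """Extract the version number from the first line of `systemctl --version`.

    Assumes the output starts with "systemd <number>".
    """
    match = re.match(r"systemd\s+(\d+)", version_output.strip())
    if match is None:
        raise ValueError("unrecognized version output: %r" % version_output)
    return int(match.group(1))


def supports_delegate(version, package_release=None):
    """Return True if this Systemd build is expected to honor the delegate flag.

    `package_release` is the distribution package release number (the "20" in
    systemd-208-20); pass None when it is unknown.
    """
    if version >= UPSTREAM_MIN_VERSION:
        return True
    if version == BACKPORT_VERSION and package_release is not None:
        return package_release >= BACKPORT_MIN_RELEASE
    return False
```

An agent could run a check like this at startup and warn the operator when it
returns False, rather than silently relying on the flag.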
Once the {{delegate=true}} flag has been set, the cgroups that are manually
manipulated by the agent will no longer be migrated *during the lifetime of the
agent*.
This still leaves the problem of tasks being migrated _after the agent has
stopped running_ (voluntarily or not). To deal with this problem we propose
the following solution:
If an agent is running on a Systemd-initialized machine, then the agent will
create a Systemd slice with {{delegate=true}} and a lifetime that is
independent of the agent. The linux launcher (used when cgroups isolators are
enabled) will then place the cgroup of every executor it launches under this
separate slice. As a consequence, when the agent unit is terminated, the
separate slice continues to delegate the cgroups, preventing Systemd from
migrating the pids. A side benefit is that we can keep the {{KillMode=cgroup}}
flag on the agent and terminate all agent-specific services such as the
{{fetcher}} without terminating the tasks. This provides for a clean shutdown
of the agent.
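To make the cgroup assignment concrete, here is a minimal sketch of how a
launcher might derive an executor's cgroup under the separate slice. The slice
name {{mesos_executors.slice}}, the function names, and the hierarchy path in
the usage note are assumptions made for illustration; the proposal above does
not fix any of them.

```python
import posixpath

# Hypothetical slice name, for illustration only.
EXECUTOR_SLICE = "mesos_executors.slice"


def executor_cgroup(container_id, slice_name=EXECUTOR_SLICE):
    """Return the executor's cgroup, relative to a hierarchy root."""
    if not container_id or "/" in container_id:
        raise ValueError("invalid container id: %r" % container_id)
    return posixpath.join(slice_name, container_id)


def executor_cgroup_path(hierarchy, container_id, slice_name=EXECUTOR_SLICE):
    """Return the absolute path of the executor's cgroup in one hierarchy."""
    return posixpath.join(hierarchy, executor_cgroup(container_id, slice_name))
```

For example, {{executor_cgroup_path("/sys/fs/cgroup/freezer", "abc")}} yields
{{/sys/fs/cgroup/freezer/mesos_executors.slice/abc}}, which lies outside the
agent unit's own cgroup and so is not affected when the agent unit stops.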
This solution still requires the agent unit to be launched with the
{{delegate=true}} flag so that there is no race during the transition of the
pids from the agent to the separate slice.
The agent will be responsible for verifying the slice is still available upon
recovery, and warning the operator if it notices that the tasks it is
recovering are no longer associated with this separate slice, as this can cause
*silent* loss of isolation of existing tasks.
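The recovery check described above could look roughly like the sketch below.
It relies on the {{/proc/<pid>/cgroup}} line format
({{hierarchy-ID:controller-list:cgroup-path}}) and takes the file contents as
a string so that it can be exercised without a live process. The function
names and the slice name used in the usage note are hypothetical.

```python
def parse_proc_cgroup(text):
    """Map each controller list to its cgroup path.

    Each line of /proc/<pid>/cgroup has the form
    `hierarchy-ID:controller-list:cgroup-path`.
    """
    cgroups = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        _, controllers, path = line.split(":", 2)
        cgroups[controllers] = path
    return cgroups


def controllers_outside_slice(text, slice_name, controllers):
    """Return the requested controllers whose cgroup is not under the slice."""
    cgroups = parse_proc_cgroup(text)
    prefix = "/" + slice_name + "/"
    outside = []
    for controller in controllers:
        path = None
        for names, cgroup_path in cgroups.items():
            # Co-mounted controllers share one line, e.g. "cpuacct,cpu".
            if controller in names.split(","):
                path = cgroup_path
        if path is None or not path.startswith(prefix):
            outside.append(controller)
    return outside
```

During recovery, a non-empty result for a task's pid (for example when
checking {{freezer}}, {{cpu}} and {{memory}} against
{{mesos_executors.slice}}) would be the trigger for the operator warning.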
> Problem Statement Summary for Systemd Cgroup Launcher
> -----------------------------------------------------
>
> Key: MESOS-3352
> URL: https://issues.apache.org/jira/browse/MESOS-3352
> Project: Mesos
> Issue Type: Task
> Reporter: Joris Van Remoortere
> Assignee: Joris Van Remoortere
> Labels: design, mesosphere, systemd
>
> There have been many reports of cgroups-related issues when running Mesos on
> Systemd.
> Many of these issues are rooted in the manual manipulation of the cgroups
> filesystem by Mesos.
> This task is to describe the problem in a 1-page summary, and elaborate on
> the suggested two-part solution:
> 1. Using the {{delegate=true}} flag for the slave.
> 2. Implementing a Systemd launcher to run executors with tighter Systemd
> integration.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)