[ 
https://issues.apache.org/jira/browse/MESOS-3352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14743635#comment-14743635
 ] 

Joris Van Remoortere commented on MESOS-3352:
---------------------------------------------

To keep Systemd from migrating pids between cgroups, we can use the 
{{delegate=true}} flag. This prevents Systemd from migrating the pids that are 
descendants of the process launched by a Systemd unit.

In order for this strategy to work, the {{delegate}} flag must be supported by 
the Systemd version. Support for this was introduced in Systemd v218; however, 
it has also been backported to v208 for RHEL7 and CentOS7 
[here|http://centoserrata.nagater.net/item/CEBA-2015-0037-CentOS-7.i386.x86_64.html]
 with the package 
[systemd-208-20|https://rhn.redhat.com/errata/RHBA-2015-1155.html]. It is 
highly recommended to upgrade to this package if running those operating 
systems.
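
A minimal sketch of the version check implied above, assuming the first line 
of {{systemctl --version}} has the form {{systemd <number>}}. The function names 
are hypothetical and not part of Mesos. Note that a bare version number cannot 
detect the v208 backport, so on RHEL7 and CentOS7 the package release would 
still have to be checked separately.

```python
import re

# Upstream version that introduced the delegate flag, per the comment above.
MIN_UPSTREAM_VERSION = 218


def parse_systemd_version(version_output):
    """Extract the integer version from `systemctl --version` output.

    Assumes the first line starts with "systemd <number>".
    """
    match = re.match(r"systemd\s+(\d+)", version_output)
    if match is None:
        raise ValueError("unrecognized systemctl --version output")
    return int(match.group(1))


def delegate_supported_upstream(version):
    """True if this upstream version includes delegate support.

    A False result does not rule out a distribution backport
    (for example the v208 package mentioned above).
    """
    return version >= MIN_UPSTREAM_VERSION
```

In practice the input string would come from running {{systemctl --version}} on 
the agent host.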

Once the {{delegate=true}} flag has been set, the cgroups that are manually 
manipulated by the agent will no longer be migrated *during the lifetime of the 
agent*.

This still leaves the problem of tasks being migrated _after the agent has 
stopped running_ (voluntarily or not). To deal with this problem, we 
propose the following solution:

If an agent is running on a Systemd-initialized machine, then the agent will 
create a Systemd slice with a lifetime that is independent of the agent and 
{{delegate=true}}. The Linux launcher (used when cgroups isolators are enabled) 
will then assign the cgroup name for any executor that is launched to this 
separate slice. As a consequence, when the agent unit is 
terminated, the separate slice will continue to delegate the cgroups, preventing 
Systemd from migrating the pids. A side benefit of this is that we can maintain 
the {{KillMode=control-group}} flag on the agent and terminate all 
agent-specific services such as the {{fetcher}} without terminating the tasks. 
This provides for a nice clean-up.

This solution will still require that the agent unit be launched with the 
{{delegate=true}} flag so that there is no race during the transition of the 
pids from the agent to the separate slice.
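
As an illustration of the agent unit settings discussed here, the sketch below 
renders a {{[Service]}} fragment for the agent unit. It uses the spellings from 
the Systemd unit file syntax, {{Delegate=true}} and {{KillMode=control-group}}. 
The function name is hypothetical, and where the fragment is installed is left 
to the operator.

```python
def render_agent_service_fragment(delegate=True, kill_mode="control-group"):
    """Render a [Service] fragment for the agent's Systemd unit.

    Delegate=true asks Systemd to leave the unit's cgroup subtree alone;
    KillMode=control-group kills every process left in the unit's cgroup
    when the unit stops.
    """
    lines = [
        "[Service]",
        "Delegate={}".format("true" if delegate else "false"),
        "KillMode={}".format(kill_mode),
    ]
    return "\n".join(lines) + "\n"
```

The returned text is only the settings named in this comment; a real agent 
unit would carry its usual {{ExecStart}} and related options as well.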

The agent will be responsible for verifying the slice is still available upon 
recovery, and warning the operator if it notices that the tasks it is 
recovering are no longer associated with this separate slice, as this can cause 
*silent* loss of isolation of existing tasks.
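
The recovery check could be sketched as follows, assuming a cgroup v1 host 
where each line of {{/proc/<pid>/cgroup}} has the form 
{{hierarchy-ID:controller-list:path}} and Systemd tracks units in the 
{{name=systemd}} hierarchy. The helper names and the slice name used in the 
usage note are hypothetical.

```python
def cgroup_paths(proc_cgroup_text):
    """Map each controller list to its cgroup path.

    Input is the text of /proc/<pid>/cgroup.
    """
    paths = {}
    for line in proc_cgroup_text.splitlines():
        if not line.strip():
            continue
        # Split on the first two colons only; the path is the remainder.
        _hierarchy_id, controllers, path = line.split(":", 2)
        paths[controllers] = path
    return paths


def still_in_slice(proc_cgroup_text, slice_name, controller="name=systemd"):
    """True if the pid's cgroup path for `controller` contains the slice."""
    path = cgroup_paths(proc_cgroup_text).get(controller)
    if path is None:
        return False
    return slice_name in path.strip("/").split("/")
```

On recovery, the agent would read {{/proc/<pid>/cgroup}} for each recovered 
executor and log a warning whenever {{still_in_slice(text, 
"mesos_executors.slice")}} returns false.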

> Problem Statement Summary for Systemd Cgroup Launcher
> -----------------------------------------------------
>
>                 Key: MESOS-3352
>                 URL: https://issues.apache.org/jira/browse/MESOS-3352
>             Project: Mesos
>          Issue Type: Task
>            Reporter: Joris Van Remoortere
>            Assignee: Joris Van Remoortere
>              Labels: design, mesosphere, systemd
>
> There have been many reports of cgroups-related issues when running Mesos on 
> Systemd.
> Many of these issues are rooted in the manual manipulation of the cgroups 
> filesystem by Mesos.
> This task is to describe the problem in a 1-page summary, and elaborate on 
> the suggested two-part solution:
> 1. Using the {{delegate=true}} flag for the slave
> 2. Implementing a Systemd launcher to run executors with tighter Systemd 
> integration.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
