[
https://issues.apache.org/jira/browse/MESOS-8313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16358625#comment-16358625
]
James Peach edited comment on MESOS-8313 at 10/15/18 6:38 PM:
--------------------------------------------------------------
Note, this supervisor needs to reap all its children, as per MESOS-5893.
was (Author: jamespeach):
Note, this supervisor need to read all its children, as per MESOS-5893.
> Provide a host namespace container supervisor.
> ----------------------------------------------
>
> Key: MESOS-8313
> URL: https://issues.apache.org/jira/browse/MESOS-8313
> Project: Mesos
> Issue Type: Improvement
> Components: containerization
> Reporter: James Peach
> Assignee: James Peach
> Priority: Major
> Attachments: IMG_2629.JPG
>
>
> After more investigation into user namespaces, it is clear that the current
> implementation of creating the container namespaces needs some adjustment
> before we can implement user namespaces in a usable fashion.
> The problems we need to address are:
> 1. The containerizer needs to hold {{CAP_SYS_ADMIN}} over the PID namespace
> to mount {{procfs}}. Currently, this prevents containers joining the host PID
> namespace. The workaround is to always create a new container PID namespace
> (as a child of the user namespace) with the {{namespaces/pid}} isolator.
> 2. The containerizer needs to hold {{CAP_SYS_ADMIN}} over the network
> namespace to mount {{sysfs}}. There's no general workaround for this since we
> can't generally require containers to not join the host network namespace.
> 3. The containerizer can't enter a user namespace after entering the
> {{chroot}}. This restriction makes it impossible to keep the existing order
> of containerizer operations in the case where we want the executor to be in
> a new user namespace that has no children (i.e. to protect the container
> from a privileged task).
> After some discussion with [~jieyu], we believe that we can solve most or all
> of these issues by creating a new containerizer supervisor that runs fully
> outside the container and is responsible for constructing the rootfs mount
> namespace, launching the containerizer to enter the rest of the container,
> and waiting on the entered process.
> Since this new supervisor process is not running in the user namespace, it
> will be able to construct the container rootfs in a new mount namespace
> without user namespace restrictions. We can then clone a child to fully
> create and enter container namespaces along with the prefabricated rootfs
> mount namespace.
> The only drawback to this approach is that the container's mount namespace
> will be owned by the root user namespace rather than the container user
> namespace. We are OK with this for now.
> The plan here is to retain the existing {{mesos-containerizer launch}}
> subcommand and add a new {{mesos-containerizer supervise}} subcommand, which
> will be its parent process. This new subcommand will be used for the default
> executor and custom executor code paths.
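
The parent/child relationship described above (a supervisor that launches a
process, waits on it, and reaps its children) can be illustrated with a
minimal sketch. This is not the Mesos implementation: the function name
supervise is hypothetical, the sketch uses plain fork/exec rather than clone
with namespace flags, and it omits all rootfs and mount namespace setup.

```python
import os


def supervise(argv):
    """Launch argv as a child, wait for it, and reap any other exited children.

    Hypothetical illustration only. Returns the child's exit code, or the
    negated signal number if the child was killed by a signal.
    """
    launched = os.fork()
    if launched == 0:
        # Child: replace this process image with the requested command.
        try:
            os.execvp(argv[0], argv)
        finally:
            # Only reached if exec failed; never return into the parent's code.
            os._exit(127)

    # Parent: block until the launched process itself has exited, reaping
    # any other child that happens to exit first.
    exit_code = None
    while exit_code is None:
        pid, status = os.wait()
        if pid == launched:
            if os.WIFEXITED(status):
                exit_code = os.WEXITSTATUS(status)
            else:
                exit_code = -os.WTERMSIG(status)

    # Reap remaining children that have already exited, without blocking.
    while True:
        try:
            pid, _ = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # no children left at all
        if pid == 0:
            break  # children remain, but none has exited yet

    return exit_code
```

Calling supervise(["sh", "-c", "exit 3"]) returns the exit code of the shell.
Note that this only reaps direct children; collecting orphaned descendants
would additionally require something like the Linux child subreaper setting,
which is left out here.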
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)