[jira] [Commented] (MESOS-6543) Add special case for entering the "mount" namespace of a parent container

Kevin Klues (JIRA) Fri, 11 Nov 2016 19:06:40 -0800

    [ 
https://issues.apache.org/jira/browse/MESOS-6543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15658878#comment-15658878
 ]


Kevin Klues commented on MESOS-6543:
------------------------------------

{noformat}
commit b15b46ab00ebffa2b8247ac7c4210da538efcea5
Author: Kevin Klues <[email protected]>
Date:   Fri Nov 11 16:03:04 2016 -0800

    Added a default PATH environment variable when launching a container.

    Having an environment set but no PATH variable set inside of it
    will trigger an `execvpe` call in launch.cpp on the command we are
    trying to launch. Normally this doesn't cause problems because
    `execvpe` will call `confstr(_CS_PATH)` under the hood to set up a
    default path to use when looking up our command.

    However, when provisioning a new filesystem for a container, it's
    possible that the `confstr(_CS_PATH)` called from the agent may not
    return the same path as a `confstr(_CS_PATH)` call would return using
    the libc installed inside the container. This can lead to problems
    (for example) with finding the `sh` command in containers based on an
    alpine linux image.

    We observed this in our test setup with `confstr(_CS_PATH)` on the
    agent returning `/usr/bin`, while alpine linux only has the `sh`
    command installed in `/bin`.

    Review: https://reviews.apache.org/r/53585/
{noformat}
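
A minimal sketch of the idea in the commit above, written in Python rather than the C++ of launch.cpp. The names `DEFAULT_PATH`, `with_default_path`, and `libc_default_path` are hypothetical, and the fallback value is an assumption for illustration, not necessarily the string Mesos uses. The point is to supply an explicit PATH up front so the lookup does not depend on whatever `confstr(_CS_PATH)` returns for the agent's libc.

```python
import os

# Assumed fallback search path, chosen for illustration only.
DEFAULT_PATH = "/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"


def libc_default_path():
    """Return the libc default search path (confstr(_CS_PATH)), or None.

    This is the value a PATH-less execvpe falls back to. It reflects the
    libc of the calling process (the agent), not the container's libc.
    """
    if "CS_PATH" not in getattr(os, "confstr_names", {}):
        return None
    return os.confstr("CS_PATH")


def with_default_path(environment, default_path=DEFAULT_PATH):
    """Return a copy of `environment` with PATH set if it was missing.

    An existing PATH is left untouched.
    """
    env = dict(environment)
    env.setdefault("PATH", default_path)
    return env
```

For example, `with_default_path({"FOO": "bar"})` gains a `PATH` entry that includes `/bin`, so `sh` can be found in an image that installs it only there.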
{noformat}
commit ed1b4bc62486e06a5e76922bcc2f9bd494ab01e8
Author: Kevin Klues <[email protected]>
Date:   Fri Nov 11 16:14:51 2016 -0800

    Added a flag to optionally skip the multithreaded check in setns().

    Review: https://reviews.apache.org/r/53681/
{noformat}
{noformat}
commit 757d2804d09da457103e67c843fdcebf52016097
Author: Kevin Klues <[email protected]>
Date:   Fri Nov 11 18:32:07 2016 -0800

    Added special case for entering "mnt" namespaces for DEBUG containers.

    Until we switch over to the default (a.k.a. "pod" executor) for
    launching command tasks, we need to special case which `pid` we use
    for entering the `mnt` namespace of a parent container.  Specifically,
    we need to enter the `mnt` namespace of the process representing the
    command task itself, not the `mnt` namespace of the `init` process of
    the container or the `executor` of the container because these run in
    the same `mnt` namespace as the agent (not the task).

    Unfortunately, there is no easy way to get the `pid` of tasks launched
    with the command executor because we only checkpoint the `pid` of the
    `init` process of these containers. For now, we compensate for this by
    walking the process tree from the container's `init` process up to
    two levels down (where the task process would exist) and checking
    whether any process along the way has a different `mnt` namespace. If
    one does, we return its `pid` as the `pid` for entering the `mnt`
    namespace of the container.  Otherwise, we return the `init`
    process's `pid`.

    We then pass this pid to the `mesos-containerizer launch` binary and
    have it set the namespace, rather than letting the `ns::clone()` call
    do it for us. This is important because otherwise we wouldn't be able
    to find the `mesos-containerizer launch` binary itself (it only exists
    in the host mount namespace!).

    Review: https://reviews.apache.org/r/53586/
{noformat}
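
The process-tree walk described in the last commit can be sketched as follows. This is an illustrative Python approximation under stated assumptions, not the Mesos C++ code: the function names are hypothetical, and it assumes a Linux `/proc` where `/proc/<pid>/ns/mnt` is a symlink identifying the mount namespace and `/proc/<pid>/stat` carries the parent pid.

```python
import os


def mnt_namespace(pid):
    """Return the mount namespace identifier of `pid`, or None if unreadable."""
    try:
        return os.readlink(f"/proc/{pid}/ns/mnt")
    except OSError:
        return None


def children(pid):
    """Return the direct children of `pid` by scanning /proc/*/stat."""
    result = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/stat") as f:
                data = f.read()
        except OSError:
            continue  # The process exited while we were scanning.
        # The command name may contain spaces or parentheses, so split
        # after the last ')': the remaining fields are state, ppid, ...
        fields = data.rsplit(")", 1)[1].split()
        if int(fields[1]) == pid:
            result.append(int(entry))
    return result


def find_mnt_namespace_pid(init_pid, max_depth=2,
                           get_ns=mnt_namespace, get_children=children):
    """Walk at most `max_depth` levels below `init_pid`.

    Returns the first descendant whose mount namespace differs from that
    of `init_pid`, or `init_pid` itself if none is found.
    """
    init_ns = get_ns(init_pid)
    frontier = [init_pid]
    for _ in range(max_depth):
        next_frontier = []
        for parent in frontier:
            for child in get_children(parent):
                ns = get_ns(child)
                if ns is not None and ns != init_ns:
                    return child
                next_frontier.append(child)
        frontier = next_frontier
    return init_pid
```

The `get_ns` and `get_children` parameters exist only so the walk can be exercised with fake data. On a real host the defaults read from `/proc`.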

> Add special case for entering the "mount" namespace of a parent container
> -------------------------------------------------------------------------
>
>                 Key: MESOS-6543
>                 URL: https://issues.apache.org/jira/browse/MESOS-6543
>             Project: Mesos
>          Issue Type: Task
>            Reporter: Kevin Klues
>            Assignee: Kevin Klues
>              Labels: debugging, mesosphere
>             Fix For: 1.2.0
>
>
> Currently, tasks launched with the command executor have a hierarchy of 
> processes inside their container that looks as follows:
> {noformat}
> | - mesos-containerizer launch
> |   | - mesos-executor
> |   |   | - task process
> {noformat}
> However, the only pid from this hierarchy of processes that the agent is 
> aware of is the pid for the top-level {{mesos-containerizer launch}} 
> binary.
> If all of these binaries were part of the same set of namespaces, then this 
> would be sufficient to discover the namespaces of the {{task process}} (we 
> could simply inspect the namespaces of the {{mesos-containerizer launch}} pid 
> and know they were the same for the {{task process}}).
> This is true for most of the namespaces that each of these processes exist 
> in. However, the {{mnt}} namespace of the two may differ. That is, the 
> {{mesos-containerizer launch}} binary is always in the same {{mnt}} namespace 
> as the host, while the {{task process}} binary may be in its own {{mnt}} 
> namespace if file system isolation is turned on and it has a new rootfs 
> provisioned for it (e.g. a docker image was provided for it).
> This has not been a problem until now because we never wanted to simply 
> _enter_ the {{mnt}} namespace of a container before. Even with nested 
> containers for pods, we always create a new {{mnt}} namespace branched off 
> the host {{mnt}} namespace (in order to support the injection of host-mounted 
> volumes).
> However, with the new debugging support we are adding, we need a way of 
> entering the {{mnt}} namespace of a parent container instead of cloning a new 
> one.
> Since we only have access to the {{pid}} of the container's init process, we 
> can simply enter all namespaces associated with that pid except the {{mnt}} 
> namespace. For the {{mnt}} namespace, we need to special case it to walk the 
> process hierarchy until we find the first process in a different {{mnt}} 
> namespace and enter that one instead. If none are found, simply enter the 
> {{mnt}} namespace of the "init" process.
> This is a dirty dirty hack, but should be sufficient for now.
> Eventually we want to completely eliminate the command executor in favor of 
> the "pod" (i.e. "default") executor, which doesn't have this problem at all.
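
As a rough illustration of the rule in the description above (enter every namespace of the init pid except {{mnt}}, which comes from the discovered pid), the following Python sketch builds the list of `/proc` namespace paths to enter. The function name and the default namespace list are hypothetical, and the sketch only computes paths. It does not call `setns` itself.

```python
def namespace_entry_plan(init_pid, mnt_pid,
                         namespaces=("ipc", "net", "pid", "uts", "mnt")):
    """Map each namespace name to the /proc path that should be entered.

    All namespaces are taken from `init_pid`, except `mnt`, which is
    taken from `mnt_pid` (the pid found by walking the process tree).
    When no process in a different mnt namespace was found, the caller
    passes `init_pid` for both arguments.
    """
    plan = {}
    for ns in namespaces:
        pid = mnt_pid if ns == "mnt" else init_pid
        plan[ns] = f"/proc/{pid}/ns/{ns}"
    return plan
```

With an init pid of 100 and a task pid of 102, only the `mnt` entry points at `/proc/102/ns/mnt`, and every other entry points under `/proc/100/ns/`.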



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
