[ 
https://issues.apache.org/jira/browse/MESOS-7858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu reassigned MESOS-7858:
-----------------------------

    Assignee: Jie Yu

> Launching a nested container with namespace/pid isolation, with glibc < 2.25, 
> may deadlock the LinuxLauncher and MesosContainerizer
> -----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MESOS-7858
>                 URL: https://issues.apache.org/jira/browse/MESOS-7858
>             Project: Mesos
>          Issue Type: Bug
>          Components: containerization
>    Affects Versions: 1.2.1, 1.3.0
>            Reporter: Joseph Wu
>            Assignee: Jie Yu
>              Labels: health-check, mesosphere
>
> A bug in glibc (fixed in glibc 2.25) sometimes causes the child process 
> of a {{fork}} to fail an {{assert}} incorrectly if the parent entered a new 
> pid namespace before forking: 
> https://sourceware.org/bugzilla/show_bug.cgi?id=15392
> https://sourceware.org/bugzilla/show_bug.cgi?id=21386
> The LinuxLauncher code happens to do this when launching nested containers:
> * The MesosContainerizer process launches a subprocess, with a customized 
> {{ns::clone}} function as an argument.  The calling thread then waits for 
> the launch to succeed and return a child PID: 
> https://github.com/apache/mesos/blob/1.3.x/src/slave/containerizer/mesos/linux_launcher.cpp#L495
> * A separate thread in the Mesos agent forks and then waits for the 
> grandchild to report a PID: 
> https://github.com/apache/mesos/blob/1.3.x/src/linux/ns.hpp#L453
> * The child of the fork first enters the namespaces (including a pid 
> namespace) and then forks a grandchild.  The child then calls {{waitpid}} on 
> the grandchild: 
> https://github.com/apache/mesos/blob/1.3.x/src/linux/ns.hpp#L555
> * Due to the glibc bug, the grandchild sometimes never returns from the 
> {{fork}} here: 
> https://github.com/apache/mesos/blob/1.3.x/src/linux/ns.hpp#L540
> The glibc bug report suggests a workaround:
> {quote}
> The obvious solution is just to use clone() after setns() and never use 
> fork() - and one can certainly patch both programs to do so. Nevertheless it 
> would be nice to see if fork() also worked after setns(), especially since 
> there is no inherent reason for it not to.
> {quote}
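
The workaround quoted above could look roughly like the sketch below: the child is created with the raw clone system call instead of glibc's fork() wrapper, so the wrapper's post-fork assertion never runs. This is only an illustration, not the actual Mesos patch. The helper names forkViaRawClone and runChildAndWait are hypothetical, and the argument layout assumes an architecture such as x86-64 where the flags are the first argument of the raw system call. In the real code path the call would come after the process has entered the pid namespace; that step is omitted here because it needs privileges.

```cpp
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <cassert>
#include <csignal>

// Hypothetical helper (not a Mesos API): create a child with the raw clone
// system call rather than glibc's fork() wrapper. With only SIGCHLD in the
// flags and no new stack, the kernel-side behaviour matches fork(), but
// glibc's post-fork code (including pthread_atfork handlers) is skipped.
// Assumption: flags are the first raw syscall argument (true on x86-64).
inline pid_t forkViaRawClone()
{
  return static_cast<pid_t>(
      ::syscall(SYS_clone, static_cast<long>(SIGCHLD), 0L, 0L, 0L, 0L));
}

// Hypothetical helper: start a child that exits immediately with
// `exitCode`, wait for it, and return the exit code seen by the parent
// (or -1 on any failure).
inline int runChildAndWait(int exitCode)
{
  pid_t pid = forkViaRawClone();
  if (pid == -1) {
    return -1;  // The clone system call failed.
  }

  if (pid == 0) {
    // Child: restrict ourselves to async-signal-safe calls.
    ::_exit(exitCode);
  }

  // Parent: reap the child and decode its exit status.
  int status = 0;
  if (::waitpid(pid, &status, 0) != pid) {
    return -1;
  }

  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling runChildAndWait(7) from a main function should return 7 in the parent. Because atfork handlers are bypassed, the child in such a sketch should do as little as possible before calling exec or _exit.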



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
