[ https://issues.apache.org/jira/browse/MESOS-7858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jie Yu reassigned MESOS-7858:
-----------------------------
Assignee: Jie Yu
> Launching a nested container with namespace/pid isolation, with glibc < 2.25,
> may deadlock the LinuxLauncher and MesosContainerizer
> -----------------------------------------------------------------------------------------------------------------------------------
>
> Key: MESOS-7858
> URL: https://issues.apache.org/jira/browse/MESOS-7858
> Project: Mesos
> Issue Type: Bug
> Components: containerization
> Affects Versions: 1.2.1, 1.3.0
> Reporter: Joseph Wu
> Assignee: Jie Yu
> Labels: health-check, mesosphere
>
> This bug in glibc (fixed in glibc 2.25) will sometimes cause a child process
> of a {{fork}} to {{assert}} incorrectly, if the parent enters a new pid
> namespace before forking:
> https://sourceware.org/bugzilla/show_bug.cgi?id=15392
> https://sourceware.org/bugzilla/show_bug.cgi?id=21386
> The LinuxLauncher code happens to do this when launching nested containers:
> * The MesosContainerizer process launches a subprocess, passing a customized
> {{ns::clone}} function as an argument. The calling thread then waits for
> the launch to succeed and return a child PID:
> https://github.com/apache/mesos/blob/1.3.x/src/slave/containerizer/mesos/linux_launcher.cpp#L495
> * A separate thread in the Mesos agent forks and then waits for the
> grandchild to report a PID:
> https://github.com/apache/mesos/blob/1.3.x/src/linux/ns.hpp#L453
> * The child of the fork first enters the namespaces (including a pid
> namespace) and then forks a grandchild. The child then calls {{waitpid}} on
> the grandchild:
> https://github.com/apache/mesos/blob/1.3.x/src/linux/ns.hpp#L555
> * Due to the glibc bug, the grandchild sometimes never returns from the
> {{fork}} here:
> https://github.com/apache/mesos/blob/1.3.x/src/linux/ns.hpp#L540
> The glibc bug report suggests the following workaround:
> {quote}
> The obvious solution is just to use clone() after setns() and never use
> fork() - and one can certainly patch both programs to do so. Nevertheless it
> would be nice to see if fork() also worked after setns(), especially since
> there is no inherent reason for it not to.
> {quote}
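
The workaround quoted above could look roughly like the following. This is a minimal sketch, not the actual Mesos change: the helper names {{clone_like_fork}} and {{spawn_and_reap}} are hypothetical, and the preceding {{setns}} call is omitted because it needs CAP_SYS_ADMIN. It assumes Linux on an architecture where the raw clone system call takes the flags as its first argument (for example x86-64 or aarch64). Invoking the system call directly skips glibc's {{fork}} wrapper, so the child never runs the code path that contains the faulty assertion.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a child with fork-like semantics by calling
 * the raw clone system call. SIGCHLD as the only flag means the child gets
 * a copy of the address space and the parent is signalled on exit, as with
 * fork(). Assumption: the architecture takes flags as the first argument.
 * In the real scenario this would be called after setns() into the target
 * pid namespace (omitted here, since it requires privileges).
 */
static pid_t clone_like_fork(void)
{
    return (pid_t) syscall(SYS_clone, (unsigned long) SIGCHLD,
                           NULL, NULL, NULL, NULL);
}

/*
 * Hypothetical helper: spawn a child that exits immediately with `code`,
 * then reap it. Returns the observed exit code, or -1 on any failure.
 */
static int spawn_and_reap(int code)
{
    pid_t pid = clone_like_fork();
    if (pid < 0) {
        return -1;
    }

    if (pid == 0) {
        /* Child: avoid atexit handlers and stdio flushing; just exit. */
        _exit(code);
    }

    /* Parent: wait for the child, retrying if interrupted by a signal. */
    int status = 0;
    pid_t reaped;
    do {
        reaped = waitpid(pid, &status, 0);
    } while (reaped < 0 && errno == EINTR);

    if (reaped != pid || !WIFEXITED(status)) {
        return -1;
    }
    return WEXITSTATUS(status);
}
```

Calling {{spawn_and_reap(7)}} from the parent should return 7 once the child has exited. Note that a raw clone also bypasses any {{pthread_atfork}} handlers, and that on glibc versions before 2.25 {{getpid()}} may return a cached value in the child, so {{syscall(SYS_getpid)}} is the safer choice there.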
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)