Hey,

Recently I have run into a number of issues in a production environment with the DockerContainerizer, Aurora and Thermos. Although my experience is specific to Docker, I believe this applies to anyone using the Mesos Containerizer with PID isolation. The root cause of these issues is the interaction between how we launch the executor and the role of PID 1.

The CommandInfo for the ExecutorInfo uses the default `shell` value, which is `true`[1]. This means that in any PID-isolated container, the `sh` process that launches the executor becomes PID 1. Here is example `ps` output from Vagrant showing this:

````
root@aurora:/# ps auxf
USER       PID %CPU %MEM     VSZ   RSS TTY      STAT START   TIME COMMAND
root       250  0.0  0.0   21928  2124 ?        Ss   01:19   0:00 /bin/bash
root       469  0.0  0.0   19176  1240 ?        R+   01:28   0:00  \_ ps auxf
root         1  0.0  0.0    4328   636 ?        Ss   01:10   0:00 /bin/sh -c ${MESOS_SANDBOX=.}/thermos_executor.pex --announcer-ensemble localhost:2181 --announcer-zookeeper-auth-config /home/vagrant/aurora/examples/vagrant/config/announcer-auth.json --mesos-containerizer
root         5  0.7  1.4 1201128 45604 ?        Sl   01:10   0:08 python2.7 /mnt/mesos/sandbox/thermos_executor.pex --announcer-ensemble localhost:2181 --announcer-zookeeper-auth-config /home/vagrant/aurora/examples/vagrant/config/announcer-auth.json --mesos-containerizer-
root        23  0.1  0.6  115668 20764 ?        S    01:10   0:01  \_ /usr/local/bin/python2.7 /mnt/mesos/sandbox/thermos_runner.pex --task_id=www-data-devel-hello_docker_engine-0-5f443832-a13e-4cde-97e3-89aa905f2487 --log_to_disk=DEBUG --hostname=192.168.33.7 --thermos_js
root        29  0.0  0.5  113476 17936 ?        Ss   01:10   0:00      \_ /usr/local/bin/python2.7 /mnt/mesos/sandbox/thermos_runner.pex --task_id=www-data-devel-hello_docker_engine-0-5f443832-a13e-4cde-97e3-89aa905f2487 --log_to_disk=DEBUG --hostname=192.168.33.7 --thermo
root        34  0.0  0.0   20040  1476 ?        S    01:10   0:00      |   \_ /bin/bash -c while true; do echo hello world sleep 10 done
root       468  0.0  0.0    4228   348 ?        S    01:28   0:00      |       \_ sleep 10
root        31  0.0  0.5  113476 17936 ?        Ss   01:10   0:00      \_ /usr/local/bin/python2.7 /mnt/mesos/sandbox/thermos_runner.pex --task_id=www-data-devel-hello_docker_engine-0-5f443832-a13e-4cde-97e3-89aa905f2487 --log_to_disk=DEBUG --hostname=192.168.33.7 --thermo
root        32  0.0  0.0   20040  1476 ?        S    01:10   0:00          \_ /bin/bash -c while true; do echo hello world sleep 10 done
root       467  0.0  0.0    4228   352 ?        S    01:28   0:00              \_ sleep 10
root        47  0.0  0.0   24116  3052 ?        S    01:10   0:00 python ./daemon.py
````

This means processes that double fork/daemonize will be reparented to `sh` and not to our executor. You can see that the `python ./daemon.py` process (PID 47) has been reparented to `sh` rather than the executor, and is outside the scope of the runners.

This has a number of undesirable implications. Perhaps most concerning is that processes that end up reparented to PID 1 will not receive SIGTERM or SIGKILL from Thermos; instead they will be killed by the kernel when Thermos decides to exit. If anyone here runs published images containing popular software that double forks (like nginx), you will never be able to ensure the processes die cleanly.

I've been thinking about this problem for a while, and based on advice from others and my own research I believe the best solution is as follows:

1. We have good reasons for setting `shell=True` when launching the executor. I'm not comfortable changing this because I'm not sure of all the implications if we choose another method.
2. The Thermos runners end up forking off the target processes. I think the runners should be responsible for all of the processes that are created by their children.
3. We can make the runners responsible for their grandchildren by using `prctl(2)`[2] and setting the `PR_SET_CHILD_SUBREAPER` bit for each runner. This means double-forked processes will be reparented to the runner and not to PID 1.
4. On task teardown, we make the runners send SIGTERM and SIGKILL to the PIDs they recorded and to any other children they have.
5. Each runner would need a SIGCHLD handler to reap zombie processes that are reparented to it.

[1]: https://github.com/apache/aurora/blob/783baaefb9a814ca01fad78181fe3df3de5b34af/src/main/java/org/apache/aurora/scheduler/configuration/executor/ExecutorModule.java#L109-L135
[2]: http://man7.org/linux/man-pages/man2/prctl.2.html

--
Zameer Manji
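
P.S. To make points 3 and 5 a bit more concrete, here is a rough sketch of what the runner side might look like. This is not existing Thermos code: the function names `become_subreaper`, `is_subreaper` and `reap_children` are hypothetical, the `prctl` call goes through `ctypes` because I am assuming we do not want a new native dependency, and the constants 36 and 37 are the values of `PR_SET_CHILD_SUBREAPER` and `PR_GET_CHILD_SUBREAPER` from `<linux/prctl.h>` (Linux 3.4 or newer only).

```python
import ctypes
import errno
import os
import signal

# Values from <linux/prctl.h>; available on Linux >= 3.4 only.
PR_SET_CHILD_SUBREAPER = 36
PR_GET_CHILD_SUBREAPER = 37

# Handle to the C library already loaded into this process (Linux).
_libc = ctypes.CDLL(None, use_errno=True)


def become_subreaper():
    """Hypothetical helper: mark the calling process as a child subreaper.

    Orphaned descendants (e.g. double-forked daemons) are then reparented
    to this process instead of to PID 1.
    """
    rc = _libc.prctl(PR_SET_CHILD_SUBREAPER, ctypes.c_ulong(1),
                     ctypes.c_ulong(0), ctypes.c_ulong(0), ctypes.c_ulong(0))
    if rc != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def is_subreaper():
    """Hypothetical helper: report whether the subreaper bit is set."""
    flag = ctypes.c_int(0)
    rc = _libc.prctl(PR_GET_CHILD_SUBREAPER, ctypes.byref(flag),
                     ctypes.c_ulong(0), ctypes.c_ulong(0), ctypes.c_ulong(0))
    if rc != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return flag.value == 1


def reap_children(signum=None, frame=None):
    """SIGCHLD handler: collect every child that has already exited.

    Never blocks. Returns the list of reaped PIDs (ignored when invoked
    as a signal handler, handy when called directly).
    """
    reaped = []
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except OSError as e:
            if e.errno == errno.ECHILD:  # no children left at all
                break
            raise
        if pid == 0:  # children exist, but none has exited yet
            break
        reaped.append(pid)
    return reaped


if __name__ == "__main__":
    # What a runner might do once at startup (sketch only).
    become_subreaper()
    signal.signal(signal.SIGCHLD, reap_children)
    print("subreaper:", is_subreaper())
```

One caveat with this sketch: `waitpid(-1, ...)` reaps any child, including the ones the runner forked directly, so the runner's existing bookkeeping for its own processes would have to cooperate with the handler rather than wait on those PIDs separately.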