[
https://issues.apache.org/jira/browse/MESOS-6933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15839421#comment-15839421
]
haosdent commented on MESOS-6933:
---------------------------------
[~klueska][~janisz] This is an {{sh}} problem rather than a Mesos bug, because
{{/bin/sh}} does not forward signals to its child processes.
Docker has a similar problem with graceful exit when {{sh}} is used to
launch commands; refer to
https://www.ctl.io/developers/blog/post/gracefully-stopping-docker-containers/
for the details.
So the correct way to implement graceful exit in Docker, Mesos, and other
applications is to avoid using {{sh}}. More precisely, users should set
{{CommandInfo.shell}} to false and use the {{exec}} form to launch tasks if they
would like their tasks to exit gracefully. Does that make sense?
> Executor does not respect grace period
> --------------------------------------
>
> Key: MESOS-6933
> URL: https://issues.apache.org/jira/browse/MESOS-6933
> Project: Mesos
> Issue Type: Bug
> Components: executor
> Reporter: Tomasz Janiszewski
>
> The Mesos Command Executor tries to support a grace period with escalation,
> but unfortunately it does not work. It launches {{command}} by wrapping it in
> {{sh -c}}, which causes the process tree to look like this:
> {code}
> Received killTask
> Shutting down
> Sending SIGTERM to process tree at pid 18
> Sent SIGTERM to the following process trees:
> [
> -+- 18 sh -c cd offer-i18n-0.1.24 && LD_PRELOAD=../librealresources.so
> ./bin/offer-i18n -e prod -p $PORT0
> \--- 19 command...
> ]
> Command terminated with signal Terminated (pid: 18)
> {code}
> This causes {{sh}}, and therefore the executor, to exit immediately, while the
> wrapped {{command}} might need more time to finish. As a result, the executor
> thinks the command terminated gracefully, so it won't
> [escalate|https://github.com/apache/mesos/blob/1.1.0/src/launcher/executor.cpp#L695]
> to SIGKILL.
> This causes leaks when the POSIX containerizer is used, because if the command
> ignores SIGTERM it will be attached to init and never get killed. Using
> pid/namespace only masks the problem, because the hanging process is captured
> before it can shut down gracefully.
> The fix for this is to send SIGTERM only to the children of {{sh}}. {{sh}} will
> exit when all of its child processes finish. If they do not finish, they will
> be killed by escalation to SIGKILL.
> All versions from 0.20 are affected.
> This test should pass
> [src/tests/command_executor_tests.cpp:342|https://github.com/apache/mesos/blob/2c856178b59593ff8068ea8d6c6593943c33008c/src/tests/command_executor_tests.cpp#L342-L343]
> [Mailing list
> thread|https://lists.apache.org/thread.html/1025dca0cf4418aee50b14330711500af864f08b53eb82d10cd5c04c@%3Cuser.mesos.apache.org%3E]
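The grace-period escalation described in the quoted issue can be sketched
roughly as follows. This is a simplified Python illustration under stated
assumptions, not the executor's actual implementation: it signals a single
process rather than a process tree, and the names {{stop_with_grace}},
{{spawn_child}} and {{STUBBORN_CHILD}} are hypothetical.

```python
import signal
import subprocess
import sys

# Hypothetical child that ignores SIGTERM, standing in for a command that
# does not shut down within the grace period.
STUBBORN_CHILD = """
import signal, time
signal.signal(signal.SIGTERM, signal.SIG_IGN)
print("ready", flush=True)
while True:
    time.sleep(0.1)
"""


def spawn_child(source):
    """Start a Python child and wait until it reports that it is ready."""
    proc = subprocess.Popen(
        [sys.executable, "-c", source],
        stdout=subprocess.PIPE,
        text=True,
    )
    proc.stdout.readline()  # blocks until the child prints "ready"
    return proc


def stop_with_grace(proc, grace_seconds):
    """Send SIGTERM, wait up to grace_seconds, then escalate to SIGKILL.

    Returns the name of the last signal that had to be sent.
    """
    proc.send_signal(signal.SIGTERM)
    try:
        proc.wait(timeout=grace_seconds)
        return "SIGTERM"
    except subprocess.TimeoutExpired:
        # The grace period expired while the process was still alive.
        proc.kill()  # sends SIGKILL on POSIX
        proc.wait()
        return "SIGKILL"


if __name__ == "__main__":
    child = spawn_child(STUBBORN_CHILD)
    print(stop_with_grace(child, grace_seconds=0.5))
    child.stdout.close()
```

The escalation only works if the process being waited on is the one doing the
real work. If it is a wrapper that exits on SIGTERM straight away, the wait
returns early and SIGKILL is never sent, which is the behaviour reported above.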