[ https://issues.apache.org/jira/browse/MESOS-8572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16406116#comment-16406116 ]
Alexander Rukletsov commented on MESOS-8572:
--------------------------------------------
[~brat002] reports a very similar issue. Below is my loose translation of the
message he sent over private channels.
"Sometimes a Docker task does not terminate correctly on {{docker stop}}. For
example, in https://pastebin.com/NwgA7d7M, {{docker stop}} hung for 10 days (!).
A {{docker stop}} issued manually from a terminal hangs for 20-30 seconds and
then exits cleanly, but does not stop the container. However, if {{kill -9}} is
sent to the corresponding {{mesos-docker-executor}}, the whole process tree
terminates correctly and {{docker ps}} no longer lists the container.
The hypothesis is that Docker cannot terminate a container while someone is
listening to its stdin/stderr. Hence it might make sense to send {{SIGTERM}}
followed by {{SIGKILL}} instead of retrying {{docker stop}}."
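
To make the suggested escalation concrete, here is a minimal Python sketch of
"{{SIGTERM}}, then {{SIGKILL}} after a grace period" applied to an arbitrary
child process. It is not the Mesos implementation: the helper names
{{terminate_with_escalation}} and {{spawn_child}} and the {{grace_seconds}}
parameter are hypothetical, and the stubborn child merely stands in for a
process that does not react to a polite stop request. POSIX only.

```python
import signal
import subprocess
import sys


def terminate_with_escalation(proc, grace_seconds=5.0):
    """Send SIGTERM, wait up to grace_seconds, then fall back to SIGKILL.

    Returns the name of the last signal that was sent.
    """
    proc.send_signal(signal.SIGTERM)
    try:
        proc.wait(timeout=grace_seconds)
        return "SIGTERM"
    except subprocess.TimeoutExpired:
        # The process ignored SIGTERM; SIGKILL cannot be caught or ignored.
        proc.kill()
        proc.wait()
        return "SIGKILL"


def spawn_child(ignore_sigterm):
    """Start a sleeping child (illustration only), optionally ignoring SIGTERM."""
    code = (
        "import signal, time\n"
        + ("signal.signal(signal.SIGTERM, signal.SIG_IGN)\n" if ignore_sigterm else "")
        + "print('ready', flush=True)\n"
        "time.sleep(60)\n"
    )
    proc = subprocess.Popen(
        [sys.executable, "-c", code], stdout=subprocess.PIPE, text=True
    )
    # Block until the child has installed its signal disposition.
    proc.stdout.readline()
    return proc


if __name__ == "__main__":
    stubborn = spawn_child(ignore_sigterm=True)
    print(terminate_with_escalation(stubborn, grace_seconds=1.0))
```

The point of the design is that the wait is bounded: the caller never blocks
longer than the grace period plus the time the kernel needs to reap the
process, regardless of how the target handles {{SIGTERM}}.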
> Make Docker executor/containerizer resilient to Docker daemon failures.
> -----------------------------------------------------------------------
>
> Key: MESOS-8572
> URL: https://issues.apache.org/jira/browse/MESOS-8572
> Project: Mesos
> Issue Type: Epic
> Components: containerization, docker, executor
> Affects Versions: 1.5.0
> Reporter: Greg Mann
> Assignee: Greg Mann
> Priority: Major
> Labels: mesosphere
>
> Experience has shown that the Docker CLI can hang indefinitely at times.
> There are many variations of this behavior, and it occurs across many
> versions of Docker. For these reasons, and since many users of Mesos still
> make heavy use of the Docker containerizer and the Docker executor, it will
> improve the user experience to make the Docker containerizer/executor
> resilient to such Docker daemon failures.