[
https://issues.apache.org/jira/browse/MESOS-8488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gilbert Song updated MESOS-8488:
--------------------------------
Component/s: containerization
> Docker bug can cause unkillable tasks
> -------------------------------------
>
> Key: MESOS-8488
> URL: https://issues.apache.org/jira/browse/MESOS-8488
> Project: Mesos
> Issue Type: Improvement
> Components: containerization
> Affects Versions: 1.5.0
> Reporter: Greg Mann
> Priority: Major
> Labels: mesosphere
>
> Due to an [issue on the Moby
> project|https://github.com/moby/moby/issues/33820], it's possible for Docker
> versions 1.13 and later to fail to catch a container exit, so that the
> {{docker run}} command which was used to launch the container will never
> return. This can lead to the Docker executor becoming stuck in a state where
> it believes the container is still running and cannot be killed.
> We should update the Docker executor to ensure that containers stuck in such
> a state cannot cause unkillable Docker executors/tasks.
> One way to do this would be a timeout, after which the Docker executor will
> commit suicide if a kill task attempt has not succeeded. However, if we do
> this we should also ensure that in the case that the container was actually
> still running, either the Docker daemon or the DockerContainerizer would
> clean up the container when it does exit.
> Another option might be for the Docker executor to directly {{wait()}} on the
> container's Linux PID, in order to notice when the container exits.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)