Greg Mann created MESOS-8488:
--------------------------------

             Summary: Docker bug can cause unkillable tasks
                 Key: MESOS-8488
                 URL: https://issues.apache.org/jira/browse/MESOS-8488
             Project: Mesos
          Issue Type: Improvement
    Affects Versions: 1.5.0
            Reporter: Greg Mann


Due to an [issue on the Moby 
project|https://github.com/moby/moby/issues/33820], it's possible for Docker 
versions 1.13 and later to fail to catch a container exit, so that the {{docker 
run}} command which was used to launch the container will never return. This 
can lead to the Docker executor becoming stuck in a state where it believes the 
container is still running and cannot be killed.

We should update the Docker executor to ensure that containers stuck in such a 
state cannot cause unkillable Docker executors/tasks.

One way to do this would be a timeout, after which the Docker executor will 
commit suicide if a kill task attempt has not succeeded. However, if we do this 
we should also ensure that in the case that the container was actually still 
running, either the Docker daemon or the DockerContainerizer would clean up the 
container when it does exit.

Another option might be for the Docker executor to directly {{wait()}} on the 
container's Linux PID, in order to notice when the container exits.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to