Andrei Budnik created MESOS-9230:
------------------------------------
Summary: Docker executor may get stuck in an infinite loop when `docker
run` hangs.
Key: MESOS-9230
URL: https://issues.apache.org/jira/browse/MESOS-9230
Project: Mesos
Issue Type: Bug
Components: docker, executor
Affects Versions: 1.6.0, 1.5.1, 1.4.2, 1.2.3
Reporter: Andrei Budnik
This issue occurs when the Docker daemon is very slow or unresponsive.
Observed behaviour of the Docker executor:
# Agent launches the Docker executor, which calls `docker run` to launch a
container.
# `docker inspect` hangs each time it's called, so the Docker executor
[retries in a
loop|https://github.com/apache/mesos/blob/master/src/docker/executor.cpp#L244-L275]
without success.
# After 5 minutes, a framework (Marathon) sends the first `killTask` message,
which
[interrupts|https://github.com/apache/mesos/blob/master/src/docker/executor.cpp#L543-L550]
the previous `docker inspect` loop.
# Then, `killTask()` launches the very first `docker stop`, which hangs.
# The framework sends the second `killTask()` after 20 seconds, which
[interrupts|https://github.com/apache/mesos/blob/master/src/docker/executor.cpp#L599-L607]
the first `docker stop` command.
# The framework continues to send `killTask()` every 20 seconds, but `docker
stop` always immediately returns an error: "Error response from daemon: No such
container: mesos-some-UID".
Since `docker run`
[hangs|https://github.com/apache/mesos/blob/master/src/docker/executor.cpp#L242],
the `reaped()`
[callback|https://github.com/apache/mesos/blob/master/src/docker/executor.cpp#L664-L693]
is never called. Thus, the Docker executor gets stuck in an infinite `docker
stop` loop.
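
The failure mode above comes down to a kill loop whose only exit is a callback that never fires. As a rough illustration only, the sketch below shows one way such a loop could be bounded. It is not the Mesos implementation and does not use libprocess: `StopResult`, `KillOutcome`, and `boundedKill` are hypothetical names, and `tryStop` stands in for whatever issues `docker stop`.

```cpp
#include <cassert>
#include <functional>

// Hypothetical result of a single stop attempt (not a Mesos type).
enum class StopResult { Stopped, NoSuchContainer, TimedOut };

// Hypothetical final decision of the bounded kill policy.
enum class KillOutcome { ContainerStopped, GaveUp };

// Calls `tryStop` at most `maxAttempts` times. Returns ContainerStopped as
// soon as one attempt succeeds. Otherwise returns GaveUp, so the caller can
// escalate (for example, report the task as failed and exit) instead of
// waiting indefinitely for a reaped()-style callback.
// If `attemptsMade` is non-null, it receives the number of attempts made.
KillOutcome boundedKill(
    const std::function<StopResult()>& tryStop,
    int maxAttempts,
    int* attemptsMade = nullptr)
{
  assert(maxAttempts > 0);

  int attempts = 0;
  KillOutcome outcome = KillOutcome::GaveUp;

  while (attempts < maxAttempts) {
    ++attempts;
    if (tryStop() == StopResult::Stopped) {
      outcome = KillOutcome::ContainerStopped;
      break;
    }
    // NoSuchContainer and TimedOut are both treated as "not stopped yet".
  }

  if (attemptsMade != nullptr) {
    *attemptsMade = attempts;
  }
  return outcome;
}
```

With a `tryStop` that always reports `NoSuchContainer`, as in the last step of the list above, `boundedKill` returns `GaveUp` after `maxAttempts` calls rather than looping forever. What the executor should do at that point is a separate design decision that this sketch leaves open.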
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)