Greg Mann created MESOS-8538:
--------------------------------

             Summary: Consider adding a timeout to Docker executor task launch
                 Key: MESOS-8538
                 URL: https://issues.apache.org/jira/browse/MESOS-8538
             Project: Mesos
          Issue Type: Improvement
            Reporter: Greg Mann


To be more resilient to an unresponsive Docker daemon on an agent, the 
Docker executor could apply a timeout to its task launches. If its initial 
{{docker inspect}} call fails to return within this timeout, the executor could 
terminate itself.
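
As a rough illustration of the idea only (the Docker executor itself is not 
written in Python, and the function name {{inspect_with_timeout}} and its 
parameters are hypothetical), a launch timeout around an inspect-style command 
might look something like this:

```python
import subprocess
import sys


def inspect_with_timeout(argv, timeout_secs):
    """Run argv and return its stdout, or None if it does not finish in time.

    Hypothetical helper: argv stands in for a `docker inspect ...` command line.
    """
    try:
        result = subprocess.run(
            argv, capture_output=True, text=True, timeout=timeout_secs)
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child process before re-raising, so only
        # the client command is cleaned up here, not anything the daemon runs.
        return None
    return result.stdout


if __name__ == "__main__":
    # Stand-in for a hung `docker inspect`: a child that outlives the deadline.
    hung = [sys.executable, "-c", "import time; time.sleep(30)"]
    if inspect_with_timeout(hung, timeout_secs=0.5) is None:
        print("launch timed out; the executor would terminate here")
```

Note that killing the client command says nothing about what the daemon has 
already started, which is why cleanup needs separate attention.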

However, we must be careful to clean up properly in such a case. For example, 
if the executor's {{docker run}} command succeeded but {{docker inspect}} then 
failed to return, we would want to be sure that the Docker containerizer 
destroys the running container. Furthermore, a launch timeout could lead to a 
state where the executor terminates and a TASK_FAILED is forwarded to the 
master, but the task container continues to run on the agent until the daemon 
becomes responsive again. If a launch timeout is implemented, care should be 
taken to avoid such inconsistent states.
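
One possible ordering that avoids this state, sketched under heavy assumptions, 
is to forward the terminal update only once the container is confirmed gone. 
This is not how Mesos behaves today and is only one option among several; the 
names {{handle_launch_timeout}}, {{destroy_container}}, {{send_status}} and the 
{{CLEANUP_PENDING}} marker are all hypothetical.

```python
def handle_launch_timeout(run_started, destroy_container, send_status):
    """Decide what to report after a launch timeout (illustrative only).

    run_started: True if `docker run` was issued before the timeout fired.
    destroy_container: hypothetical callable; returns True once the container
        is confirmed destroyed, False if the daemon is still unresponsive.
    send_status: hypothetical callable that forwards a task status update.
    """
    if run_started and not destroy_container():
        # The container may still be running, so hold back the terminal
        # update and let the caller retry the cleanup later.
        return "CLEANUP_PENDING"
    send_status("TASK_FAILED")
    return "TASK_FAILED"


if __name__ == "__main__":
    sent = []
    # Daemon still unresponsive: nothing is forwarded yet.
    print(handle_launch_timeout(True, lambda: False, sent.append), sent)
    # Container confirmed destroyed: the terminal update goes out.
    print(handle_launch_timeout(True, lambda: True, sent.append), sent)
```

The trade-off is that the master learns of the failure later, in exchange for 
never seeing TASK_FAILED for a container that is still running.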



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)