Andrei Budnik created MESOS-9268:
------------------------------------

             Summary: Hitting agent's `/containers` endpoint might backlog 
Docker containerizer process.
                 Key: MESOS-9268
                 URL: https://issues.apache.org/jira/browse/MESOS-9268
             Project: Mesos
          Issue Type: Bug
          Components: agent, containerization, docker
            Reporter: Andrei Budnik


When the agent's `/containers` endpoint is hit, the agent calls the 
`DockerContainerizerProcess::usage()` method for every running Docker 
container. If there are hundreds of running Docker containers and the 
system is under heavy load, each `DockerContainerizerProcess::usage()` call 
takes a long time to process. As a result, the Docker containerizer's 
mailbox fills up with `DockerContainerizerProcess::usage()` calls waiting to be 
processed, and an attempt to launch a new Docker container puts 
`DockerContainerizer::launch()` at the end of a long queue. In the following 
log excerpt, roughly 30 minutes pass between the agent launching the container 
and the Docker containerizer starting it:
{code:java}
2018-09-07 00:06:03: I0907 00:06:03.031744 65329 slave.cpp:2857] Launching 
container 320547f2-9ab9-4ade-bddb-bfd6f5ed0834 for executor 
'example_test2519.ceeefb8a-b231-11e8-8d62-8e820bbfbfd1' of framework 
51b197db-56ae-47dc-a1bd-3aaf2306bd1a-0001
...
2018-09-07 00:36:20: I0907 00:36:20.472048 65307 docker.cpp:1179] Starting 
container '320547f2-9ab9-4ade-bddb-bfd6f5ed0834' for task 
'example_test2519.ceeefb8a-b231-11e8-8d62-8e820bbfbfd1' (and executor 
'middleware_test2519.ceeefb8a-b231-11e8-8d62-8e820bbfbfd1') of framework 
51b197db-56ae-47dc-a1bd-3aaf2306bd1a-0001
{code}

After disabling the `/containers` endpoint, the issue with the backlogged 
Docker containerizer disappears.
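
The queueing effect described above can be illustrated with a toy model. This 
is only a sketch under stated assumptions, not Mesos or libprocess code: the 
`Message` type and the `drain()` and `launchDelayMs()` helpers are hypothetical 
names, the mailbox is modelled as a plain single-consumer FIFO queue, and time 
is simulated rather than measured.

```cpp
#include <cstddef>
#include <deque>
#include <string>
#include <vector>

// Hypothetical message type for illustration; not a libprocess/Mesos type.
struct Message {
  std::string name;  // e.g. "usage" or "launch"
  long costMs;       // simulated time needed to handle this message
};

// Toy single-consumer FIFO mailbox. Messages are handled strictly in arrival
// order. Returns, for each message, the simulated time (in ms) at which its
// handling starts.
std::vector<long> drain(std::deque<Message> mailbox) {
  std::vector<long> startTimes;
  long clockMs = 0;
  while (!mailbox.empty()) {
    const Message message = mailbox.front();
    mailbox.pop_front();
    startTimes.push_back(clockMs);
    clockMs += message.costMs;  // every later message waits for this one
  }
  return startTimes;
}

// Queueing delay (in ms) of a "launch" message that arrives behind
// `pendingUsageCalls` "usage" messages, each costing `usageCostMs`.
long launchDelayMs(std::size_t pendingUsageCalls, long usageCostMs) {
  std::deque<Message> mailbox(pendingUsageCalls, Message{"usage", usageCostMs});
  mailbox.push_back(Message{"launch", 0});
  return drain(mailbox).back();  // start time of the last message (launch)
}
```

With illustrative numbers (assumed, not taken from the report), 
`launchDelayMs(300, 2000)` yields 600000 ms: a launch queued behind 300 usage 
calls of 2 seconds each waits 10 minutes before it is even looked at, while 
`launchDelayMs(0, 2000)` yields 0.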



