[ 
https://issues.apache.org/jira/browse/MESOS-9268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16629150#comment-16629150
 ] 

Andrei Budnik commented on MESOS-9268:
--------------------------------------

I launched 239 Docker containers, each running the `sleep 1000` command, on 
Fedora 25 (16 CPUs, 64 GB of memory):
{code:java}
time curl `hostname`:5051/containers
...
real    0m21.586s
user    0m0.003s
sys     0m0.008s{code}
Hitting the `/containers` endpoint takes ~21 seconds. Note that this machine 
was not under heavy load.

After I removed the code that [accesses 
Sysfs|https://github.com/apache/mesos/blob/1.7.x/src/slave/containerizer/docker.cpp#L2010-L2029]
 and then launched 210 Docker containers, the `/containers` endpoint started 
to respond very quickly:
{code:java}
time curl `hostname`:5051/containers
...
real    0m0.082s
user    0m0.004s
sys     0m0.004s{code}
`DockerContainerizerProcess::usage()` is called for every container when the 
`/containers` endpoint is hit. It calls `docker inspect` and then collects some 
cgroup statistics (by accessing Sysfs). In the experiment above, 
`DockerContainerizerProcess::usage()` spent most of its time collecting cgroup 
statistics.
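
As a rough illustration of the two measurements above, the sketch below turns 
the `real` times into an average cost per container. It assumes the endpoint's 
latency is dominated by the per-container `usage()` calls and is spread evenly 
across containers, which the measurements suggest but do not prove. The two 
runs also used different container counts (239 vs. 210), so the ratio is only 
indicative. The constant and function names are made up for this example.

```python
# Timings copied from the two `time curl` runs above.
SLOW_SECONDS = 21.586    # 239 containers, Sysfs access enabled
SLOW_CONTAINERS = 239
FAST_SECONDS = 0.082     # 210 containers, Sysfs access removed
FAST_CONTAINERS = 210


def per_container_ms(total_seconds, containers):
    """Average wall-clock time per container, in milliseconds.

    Assumption: the total is spread evenly over all containers.
    """
    return total_seconds / containers * 1000.0


slow_ms = per_container_ms(SLOW_SECONDS, SLOW_CONTAINERS)
fast_ms = per_container_ms(FAST_SECONDS, FAST_CONTAINERS)

print(f"with Sysfs access:    {slow_ms:.1f} ms per container")
print(f"without Sysfs access: {fast_ms:.2f} ms per container")
print(f"ratio:                {slow_ms / fast_ms:.0f}x")
```

Under these assumptions, the Sysfs access accounts for roughly 90 ms per 
container, and removing it makes the endpoint more than two hundred times 
faster.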

 

> Hitting agent's `/containers` endpoint might backlog Docker containerizer 
> process.
> ----------------------------------------------------------------------------------
>
>                 Key: MESOS-9268
>                 URL: https://issues.apache.org/jira/browse/MESOS-9268
>             Project: Mesos
>          Issue Type: Bug
>          Components: agent, containerization, docker
>            Reporter: Andrei Budnik
>            Priority: Major
>
> When the agent's `/containers` endpoint is hit, the agent calls the 
> `DockerContainerizerProcess::usage()` method for every running Docker 
> container. If there are hundreds of running Docker containers and the 
> system is under heavy load, each `DockerContainerizerProcess::usage()` call 
> takes a long time to process. Hence, when the Docker containerizer's 
> mailbox is full of `DockerContainerizerProcess::usage()` calls waiting to 
> be processed, an attempt to launch a new Docker container puts 
> `DockerContainerizer::launch()` at the end of a long queue. This is an 
> example from the logs:
> {code:java}
> 2018-09-07 00:06:03: I0907 00:06:03.031744 65329 slave.cpp:2857] Launching 
> container 320547f2-9ab9-4ade-bddb-bfd6f5ed0834 for executor 
> 'example_test2519.ceeefb8a-b231-11e8-8d62-8e820bbfbfd1' of framework 
> 51b197db-56ae-47dc-a1bd-3aaf2306bd1a-0001
> ...
> 2018-09-07 00:36:20: I0907 00:36:20.472048 65307 docker.cpp:1179] Starting 
> container '320547f2-9ab9-4ade-bddb-bfd6f5ed0834' for task 
> 'example_test2519.ceeefb8a-b231-11e8-8d62-8e820bbfbfd1' (and executor 
> 'middleware_test2519.ceeefb8a-b231-11e8-8d62-8e820bbfbfd1') of framework 
> 51b197db-56ae-47dc-a1bd-3aaf2306bd1a-0001
> {code}
> After disabling the `/containers` endpoint, the issue with a backlogged 
> Docker containerizer disappears.
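
The size of the backlog in the quoted log excerpt can be read off the two 
timestamps. The sketch below computes the gap between the "Launching 
container" line and the "Starting container" line, assuming both were logged 
on the same day (2018-09-07) and on the same clock. The variable names are 
illustrative only.

```python
from datetime import datetime

# Timestamps copied from the two log lines quoted above (same day assumed).
TIME_FORMAT = "%H:%M:%S.%f"
launch_requested = datetime.strptime("00:06:03.031744", TIME_FORMAT)
container_started = datetime.strptime("00:36:20.472048", TIME_FORMAT)

# Time between the agent requesting the launch and the Docker
# containerizer actually starting the container.
delay = container_started - launch_requested
delay_seconds = delay.total_seconds()

print(f"launch delay: {delay_seconds:.1f} s (~{delay_seconds / 60:.0f} min)")
```

That is, the launch request waited about half an hour before the Docker 
containerizer started the container.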



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
