[ https://issues.apache.org/jira/browse/MESOS-9283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16648362#comment-16648362 ]
Greg Mann commented on MESOS-9283:
----------------------------------
Backported to the release branches back to 1.5.x. I will evaluate the
feasibility of a 1.4.x backport shortly.
1.7.x:
{code}
commit e7b49418bbd5418f48e7bbb3f9b38bd51ff06dc4
Author: Greg Mann <[email protected]>
Date:   Wed Oct 3 18:10:23 2018 -0700

    Updated Docker library to avoid 'os::killtree()' when discarding.

    Review: https://reviews.apache.org/r/68923
{code}
1.6.x:
{code}
commit 73d17aac82cb0b02f589eee78bcb146133616e96
Author: Greg Mann <[email protected]>
Date:   Wed Oct 3 18:10:23 2018 -0700

    Updated Docker library to avoid 'os::killtree()' when discarding.

    Review: https://reviews.apache.org/r/68923
{code}
1.5.x:
{code}
commit 317d1dc8cef99260e8182b0eef91761688cf4edf
Author: Greg Mann <[email protected]>
Date:   Wed Oct 3 18:10:23 2018 -0700

    Updated Docker library to avoid 'os::killtree()' when discarding.

    Review: https://reviews.apache.org/r/68923
{code}
> Docker containerizer actor can get backlogged with large number of containers.
> ------------------------------------------------------------------------------
>
> Key: MESOS-9283
> URL: https://issues.apache.org/jira/browse/MESOS-9283
> Project: Mesos
> Issue Type: Bug
> Components: containerization
> Affects Versions: 1.4.2, 1.5.1, 1.6.1, 1.7.0
> Reporter: Jie Yu
> Assignee: Greg Mann
> Priority: Blocker
> Labels: perfomance
> Fix For: 1.8.0
>
> Attachments: Screen Shot 2018-10-01 at 10.54.18 PM.png
>
>
> We observed this during internal scale testing.
> When launching 300+ Docker containers on a single agent box, the Docker
> containerizer actor can get backlogged. As a result, API calls such as
> `GET_CONTAINERS` become unresponsive. The backlog also blocks the Mesos
> containerizer from launching containers when the agent is started with
> `--containerizers=docker,mesos`, because the composing containerizer invokes
> (and queues) the Docker containerizer launch first.
> Profiling results show that the bottleneck is `os::killtree`, which is
> invoked when the Docker commands are discarded (e.g., on client disconnect).
> For this particular case, killtree is not really necessary because the docker
> command does not fork additional subprocesses. If we use the argv version of
> `subprocess` to launch docker commands, we can simply use `os::kill` instead.
> We confirmed that, after switching to `os::kill`, the performance issue goes
> away, and the agent can easily scale up to 300+ containers.
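
The idea in the description above (launch the command from an argv list so that no shell sits in between, then stop it with one signal to one PID rather than walking a process tree) can be sketched outside of Mesos. This is only an illustration in Python on a POSIX system, not the code from the Mesos patch. The helper names `launch_argv` and `discard` are hypothetical, and a sleeping Python child stands in for a long-running `docker` CLI invocation.

```python
import signal
import subprocess
import sys


def launch_argv(argv):
    """Start a command directly from an argv list, with no intermediate shell.

    Because no shell is spawned, the PID of the returned process is the PID
    of the command itself.
    """
    return subprocess.Popen(argv)


def discard(proc, timeout=5.0):
    """Stop the command with a single signal instead of a process-tree walk.

    Assumption: the command does not fork helpers that must also be cleaned
    up, which is the premise stated in the issue description.
    """
    proc.send_signal(signal.SIGKILL)  # one signal, one PID
    return proc.wait(timeout=timeout)  # reap the child; returns its exit status


if __name__ == "__main__":
    # Hypothetical stand-in for a long-running `docker` command.
    child = launch_argv([sys.executable, "-c", "import time; time.sleep(60)"])
    status = discard(child)
    # On POSIX, a negative status means the child was terminated by a signal.
    print("child exit status:", status)
```

With a shell-based launch, the PID returned to the caller can belong to the shell rather than the command, which is one reason a tree walk is otherwise needed. The argv form removes that layer, so a single kill is enough under the stated assumption.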