Chun-Hung Hsiao created MESOS-8480:
--------------------------------------
Summary: Mesos returns high resource usage when killing a Docker task.
Key: MESOS-8480
URL: https://issues.apache.org/jira/browse/MESOS-8480
Project: Mesos
Issue Type: Bug
Components: cgroups
Reporter: Chun-Hung Hsiao
Assignee: Chun-Hung Hsiao
To get resource statistics for a Docker task, we first read the task's cgroup
subsystem path from {{/proc/<pid>/cgroup}} (taking the {{cpuacct}} subsystem as
an example):
{noformat}
9:cpuacct,cpu:/docker/66fbe67b64ad3a86c6e080e18578bc9e540e55ee0bdcae09c2e131a4264a3a3b
{noformat}
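The lookup step can be sketched as below. This assumes cgroup v1, where each line of {{/proc/<pid>/cgroup}} has the form {{hierarchy-ID:controller-list:cgroup-path}} (see proc(5)). The name {{cgroupPathFor}} is hypothetical; this is an illustration, not the actual {{cgroups::internal::cgroup()}} implementation.

```cpp
#include <optional>
#include <sstream>
#include <string>

// Hypothetical helper (not Mesos's cgroups::internal::cgroup()): given the
// contents of /proc/<pid>/cgroup, return the cgroup path for `subsystem`.
// Each cgroup v1 line looks like "9:cpuacct,cpu:/docker/<container-id>".
std::optional<std::string> cgroupPathFor(const std::string& procCgroup,
                                         const std::string& subsystem) {
  std::istringstream lines(procCgroup);
  std::string line;
  while (std::getline(lines, line)) {
    // Split on the first two colons only, so any ':' inside the path is kept.
    const auto first = line.find(':');
    if (first == std::string::npos) continue;
    const auto second = line.find(':', first + 1);
    if (second == std::string::npos) continue;

    const std::string controllers = line.substr(first + 1, second - first - 1);
    const std::string path = line.substr(second + 1);

    // The controller list is comma separated, e.g. "cpuacct,cpu".
    std::istringstream names(controllers);
    std::string name;
    while (std::getline(names, name, ',')) {
      if (name == subsystem) return path;
    }
  }
  return std::nullopt;  // Subsystem not mounted for this process.
}
```

For the line above it returns the {{/docker/<container-id>}} path; for a line such as {{9:cpuacct,cpu:/}} it returns just {{/}}, which is the value that triggers this bug.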
Then read
{{/sys/fs/cgroup/cpuacct//docker/66fbe67b64ad3a86c6e080e18578bc9e540e55ee0bdcae09c2e131a4264a3a3b/cpuacct.stat}}
to get the statistics:
{noformat}
user 4
system 0
{noformat}
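Parsing {{cpuacct.stat}} can be sketched as follows. The kernel reports both fields in USER_HZ ticks, which can be converted to seconds by dividing by the value of {{sysconf(_SC_CLK_TCK)}}. {{CpuacctStat}}, {{parseCpuacctStat}} and {{ticksToSeconds}} are hypothetical names used only for illustration.

```cpp
#include <optional>
#include <sstream>
#include <string>

// Hypothetical container for the two counters in cpuacct.stat.
struct CpuacctStat {
  unsigned long long user = 0;    // USER_HZ ticks spent in user mode.
  unsigned long long system = 0;  // USER_HZ ticks spent in kernel mode.
};

// Parse contents such as "user 4\nsystem 0\n". Returns nothing if either
// field is missing.
std::optional<CpuacctStat> parseCpuacctStat(const std::string& contents) {
  std::istringstream in(contents);
  std::string key;
  unsigned long long value = 0;
  CpuacctStat stat;
  bool sawUser = false;
  bool sawSystem = false;
  while (in >> key >> value) {
    if (key == "user") {
      stat.user = value;
      sawUser = true;
    } else if (key == "system") {
      stat.system = value;
      sawSystem = true;
    }
  }
  if (!sawUser || !sawSystem) return std::nullopt;
  return stat;
}

// Convert ticks to seconds; `ticksPerSecond` would normally come from
// sysconf(_SC_CLK_TCK) (USER_HZ).
double ticksToSeconds(unsigned long long ticks, long ticksPerSecond) {
  return static_cast<double>(ticks) / static_cast<double>(ticksPerSecond);
}
```

Assuming USER_HZ is 100 (a common value), the container's {{user 4}} is 0.04 seconds of CPU time, while the root cgroup's {{user 228058750}} reported later in this issue is about 2,280,588 seconds (roughly 26 days).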
However, when a Docker container is being torn down, it seems that Docker or
the operating system first moves the process to the root cgroup before
actually killing it, making {{/proc/<pid>/cgroup}} look like the following:
{noformat}
9:cpuacct,cpu:/
{noformat}
This makes
[{{cgroup::internal::cgroup()}}|https://github.com/apache/mesos/blob/master/src/linux/cgroups.cpp#L1935]
return a single '/', which in turn makes
[{{DockerContainerizerProcess::cgroupsStatistics()}}|https://github.com/apache/mesos/blob/master/src/slave/containerizer/docker.cpp#L1991]
read {{/sys/fs/cgroup/cpuacct///cpuacct.stat}}, which contains the statistics
for the root cgroup:
{noformat}
user 228058750
system 24506461
{noformat}
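The sketch below shows why a cgroup path of {{/}} lands on the root cgroup's file, assuming the path is built by plain concatenation with {{/}} separators. The paths in this report are consistent with that, but this is not a claim about the exact Mesos code. {{statPath}} and {{containerStatPath}} are hypothetical, and the guard is one possible mitigation, not the fix adopted for this issue.

```cpp
#include <optional>
#include <string>

// Hypothetical: join hierarchy mount point, cgroup path and control file with
// plain "/" separators. This reproduces both paths seen in the report.
std::string statPath(const std::string& hierarchy, const std::string& cgroup) {
  return hierarchy + "/" + cgroup + "/cpuacct.stat";
}

// Hypothetical guard: an empty or root cgroup path means the container's own
// cgroup is no longer visible, so return nothing instead of a path that would
// read the root cgroup's statistics.
std::optional<std::string> containerStatPath(const std::string& hierarchy,
                                             const std::string& cgroup) {
  if (cgroup.empty() || cgroup == "/") {
    return std::nullopt;
  }
  return statPath(hierarchy, cgroup);
}
```

A stricter variant could also require the path to start with the expected container prefix (for example {{/docker/}}), so any unexpected cgroup is rejected rather than read.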
This can be reproduced with test.cpp using the following command:
{noformat}
$ docker run --name sleep -d --rm alpine sleep 1000; ./test $(docker inspect sleep | jq .[].State.Pid) & sleep 1 && docker rm -f sleep
...
Reading file '/proc/44224/cgroup'
Reading file '/sys/fs/cgroup/cpuacct//docker/1d79a6c877e2af3081630aa57d23d853e6bd7d210dad28f897556bfea20bc9c1/cpuacct.stat'
user 4
system 0
Reading file '/proc/44224/cgroup'
Reading file '/sys/fs/cgroup/cpuacct///cpuacct.stat'
user 228058750
system 24506461
Reading file '/proc/44224/cgroup'
Reading file '/sys/fs/cgroup/cpuacct///cpuacct.stat'
user 228058750
system 24506461
Failed to open file '/proc/44224/cgroup'
sleep
[2]- Exit 1 ./test $(docker inspect sleep | jq .[].State.Pid)
{noformat}