[
https://issues.apache.org/jira/browse/MESOS-8877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16462217#comment-16462217
]
Qian Zhang edited comment on MESOS-8877 at 5/3/18 9:54 AM:
-----------------------------------------------------------
The root cause of this issue: when we recover a container in
`DockerContainerizerProcess::_recover`, the resources of the container are NOT
set (see
[here|https://github.com/apache/mesos/blob/1.5.0/src/slave/containerizer/docker.cpp#L1013:L1051]
for details). As a result, when the Docker executor reregisters with the agent,
[this check|https://github.com/apache/mesos/blob/1.5.0/src/slave/containerizer/docker.cpp#L1652]
evaluates to false, so `DockerContainerizerProcess::__update` is called to
update the resources of the Docker container in cgroups, and the updated
resources include both the Docker container's resources and the Docker
executor's resources (0.1 cpus and 32 MB memory). That is why we see the
Docker container's resources in cgroups enlarged by 0.1 cpus and 32 MB of
memory after agent recovery.
We do not have this issue when launching a Docker container, because its
resources are set at launch time (see
[here|https://github.com/apache/mesos/blob/1.5.0/src/slave/containerizer/docker.hpp#L343]
for details) and already include both the Docker container's resources and the
Docker executor's resources, so [this
check|https://github.com/apache/mesos/blob/1.5.0/src/slave/containerizer/docker.cpp#L1652]
evaluates to true and `DockerContainerizerProcess::__update` is not called.
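To make the two paths concrete, below is a minimal, self-contained C++ sketch
(hypothetical names and simplified types, not the actual Mesos source) that
models how an unset cached resource value makes the post-recovery update
enlarge the cgroups by the executor's default overhead, while the launch path
skips the update:
{code}
// Minimal model (hypothetical names, NOT the Mesos source) of the launch vs.
// recovery paths in the Docker containerizer. The default Docker executor
// overhead in Mesos is 0.1 cpus and 32 MB of memory.
#include <iostream>
#include <optional>

struct Resources {
  double cpus;
  double memMB;
  bool operator==(const Resources& other) const {
    return cpus == other.cpus && memMB == other.memMB;
  }
};

const Resources kExecutorOverhead{0.1, 32.0};

// Models DockerContainerizerProcess::update(): cgroups are rewritten only
// when the cached container resources differ from the requested total.
void update(std::optional<Resources>& cached, const Resources& task) {
  const Resources total{task.cpus + kExecutorOverhead.cpus,
                        task.memMB + kExecutorOverhead.memMB};
  if (cached.has_value() && *cached == total) {
    std::cout << "check is true: skipping cgroups update\n";
    return;
  }
  cached = total;  // __update() path: cgroups get task + executor resources.
  std::cout << "check is false: cgroups set to " << total.cpus << " cpus, "
            << total.memMB << " MB\n";
}

int main() {
  const Resources task{0.1, 32.0};

  // Launch path: resources were set when the container was launched, so the
  // update triggered by executor reregistration is a no-op.
  std::optional<Resources> launched = Resources{0.2, 64.0};
  update(launched, task);

  // Recovery path: _recover() leaves the resources unset, so the same update
  // enlarges the cgroups to 0.2 cpus / 64 MB.
  std::optional<Resources> recovered;
  update(recovered, task);
  return 0;
}
{code}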
> Docker container's resources will be wrongly enlarged in cgroups after agent
> recovery
> -------------------------------------------------------------------------------------
>
> Key: MESOS-8877
> URL: https://issues.apache.org/jira/browse/MESOS-8877
> Project: Mesos
> Issue Type: Bug
> Components: docker
> Reporter: Qian Zhang
> Priority: Major
>
> Reproduce steps:
> 1. Run `mesos-execute --master=10.0.49.2:5050 --task=file:///home/qzhang/workspace/config/task_docker.json --checkpoint=true` to launch a Docker container.
> {code:java}
> # cat task_docker.json
> {
>   "name": "test",
>   "task_id": {"value" : "test"},
>   "agent_id": {"value" : ""},
>   "resources": [
>     {"name": "cpus", "type": "SCALAR", "scalar": {"value": 0.1}},
>     {"name": "mem", "type": "SCALAR", "scalar": {"value": 32}}
>   ],
>   "command": {
>     "value": "sleep 55555"
>   },
>   "container": {
>     "type": "DOCKER",
>     "docker": {
>       "image": "alpine"
>     }
>   }
> }
> {code}
> 2. When the Docker container is running, we can see that its resources in
> cgroups are correctly set; so far so good.
> {code:java}
> # cat /sys/fs/cgroup/cpu,cpuacct/docker/a711b3c7b0d91cd6d1c7d8daf45a90ff78d2fd66973e615faca55a717ec6b106/cpu.cfs_quota_us
> 10000
> # cat /sys/fs/cgroup/memory/docker/a711b3c7b0d91cd6d1c7d8daf45a90ff78d2fd66973e615faca55a717ec6b106/memory.limit_in_bytes
> 33554432
> {code}
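> Note: these values match the task's request exactly: 0.1 cpus with the
> default 100000 µs CFS period gives cpu.cfs_quota_us = 10000, and 32 MB =
> 32 × 1024 × 1024 = 33554432 bytes.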
> 3. Restart the Mesos agent, and we will see that the resources of the Docker
> container are wrongly enlarged.
> {code}
> I0503 02:06:17.268340 29512 docker.cpp:1855] Updated 'cpu.shares' to 204 at /sys/fs/cgroup/cpu,cpuacct/docker/a711b3c7b0d91cd6d1c7d8daf45a90ff78d2fd66973e615faca55a717ec6b106 for container 1b21295b-2f49-4d08-84c7-43b9ae15ad88
> I0503 02:06:17.271390 29512 docker.cpp:1882] Updated 'cpu.cfs_period_us' to 100ms and 'cpu.cfs_quota_us' to 20ms (cpus 0.2) for container 1b21295b-2f49-4d08-84c7-43b9ae15ad88
> I0503 02:06:17.273082 29512 docker.cpp:1924] Updated 'memory.soft_limit_in_bytes' to 64MB for container 1b21295b-2f49-4d08-84c7-43b9ae15ad88
> I0503 02:06:17.275908 29512 docker.cpp:1950] Updated 'memory.limit_in_bytes' to 64MB at /sys/fs/cgroup/memory/docker/a711b3c7b0d91cd6d1c7d8daf45a90ff78d2fd66973e615faca55a717ec6b106 for container 1b21295b-2f49-4d08-84c7-43b9ae15ad88
> # cat /sys/fs/cgroup/cpu,cpuacct/docker/a711b3c7b0d91cd6d1c7d8daf45a90ff78d2fd66973e615faca55a717ec6b106/cpu.cfs_quota_us
> 20000
> # cat /sys/fs/cgroup/memory/docker/a711b3c7b0d91cd6d1c7d8daf45a90ff78d2fd66973e615faca55a717ec6b106/memory.limit_in_bytes
> 67108864
> {code}
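> Note: the enlarged values are the task's resources plus the default Docker
> executor overhead: 0.1 + 0.1 = 0.2 cpus gives cpu.cfs_quota_us = 20000, and
> 32 MB + 32 MB = 64 MB = 67108864 bytes.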