[
https://issues.apache.org/jira/browse/MESOS-2713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14536807#comment-14536807
]
Ian Babrou commented on MESOS-2713:
-----------------------------------
I'm not sure whether you can run docker containers without cgroups at all.
Either way, it would be better to read stats from cgroups and fall back
gracefully to the existing stats only when cgroups are unavailable.
Take a look:
web300 ~ # cat /sys/fs/cgroup/cpuacct/docker/944fe900f60595d37ce4db3c4c09c196be3b500c2d3e89dab59351da2c8b597d/cpuacct.stat
user 20964
system 1167
web300 ~ # curl -s http://web300:5051/monitor/statistics.json | jq .
[
{
"statistics": {
"timestamp": 1431194945.15193,
"mem_rss_bytes": 408150016,
"mem_limit_bytes": 2181038080,
"cpus_user_time_secs": 1.46,
"cpus_system_time_secs": 0.35,
"cpus_limit": 3.6
},
"source": "topface_prod-test_app.c80a053f-f66f-11e4-a977-56847afe9799",
"framework_id": "20150126-100650-3909200064-5050-1-0007",
"executor_name": "Command Executor (Task: topface_prod-test_app.c80a053f-f66f-11e4-a977-56847afe9799) (Command: sh -c 'exec /sbin/m...')",
"executor_id": "topface_prod-test_app.c80a053f-f66f-11e4-a977-56847afe9799"
}
]
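To put the two outputs above side by side: cpuacct.stat reports user and system time in USER_HZ ticks, so the values need converting before they can be compared with cpus_user_time_secs and cpus_system_time_secs. Here is a minimal sketch, assuming the common USER_HZ value of 100 (on a real host, os.sysconf("SC_CLK_TCK") would give the actual value). The function name parse_cpuacct_stat is only illustrative.

```python
def parse_cpuacct_stat(text, ticks_per_sec=100):
    """Convert cpuacct.stat contents (values in USER_HZ ticks) to seconds.

    ticks_per_sec=100 is an assumption; check os.sysconf("SC_CLK_TCK").
    """
    result = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        name, ticks = parts
        result[name] = int(ticks) / ticks_per_sec
    return result


# Values copied from the cpuacct.stat output above.
sample = "user 20964\nsystem 1167\n"
times = parse_cpuacct_stat(sample)
print(times)
```

Under that assumption the cgroup has accumulated roughly 209.64 s of user time and 11.67 s of system time, against the 1.46 s and 0.35 s reported by the slave.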
Now take another look. Between the next two samples, taken about a second apart, user time decreases from 4.56 to 0.73:
web300 ~ # curl -s http://web300:5051/monitor/statistics.json | jq .
[
{
"statistics": {
"timestamp": 1431195057.42133,
"mem_rss_bytes": 428085248,
"mem_limit_bytes": 2181038080,
"cpus_user_time_secs": 4.56,
"cpus_system_time_secs": 0.43,
"cpus_limit": 3.6
},
"source": "topface_prod-test_app.c80a053f-f66f-11e4-a977-56847afe9799",
"framework_id": "20150126-100650-3909200064-5050-1-0007",
"executor_name": "Command Executor (Task: topface_prod-test_app.c80a053f-f66f-11e4-a977-56847afe9799) (Command: sh -c 'exec /sbin/m...')",
"executor_id": "topface_prod-test_app.c80a053f-f66f-11e4-a977-56847afe9799"
}
]
web300 ~ # curl -s http://web300:5051/monitor/statistics.json | jq .
[
{
"statistics": {
"timestamp": 1431195058.38549,
"mem_rss_bytes": 335261696,
"mem_limit_bytes": 2181038080,
"cpus_user_time_secs": 0.73,
"cpus_system_time_secs": 0.31,
"cpus_limit": 3.6
},
"source": "topface_prod-test_app.c80a053f-f66f-11e4-a977-56847afe9799",
"framework_id": "20150126-100650-3909200064-5050-1-0007",
"executor_name": "Command Executor (Task: topface_prod-test_app.c80a053f-f66f-11e4-a977-56847afe9799) (Command: sh -c 'exec /sbin/m...')",
"executor_id": "topface_prod-test_app.c80a053f-f66f-11e4-a977-56847afe9799"
}
]
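A cumulative CPU time counter should never go down, so this kind of regression is easy to detect mechanically. The sketch below uses the (timestamp, cpus_user_time_secs) pairs from the three samples above. The helper name find_regressions is hypothetical and is not part of Mesos.

```python
def find_regressions(samples):
    """Return (t_prev, t_next, v_prev, v_next) for every pair of
    consecutive samples where a cumulative counter went down."""
    regressions = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        if v1 < v0:
            regressions.append((t0, t1, v0, v1))
    return regressions


# (timestamp, cpus_user_time_secs) taken from the three outputs above.
samples = [
    (1431194945.15193, 1.46),
    (1431195057.42133, 4.56),
    (1431195058.38549, 0.73),
]

for t0, t1, v0, v1 in find_regressions(samples):
    print(f"user time dropped from {v0} to {v1} in {t1 - t0:.2f} s")
```

Only the last pair is flagged: user time falls from 4.56 to 0.73 in under a second.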
> Docker resource usage
> ----------------------
>
> Key: MESOS-2713
> URL: https://issues.apache.org/jira/browse/MESOS-2713
> Project: Mesos
> Issue Type: Bug
> Components: containerization, docker, isolation
> Affects Versions: 0.22.1
> Reporter: Ian Babrou
>
> It looks like resource usage for docker containers on slaves is not very
> accurate (/monitor/statistics.json). For example, CPU usage is calculated by
> traversing the process tree and summing up CPU times. The resulting numbers
> are not even close to real usage, and CPU time can even decrease.
> What is the reason for this if you can use cgroup data directly? Reading the
> cgroup location from the pid of a docker container is pretty straightforward.
> Another similar question: what is the reason to set isolation to posix
> instead of cgroups by default? Looks like it suffers from the same issues as
> docker containerizer (incorrect stats). More docs on this topic would be
> great.
> Posix isolation also leads to higher CPU usage from the mesos slave process
> (the higher usage is with posix isolation): http://i.imgur.com/jepk5m6.png
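As a rough illustration of reading the cgroup location from a pid: on Linux, /proc/<pid>/cgroup lists one line per hierarchy in the form "id:controllers:path". The sketch below parses that text and builds the cpuacct.stat path under the /sys/fs/cgroup/cpuacct mount shown in the comment above. The function name, the sample text and the container id "abc123" are made up for the example, and this is not how Mesos itself does it.

```python
def cgroup_path(proc_cgroup_text, controller):
    """Return the cgroup path for `controller` from the contents of
    /proc/<pid>/cgroup, or None if the controller is not listed."""
    for line in proc_cgroup_text.splitlines():
        parts = line.split(":", 2)  # hierarchy id, controller list, path
        if len(parts) != 3:
            continue
        _, controllers, path = parts
        if controller in controllers.split(","):
            return path
    return None


# Hypothetical /proc/<pid>/cgroup contents; "abc123" is a made-up id.
sample = "5:cpu,cpuacct:/docker/abc123\n3:memory:/docker/abc123\n"

path = cgroup_path(sample, "cpuacct")
# Mount point assumed to match the one used in the comment above.
stat_file = "/sys/fs/cgroup/cpuacct" + path + "/cpuacct.stat"
print(stat_file)
```

On a real host the text would come from open(f"/proc/{pid}/cgroup").read(), with pid being the container's init process.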