[ https://issues.apache.org/jira/browse/MESOS-458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13696231#comment-13696231 ]
Benjamin Mahler commented on MESOS-458:
---------------------------------------
I did some experiments to test this out, and they seem to confirm the article I
linked:
I restricted my test cgroup to run on a single CPU and then set up the
following tree:
A with children A1 and A2
B with no children
A has 1024 shares, B has 2048 shares
A1 has 512 shares, A2 has 1024 shares
# Create A and B cgroups, A has sub-cgroups A1 and A2
$ cd /cgroup
$ mkdir test && cd test
$ mkdir A B A/A1 A/A2
# Restrict every cgroup in the test hierarchy to a single CPU (cpu 1)
$ echo 1 > cpuset.cpus
$ echo 1 > A/cpuset.cpus
$ echo 1 > A/A1/cpuset.cpus
$ echo 1 > A/A2/cpuset.cpus
$ echo 1 > B/cpuset.cpus
# Setting cpuset.mems as well seems to be necessary before tasks can be attached.
$ echo 0-1 > cpuset.mems
$ echo 0-1 > A/cpuset.mems
$ echo 0-1 > A/A1/cpuset.mems
$ echo 0-1 > A/A2/cpuset.mems
$ echo 0-1 > B/cpuset.mems
# Set up the shares: A has 1024, B has 2048, A1 has 512, A2 has 1024
$ echo 1024 > A/cpu.shares
$ echo 2048 > B/cpu.shares
$ echo 512 > A/A1/cpu.shares
$ echo 1024 > A/A2/cpu.shares
# Saturate A1, A2 and B.
$ yes A1 > /dev/null &
[1] 52741
$ echo $! > A/A1/tasks
$ yes A2 > /dev/null &
[2] 52764
$ echo $! > A/A2/tasks
$ yes B > /dev/null &
[3] 52811
$ echo $! > B/tasks
$ top -c
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
52811 root 20 0 58940 544 460 R 66.8 0.0 9:39.91 yes B
52764 root 20 0 58940 544 460 R 23.6 0.0 3:25.65 yes A2
52741 root 20 0 58940 544 460 R 11.8 0.0 1:58.86 yes A1
You can see B gets roughly 2048 / (1024 (A) + 2048 (B)) = 2/3 of the CPU (~67%),
while A1 and A2 share the remaining 33%. Within A, A1 gets
512 / (512 (A1) + 1024 (A2)) = 1/3 of that 33%, i.e. about 11%, and A2 gets the
other 2/3, about 22%. Looks good.
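To make the arithmetic above explicit, here is a small Python sketch of the
model being assumed: each cgroup's slice of the CPU is divided among its child
cgroups in proportion to their cpu.shares, and any task attached directly to a
cgroup competes in that same run queue with the default nice-0 weight of 1024.
The expected_split helper and the tree layout below are just illustrative names
for this experiment, not anything from the kernel or Mesos.

# Simplified model of CFS group scheduling. Assumptions: every task is
# CPU-bound and runs at nice 0 (weight 1024), and each child cgroup competes
# in its parent's run queue with a weight equal to its cpu.shares.
TASK_WEIGHT = 1024

def expected_split(cgroup, cpu=100.0, path="", out=None):
    """Return {cgroup path: expected CPU % for each task attached directly to it}."""
    out = {} if out is None else out
    children = cgroup.get("children", {})
    ntasks = cgroup.get("tasks", 0)
    total = ntasks * TASK_WEIGHT + sum(c["shares"] for c in children.values())
    if ntasks:
        out[path or "/"] = cpu * TASK_WEIGHT / total  # per directly attached task
    for name, child in children.items():
        expected_split(child, cpu * child["shares"] / total, path + "/" + name, out)
    return out

# The tree from the first measurement: one `yes` task each in A/A1, A/A2 and B.
tree = {"shares": 1024, "tasks": 0, "children": {
    "A": {"shares": 1024, "tasks": 0, "children": {
        "A1": {"shares": 512,  "tasks": 1, "children": {}},
        "A2": {"shares": 1024, "tasks": 1, "children": {}},
    }},
    "B": {"shares": 2048, "tasks": 1, "children": {}},
}}
print(expected_split(tree))
# -> roughly {'/A/A1': 11.1, '/A/A2': 22.2, '/B': 66.7}, consistent with top above.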
# Placing a task in A:
$ yes A > /dev/null &
[4] 56037
$ echo $! > A/tasks
$ top -c
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
52811 root 20 0 58940 544 460 R 66.9 0.0 11:02.72 yes B
52764 root 20 0 58940 544 460 R 11.8 0.0 3:52.22 yes A2
56037 root 20 0 58940 544 460 R 11.8 0.0 0:04.90 yes A
52741 root 20 0 58940 544 460 R 5.9 0.0 2:12.17 yes A1
You can see this does not affect B, which still gets its 2/3 share. Now,
however, the task in A competes inside A's run queue alongside the A1 and A2
cgroups (a nice-0 task carries the default weight of 1024 there). A1 is
therefore reduced to 512 / (1024 (A) + 1024 (A2) + 512 (A1)) = 1/5 of A's 33%,
i.e. about 6.6%, while the task in A and A2 split the rest of that 33% evenly,
since both weigh 1024.
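Feeding this second configuration into the expected_split sketch above (with
the new task in A counted at the default weight of 1024) predicts the same
picture; the absolute numbers from a single top sample run a little low, but
the 2 : 1 : 2 ratio inside A matches:

tree["children"]["A"]["tasks"] = 1  # the `yes A` task placed directly in A
print(expected_split(tree))
# -> roughly {'/A': 13.3, '/A/A1': 6.7, '/A/A2': 13.3, '/B': 66.7}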
Adding another task to A results in diminished shares for all tasks rooted at
A, as expected:
$ yes A > /dev/null &
[5] 58049
$ echo $! > A/tasks
$ top -c
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
52811 root 20 0 58940 544 460 R 66.8 0.0 15:39.56 yes B
52764 root 20 0 58940 544 460 R 9.8 0.0 4:47.39 yes A2
56037 root 20 0 58940 544 460 R 7.9 0.0 1:00.06 yes A
58049 root 20 0 58940 544 460 R 7.9 0.0 0:02.00 yes A
52741 root 20 0 58940 544 460 R 3.9 0.0 2:39.81 yes A1
This is because _each_ process placed directly in a parent cgroup competes with
that parent's child cgroups for the parent's share (as Ian pointed out).
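Under the same model, each of the two tasks in A is its own 1024-weight entity
in A's run queue, so A's 33% is now split 1024 : 1024 : 512 : 1024 between the
two tasks, A1 and A2; the snapshot above (7.9 / 7.9 / 3.9 / 9.8) is in that
ballpark once top's sampling noise is taken into account:

tree["children"]["A"]["tasks"] = 2  # two `yes A` tasks attached directly to A
print(expected_split(tree))
# -> roughly {'/A': 9.5, '/A/A1': 4.8, '/A/A2': 9.5, '/B': 66.7}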
Finally, if we remove the previous task and instead add a task to A2:
$ sudo kill -9 58049
$ yes A2 > /dev/null &
[5] 59125
$ echo $! > A/A2/tasks
$ top -c
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
52811 root 20 0 58940 544 460 R 64.9 0.0 19:36.49 yes B
56037 root 20 0 58940 544 460 R 11.8 0.0 1:38.98 yes A
52741 root 20 0 58940 544 460 R 5.9 0.0 2:59.31 yes A1
52764 root 20 0 58940 544 460 R 5.9 0.0 5:23.27 yes A2
59125 root 20 0 58940 540 460 R 5.9 0.0 0:08.35 yes A2
This time, A1 and A remain unaffected by the additional task in A2.
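The model agrees here too: the extra task only dilutes A2's own slice, leaving
the task in A and A1 untouched:

tree["children"]["A"]["tasks"] = 1                    # back to one task in A
tree["children"]["A"]["children"]["A2"]["tasks"] = 2  # two `yes A2` tasks
print(expected_split(tree))
# -> roughly {'/A': 13.3, '/A/A1': 6.7, '/A/A2': 6.7, '/B': 66.7}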
> Current cgroups layout does not ensure the slave gets a fair share of the CPU
> resources.
> ----------------------------------------------------------------------------------------
>
> Key: MESOS-458
> URL: https://issues.apache.org/jira/browse/MESOS-458
> Project: Mesos
> Issue Type: Improvement
> Reporter: Benjamin Mahler
>
> We currently have the following layout when there is no system cgroup present
> on a machine:
> /cgroup: (system processes, including the mesos-slave) 1024 shares
> /cgroup/mesos: (no processes) 1024 shares
> /cgroup/mesos/executor1: X shares
> ...
> /cgroup/mesos/executorN: X shares
> This does not ensure the slave gets a fair share of the CPU, especially when
> there is load inside the root cgroup. This is because the slave is contending
> with other processes inside the root cgroup. If the administrators set up a
> system cgroup, the layout looks as follows:
> /cgroup: (no processes)
> /cgroup/system: (system processes, including the mesos-slave) 1024 shares
> /cgroup/mesos: (no processes) 1024 shares
> /cgroup/mesos/executor1: X shares
> ...
> /cgroup/mesos/executorN: X shares
> This still does not ensure the slave gets a fair share for the same reasons.
> However, if we create a cgroup to hold only the slave:
> /cgroup: (no processes)
> /cgroup/system: (system processes) 1024 shares
> /cgroup/mesos-slave: (mesos-slave process) 1024 shares
> /cgroup/mesos: (no processes) 1024 shares
> /cgroup/mesos/executor1: X shares
> ...
> /cgroup/mesos/executorN: X shares
> With the above configuration, the slave will get 1024 / (1024 + 1024 + 1024)
> = 1/3 of the CPUs, system processes will similarly get 1/3 of the CPUs, and
> the executors will get 1/3 as well in total.
> See the following link for sharing behavior in the root cgroup:
> https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/process_behavior.html