[ https://issues.apache.org/jira/browse/MESOS-458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13696231#comment-13696231 ]

Benjamin Mahler commented on MESOS-458:
---------------------------------------

I did some experiments to test this out, and they seem to confirm the article I 
linked:

I restricted my test cgroup to run on a single cpu and then set up the 
following tree:

A with children A1 and A2
B with no children

A has 1024 shares, B has 2048 shares.
A1 has 512 shares, A2 has 1024 shares.

# Create A and B cgroups, A has sub-cgroups A1 and A2
$ cd /cgroup
$ mkdir test && cd test
$ mkdir A B A/A1 A/A2

# Restrict everything to a single cpu (cpu 1)
$ echo 1 > cpuset.cpus
$ echo 1 > A/cpuset.cpus
$ echo 1 > A/A1/cpuset.cpus
$ echo 1 > A/A2/cpuset.cpus
$ echo 1 > B/cpuset.cpus

# cpuset.mems must also be set before tasks can be attached.
$ echo 0-1 > cpuset.mems
$ echo 0-1 > A/cpuset.mems
$ echo 0-1 > A/A1/cpuset.mems
$ echo 0-1 > A/A2/cpuset.mems
$ echo 0-1 > B/cpuset.mems

# Setup the shares: A has 1024, B has 2048, A1 has 512, A2 has 1024
$ echo 1024 > A/cpu.shares
$ echo 2048 > B/cpu.shares
$ echo 512 > A/A1/cpu.shares
$ echo 1024 > A/A2/cpu.shares
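
# Optional sanity check: read the values back (a cgroup left untouched
# keeps the cpu.shares default of 1024).
$ for g in A A/A1 A/A2 B; do echo "$g: $(cat $g/cpu.shares)"; done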

# Saturate A1, A2 and B.
$ yes A1 > /dev/null &
[1] 52741
$ echo $! > A/A1/tasks
$ yes A2 > /dev/null &
[2] 52764
$ echo $! > A/A2/tasks
$ yes B > /dev/null &
[3] 52811
$ echo $! > B/tasks

$ top -c
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
52811 root      20   0 58940  544  460 R 66.8  0.0   9:39.91 yes B
52764 root      20   0 58940  544  460 R 23.6  0.0   3:25.65 yes A2
52741 root      20   0 58940  544  460 R 11.8  0.0   1:58.86 yes A1

You can see B gets roughly 2048 / (1024 (A) + 2048 (B)) = 66%, while A1 and A2 
share the remaining 33%. Within that, A1 gets 512 / (512 (A1) + 1024 (A2)) = 1/3 
of the 33% = 11%, and A2 gets 2/3 of the 33% = 22%. Looks good.
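
As a quick sketch, here is the same arithmetic spelled out: B versus A at the 
top level, then A1 versus A2 inside A's remainder, as a percentage of the 
single cpu the test cgroup is pinned to:

$ awk 'BEGIN { b = 2048/(1024+2048); a = 1-b; printf "B=%.1f%% A2=%.1f%% A1=%.1f%%\n", b*100, a*1024/(512+1024)*100, a*512/(512+1024)*100 }'
B=66.7% A2=22.2% A1=11.1%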

# Placing a task in A:
$ yes A > /dev/null &
[4] 56037
$ echo $! > A/tasks
$ top -c
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
52811 root      20   0 58940  544  460 R 66.9  0.0  11:02.72 yes B
52764 root      20   0 58940  544  460 R 11.8  0.0   3:52.22 yes A2
56037 root      20   0 58940  544  460 R 11.8  0.0   0:04.90 yes A
52741 root      20   0 58940  544  460 R  5.9  0.0   2:12.17 yes A1

You can see this does not affect B, which still gets its 2/3 share. Now the 
task in A is competing alongside A1 and A2. A1 is reduced to 512 / (1024 (A) + 
1024 (A2) + 512 (A1)) = 1/5 of the 33%, i.e. about 6.6%. A and A2 split the 
rest of the 33% evenly since they both carry 1024 shares.
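
The same sketch with the task in A counted as one more sibling of weight 1024 
(matching the calculation above); top's snapshot reads a little lower, but the 
2:2:1 ratio matches:

$ awk 'BEGIN { a = 1 - 2048/(1024+2048); t = 1024+1024+512; printf "A=%.1f%% A2=%.1f%% A1=%.1f%%\n", a*1024/t*100, a*1024/t*100, a*512/t*100 }'
A=13.3% A2=13.3% A1=6.7%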

Adding another task to A results in diminished shares for all tasks rooted at 
A, as expected:
$ yes A > /dev/null &
[5] 58049
$ echo $! > A/tasks
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
52811 root      20   0 58940  544  460 R 66.8  0.0  15:39.56 yes B
52764 root      20   0 58940  544  460 R  9.8  0.0   4:47.39 yes A2
56037 root      20   0 58940  544  460 R  7.9  0.0   1:00.06 yes A
58049 root      20   0 58940  544  460 R  7.9  0.0   0:02.00 yes A
52741 root      20   0 58940  544  460 R  3.9  0.0   2:39.81 yes A1

This is because _each_ process placed directly in a parent cgroup competes with 
that parent's child cgroups for the parent's share (as Ian pointed out).
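
A rough sketch of the expected split, assuming each task in A again counts as 
weight 1024 alongside A1 (512) and A2 (1024):

$ awk 'BEGIN { a = 1 - 2048/(1024+2048); t = 1024+1024+1024+512; printf "each A task=%.1f%% A2=%.1f%% A1=%.1f%%\n", a*1024/t*100, a*1024/t*100, a*512/t*100 }'
each A task=9.5% A2=9.5% A1=4.8%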

Finally, if we remove the previous task and instead add a task to A2:
$ sudo kill -9 58049
$ yes A2 > /dev/null &
[5] 59125
$ echo $! > A/A2/tasks

$ top -c
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
52811 root      20   0 58940  544  460 R 64.9  0.0  19:36.49 yes B
56037 root      20   0 58940  544  460 R 11.8  0.0   1:38.98 yes A
52741 root      20   0 58940  544  460 R  5.9  0.0   2:59.31 yes A1
52764 root      20   0 58940  544  460 R  5.9  0.0   5:23.27 yes A2
59125 root      20   0 58940  540  460 R  5.9  0.0   0:08.35 yes A2

This time, A1 and A remain unaffected by the additional task in A2.
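
That is what the arithmetic predicts: the new task only competes for A2's 
slice, so A2's ~13% is split between its two tasks while A and A1 keep theirs 
(the 2:1:1 ratio matches top's 11.8 / 5.9 / 5.9 above):

$ awk 'BEGIN { a = 1 - 2048/(1024+2048); t = 1024+1024+512; a2 = a*1024/t; printf "A=%.1f%% A1=%.1f%% each A2 task=%.1f%%\n", a*1024/t*100, a*512/t*100, a2/2*100 }'
A=13.3% A1=6.7% each A2 task=6.7%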
                
> Current cgroups layout does not ensure the slave gets a fair share of the CPU 
> resources.
> ----------------------------------------------------------------------------------------
>
>                 Key: MESOS-458
>                 URL: https://issues.apache.org/jira/browse/MESOS-458
>             Project: Mesos
>          Issue Type: Improvement
>            Reporter: Benjamin Mahler
>
> We currently have the following layout when there is no system cgroup present 
> on a machine:
> /cgroup: (system processes, including the mesos-slave) 1024 shares
> /cgroup/mesos: (no processes) 1024 shares
> /cgroup/mesos/executor1: X shares
> ...
> /cgroup/mesos/executorN: X shares
> This does not ensure the slave gets a fair share of the cpu, especially when 
> there is load inside the root cgroup. This is because the slave is contending 
> with other processes inside the root cgroup. If the administrators set up a 
> system cgroup, the layout looks as follows:
> /cgroup: (no processes)
> /cgroup/system: (system processes, including the mesos-slave) 1024 shares
> /cgroup/mesos: (no processes) 1024 shares
> /cgroup/mesos/executor1: X shares
> ...
> /cgroup/mesos/executorN: X shares
> This still does not ensure the slave gets a fair share for the same reasons.
> However, if we create a cgroup to hold only the slave:
> /cgroup: (no processes)
> /cgroup/system: (system processes) 1024 shares
> /cgroup/mesos-slave: (mesos-slave process) 1024 shares
> /cgroup/mesos: (no processes) 1024 shares
> /cgroup/mesos/executor1: X shares
> ...
> /cgroup/mesos/executorN: X shares
> With the above configuration, the slave will get 1024 / (1024 + 1024 + 1024) 
> = 1/3 of the CPUs, system processes will similarly get 1/3 of the CPUs, and 
> the executors will get 1/3 as well in total.
> See the following link for sharing behavior in the root cgroup: 
> https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/process_behavior.html
