[ https://issues.apache.org/jira/browse/MESOS-2652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14622804#comment-14622804 ]

Jie Yu edited comment on MESOS-2652 at 7/10/15 8:38 PM:
--------------------------------------------------------

Re-opening this ticket because we observed that setting a process's scheduler 
policy to SCHED_IDLE does not work as expected when cgroups are used.

Here is my testing environment:

(1) I used a widely used open source CPU benchmark suite for multiprocessors, 
called Parsec (http://parsec.cs.princeton.edu/), to test CPU performance. The 
idea is to launch a job (using Aurora) with each instance continuously running 
the Parsec benchmark and reporting statistics.

(2) Each instance of the job uses 16 threads (by configuring Parsec 
accordingly) and is scheduled on a box with 16 cores. That means no other 
regular job can land on those boxes.

(3) I used a fixed resource estimator on each slave and launched revocable 
tasks using no_executor_framework. Each revocable task simply runs a 
'while(true)' loop burning CPU (a minimal sketch follows this list).

There is one interesting observation: one instance of the benchmark job landed 
on a slave that happened to have 11 revocable tasks running (each using 1 
revocable cpu), while all other slaves had 8 revocable tasks running. That 
instance of the benchmark job performed consistently worse than the other 
instances. However, after I killed the 3 extra revocable tasks, its 
performance improved immediately and matched that of the other instances. See 
the attached results; note that the y-axis is the execution time.

Later, I did another experiment. On one slave, I set the cpu.shares of the 
cgroup in which each revocable task runs to 2 (the minimum) instead of 1024 
(the default share, corresponding to 1 cpu). The performance of the benchmark 
instance on that slave improved immediately. See the attached results.

[~wangcong] found out that SCHED_IDLE is only meaningful within a cgroup. In 
other words, setting the scheduler policy of a process in a cgroup to 
SCHED_IDLE will prevent this process from getting CPU time while there are 
runnable SCHED_OTHER processes in that cgroup. However, between cgroups, the 
CPUs are still shared according to the cpu.shares of each cgroup.
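
For reference, this is how a process would opt into SCHED_IDLE on Linux (a 
minimal sketch for illustration only, not the Mesos isolator code):

{code}
#include <cerrno>
#include <cstring>
#include <iostream>
#include <sched.h>

int main()
{
  // Put the calling process into SCHED_IDLE. This only deprioritizes it
  // relative to SCHED_OTHER processes in the *same* cpu cgroup; across
  // cgroups, CPU time is still divided according to cpu.shares.
  struct sched_param param;
  std::memset(&param, 0, sizeof(param));
  param.sched_priority = 0;  // Must be 0 for SCHED_IDLE.

  if (sched_setscheduler(0, SCHED_IDLE, &param) != 0) {
    std::cerr << "sched_setscheduler: " << std::strerror(errno) << std::endl;
    return 1;
  }

  // ... run the low-priority (revocable) workload here ...
  return 0;
}
{code}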


> Update Mesos containerizer to understand revocable cpu resources
> ----------------------------------------------------------------
>
>                 Key: MESOS-2652
>                 URL: https://issues.apache.org/jira/browse/MESOS-2652
>             Project: Mesos
>          Issue Type: Task
>            Reporter: Vinod Kone
>            Assignee: Ian Downes
>              Labels: twitter
>             Fix For: 0.23.0
>
>         Attachments: Abnormal performance with 3 additional revocable tasks 
> (1).png, Abnormal performance with 3 additional revocable tasks (2).png, 
> Abnormal performance with 3 additional revocable tasks (3).png, Abnormal 
> performance with 3 additional revocable tasks (4).png, Abnormal performance 
> with 3 additional revocable tasks (5).png, Abnormal performance with 3 
> additional revocable tasks (6).png, Abnormal performance with 3 additional 
> revocable tasks (7).png, Performance improvement after reducing cpu.share to 
> 2 for revocable tasks (1).png, Performance improvement after reducing 
> cpu.share to 2 for revocable tasks (10).png, Performance improvement after 
> reducing cpu.share to 2 for revocable tasks (2).png, Performance improvement 
> after reducing cpu.share to 2 for revocable tasks (3).png, Performance 
> improvement after reducing cpu.share to 2 for revocable tasks (4).png, 
> Performance improvement after reducing cpu.share to 2 for revocable tasks 
> (5).png, Performance improvement after reducing cpu.share to 2 for revocable 
> tasks (6).png, Performance improvement after reducing cpu.share to 2 for 
> revocable tasks (7).png, Performance improvement after reducing cpu.share to 
> 2 for revocable tasks (8).png, Performance improvement after reducing 
> cpu.share to 2 for revocable tasks (9).png
>
>
> The CPU isolator needs to properly set limits for revocable and non-revocable 
> containers.
> The proposed strategy is to use a two-way split of the cpu cgroup hierarchy 
> -- normal (non-revocable) and low priority (revocable) subtrees -- and to use 
> a biased split of CFS cpu.shares across the subtrees, e.g., a 20:1 split 
> (TBD). Containers would be present in only one of the subtrees. CFS quotas 
> will *not* be set on subtree roots, only cpu.shares. Each container would set 
> CFS quota and shares as done currently.
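
For illustration, the two-way split described above could be set up roughly 
as follows. The subtree names, paths, and the way the 20:1 bias is applied 
are assumptions based on the proposal, not the actual Mesos implementation:

{code}
#include <fstream>
#include <string>

// Hypothetical sketch of the proposed split: a "normal" subtree for
// non-revocable containers and a low-priority "revocable" subtree, with
// cpu.shares biased 20:1 across the subtree roots (the ratio is TBD in
// the proposal). Paths and names are illustrative only.
static bool writeShares(const std::string& subtree, unsigned int shares)
{
  std::ofstream file(subtree + "/cpu.shares");
  file << shares;
  return static_cast<bool>(file);
}

int main()
{
  const std::string root = "/sys/fs/cgroup/cpu/mesos";

  // Only cpu.shares on the subtree roots, no CFS quota; each container
  // below keeps setting its own cpu.shares and CFS quota as it does today.
  bool ok = writeShares(root + "/normal", 20 * 1024) &&
            writeShares(root + "/revocable", 1024);

  return ok ? 0 : 1;
}
{code}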



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
