[ https://issues.apache.org/jira/browse/MESOS-2652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14630351#comment-14630351 ]

Jie Yu commented on MESOS-2652:
-------------------------------

I did a few more experiments using the Parsec-based CPU benchmark to further 
quantify the interference from revocable tasks.

I launched 16 instances of the benchmark (using Aurora) on 16 slaves, with each 
instance taking 16 CPUs (all available CPUs on the slave). I configured the 
fixed resource estimator such that instance N has N revocable tasks running 
alongside it (each revocable task runs a while(1) loop burning CPU).

Initially, all revocable containers had cpu.shares=1024 and used SCHED_IDLE as 
the scheduling policy. As you can see in the graphs, the interference is 
proportional to the number of revocable tasks for almost all benchmarks.
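
For reference, here is a minimal sketch (not Mesos code) of what a revocable 
task in this setup could look like: the container's cgroup keeps the default 
cpu.shares of 1024, the task switches itself to SCHED_IDLE, and then burns CPU 
in a while(1) loop. The cgroup path and container id are placeholders.

    #include <sched.h>

    #include <cstdio>
    #include <fstream>
    #include <string>

    int main()
    {
      // Assumed cgroup for one revocable container; adjust to your layout.
      const std::string cgroup = "/sys/fs/cgroup/cpu/mesos/<container-id>";

      // 1024 is the kernel default for cpu.shares; written explicitly here.
      std::ofstream shares(cgroup + "/cpu.shares");
      shares << 1024 << std::endl;

      // SCHED_IDLE: only run when the CPU would otherwise be idle.
      // The priority must be 0 for this policy.
      struct sched_param param = {};
      param.sched_priority = 0;
      if (sched_setscheduler(0, SCHED_IDLE, &param) != 0) {
        perror("sched_setscheduler");
        return 1;
      }

      // Burn CPU, like the revocable tasks in the experiment.
      while (true) {}
    }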

Later, I changed their cpu.shares to 10. As you can see, setting cpu.shares to 
10 reduces the interference a lot. Also, interestingly, the interference is 
not always proportional to the number of revocable tasks on the slave after 
I changed cpu.shares from 1024 to 10. For some benchmarks, there is no 
interference (or very little) no matter how many revocable tasks are running 
on the same slave.
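
The only knob that changed between the two runs is the cpu.shares value 
written into each revocable container's cgroup; roughly something like the 
following (the cgroup paths are assumptions, not Mesos's real layout):

    #include <fstream>
    #include <string>
    #include <vector>

    // Lower cpu.shares for every revocable container's cgroup,
    // e.g. from the default 1024 down to 10.
    void setRevocableShares(const std::vector<std::string>& cgroups, int shares)
    {
      for (const std::string& cgroup : cgroups) {
        std::ofstream out(cgroup + "/cpu.shares");
        out << shares << std::endl;
      }
    }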

> Update Mesos containerizer to understand revocable cpu resources
> ----------------------------------------------------------------
>
>                 Key: MESOS-2652
>                 URL: https://issues.apache.org/jira/browse/MESOS-2652
>             Project: Mesos
>          Issue Type: Task
>            Reporter: Vinod Kone
>            Assignee: Ian Downes
>              Labels: twitter
>             Fix For: 0.23.0
>
>         Attachments: Abnormal performance with 3 additional revocable tasks 
> (1).png, Abnormal performance with 3 additional revocable tasks (2).png, 
> Abnormal performance with 3 additional revocable tasks (3).png, Abnormal 
> performance with 3 additional revocable tasks (4).png, Abnormal performance 
> with 3 additional revocable tasks (5).png, Abnormal performance with 3 
> additional revocable tasks (6).png, Abnormal performance with 3 additional 
> revocable tasks (7).png, Performance improvement after reducing cpu.share to 
> 2 for revocable tasks (1).png, Performance improvement after reducing 
> cpu.share to 2 for revocable tasks (10).png, Performance improvement after 
> reducing cpu.share to 2 for revocable tasks (2).png, Performance improvement 
> after reducing cpu.share to 2 for revocable tasks (3).png, Performance 
> improvement after reducing cpu.share to 2 for revocable tasks (4).png, 
> Performance improvement after reducing cpu.share to 2 for revocable tasks 
> (5).png, Performance improvement after reducing cpu.share to 2 for revocable 
> tasks (6).png, Performance improvement after reducing cpu.share to 2 for 
> revocable tasks (7).png, Performance improvement after reducing cpu.share to 
> 2 for revocable tasks (8).png, Performance improvement after reducing 
> cpu.share to 2 for revocable tasks (9).png, cpu.share from 1024 to 10 for 
> revocable tasks (1).png, cpu.share from 1024 to 10 for revocable tasks (2).png
>
>
> The CPU isolator needs to properly set limits for revocable and non-revocable 
> containers.
> The proposed strategy is to use a two-way split of the cpu cgroup hierarchy 
> -- normal (non-revocable) and low priority (revocable) subtrees -- and to use 
> a biased split of CFS cpu.shares across the subtrees, e.g., a 20:1 split 
> (TBD). Containers would be present in only one of the subtrees. CFS quotas 
> will *not* be set on subtree roots, only cpu.shares. Each container would set 
> CFS quota and shares as done currently.
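
For illustration only, a minimal sketch of what the proposed two-way split 
could look like at the cgroup level, assuming a /sys/fs/cgroup/cpu/mesos root 
and the example 20:1 bias from the description (both are assumptions, not the 
actual containerizer change): only cpu.shares is written on the subtree roots, 
no CFS quota.

    #include <sys/stat.h>

    #include <fstream>
    #include <string>

    // Write cpu.shares on a subtree root, creating it if necessary.
    // cpu.cfs_quota_us is deliberately not set here: quotas stay per-container.
    static void setSubtreeShares(const std::string& cgroup, int shares)
    {
      mkdir(cgroup.c_str(), 0755);
      std::ofstream out(cgroup + "/cpu.shares");
      out << shares << std::endl;
    }

    int main()
    {
      const std::string root = "/sys/fs/cgroup/cpu/mesos";

      // 20:1 cpu.shares bias between the normal and revocable subtrees.
      setSubtreeShares(root + "/normal", 20 * 1024);
      setSubtreeShares(root + "/revocable", 1024);

      // Containers would then live under exactly one of the two subtrees
      // and keep setting their own CFS quota and shares as done currently.
      return 0;
    }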



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
