[
https://issues.apache.org/jira/browse/MESOS-2652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14630351#comment-14630351
]
Jie Yu commented on MESOS-2652:
-------------------------------
I did a few more experiments using the PARSEC-based CPU benchmark to further
quantify the interference from revocable tasks.
I launched 16 instances of the benchmark (using Aurora) on 16 slaves, with each
instance taking 16 cpus (all the cpus available on the slave). I configured the
fixed resource estimator such that instance N has N revocable tasks running
(each revocable task runs a while(1) loop burning a cpu).
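For context, each revocable task is just a trivial cpu burner; a minimal sketch
of that kind of load (illustrative only, not the actual task code used in the
experiment):
{code}
// Illustration only: the kind of revocable load used in the experiment,
// i.e. one spinning thread per requested cpu. Not the actual task code.
#include <cstdlib>
#include <thread>
#include <vector>

int main(int argc, char** argv)
{
  int cpus = (argc > 1) ? std::atoi(argv[1]) : 1;

  std::vector<std::thread> burners;
  for (int i = 0; i < cpus; i++) {
    burners.emplace_back([]() {
      volatile unsigned long counter = 0;
      while (true) { ++counter; }  // while(1) burning one cpu.
    });
  }

  for (std::thread& burner : burners) {
    burner.join();
  }

  return 0;
}
{code}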
Initially, all revocable containers had cpu.shares=1024 and used SCHED_IDLE as
the scheduling policy. As you can see in the graph, the interference is
proportional to the number of revocable tasks for almost all benchmarks.
Later, I changed their cpu.shares to 10. As you can see, setting cpu.shares to
10 reduces the interference considerably. Also, interestingly, the interference
is no longer always proportional to the number of revocable tasks on the slave
after I changed cpu.shares from 1024 to 10. For some benchmarks, there is no
interference (or very little) no matter how many revocable tasks are running
on the same slave.
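For reference, the two knobs varied above are the container cgroup's cpu.shares
and the SCHED_IDLE scheduling policy. A minimal sketch of setting them on Linux
(the cgroup path is hypothetical; this is not the Mesos isolator code):
{code}
// Illustration of the two knobs varied above: the cpu.shares of a revocable
// container's cgroup, and the SCHED_IDLE policy of its tasks. The cgroup
// path is hypothetical; this is not the Mesos isolator code.
#include <sched.h>
#include <unistd.h>

#include <fstream>
#include <string>

// Write a cpu.shares value (e.g. 1024 or 10) into a cgroup.
void setCpuShares(const std::string& cgroup, int shares)
{
  std::ofstream out(cgroup + "/cpu.shares");
  out << shares;
}

// Move a process to the SCHED_IDLE scheduling policy (Linux specific).
int setIdlePolicy(pid_t pid)
{
  struct sched_param param = {};
  param.sched_priority = 0;  // SCHED_IDLE requires a priority of 0.
  return sched_setscheduler(pid, SCHED_IDLE, &param);
}

int main()
{
  // Hypothetical cgroup of a revocable container.
  setCpuShares("/sys/fs/cgroup/cpu/mesos/revocable-container", 10);
  setIdlePolicy(getpid());
  return 0;
}
{code}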
> Update Mesos containerizer to understand revocable cpu resources
> ----------------------------------------------------------------
>
> Key: MESOS-2652
> URL: https://issues.apache.org/jira/browse/MESOS-2652
> Project: Mesos
> Issue Type: Task
> Reporter: Vinod Kone
> Assignee: Ian Downes
> Labels: twitter
> Fix For: 0.23.0
>
> Attachments: Abnormal performance with 3 additional revocable tasks
> (1).png, Abnormal performance with 3 additional revocable tasks (2).png,
> Abnormal performance with 3 additional revocable tasks (3).png, Abnormal
> performance with 3 additional revocable tasks (4).png, Abnormal performance
> with 3 additional revocable tasks (5).png, Abnormal performance with 3
> additional revocable tasks (6).png, Abnormal performance with 3 additional
> revocable tasks (7).png, Performance improvement after reducing cpu.share to
> 2 for revocable tasks (1).png, Performance improvement after reducing
> cpu.share to 2 for revocable tasks (10).png, Performance improvement after
> reducing cpu.share to 2 for revocable tasks (2).png, Performance improvement
> after reducing cpu.share to 2 for revocable tasks (3).png, Performance
> improvement after reducing cpu.share to 2 for revocable tasks (4).png,
> Performance improvement after reducing cpu.share to 2 for revocable tasks
> (5).png, Performance improvement after reducing cpu.share to 2 for revocable
> tasks (6).png, Performance improvement after reducing cpu.share to 2 for
> revocable tasks (7).png, Performance improvement after reducing cpu.share to
> 2 for revocable tasks (8).png, Performance improvement after reducing
> cpu.share to 2 for revocable tasks (9).png, cpu.share from 1024 to 10 for
> revocable tasks (1).png, cpu.share from 1024 to 10 for revocable tasks (2).png
>
>
> The CPU isolator needs to properly set limits for revocable and non-revocable
> containers.
> The proposed strategy is to use a two-way split of the cpu cgroup hierarchy
> -- normal (non-revocable) and low priority (revocable) subtrees -- and to use
> a biased split of CFS cpu.shares across the subtrees, e.g., a 20:1 split
> (TBD). Containers would be present in only one of the subtrees. CFS quotas
> will *not* be set on subtree roots, only cpu.shares. Each container would set
> CFS quota and shares as done currently.
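A minimal sketch of the two-way split proposed above, assuming a cpu cgroup
mounted under /sys/fs/cgroup/cpu and the example 20:1 shares ratio (paths and
values are placeholders, not the final isolator implementation):
{code}
// Illustration of the proposed split: a normal and a low priority (revocable)
// subtree under the cpu cgroup hierarchy, a biased cpu.shares split (the 20:1
// example from the description), and no CFS quota on the subtree roots.
// Paths and values are placeholders, not the final isolator implementation.
#include <sys/stat.h>

#include <fstream>
#include <string>

void writeControl(const std::string& path, const std::string& value)
{
  std::ofstream out(path);
  out << value;
}

int main()
{
  const std::string root = "/sys/fs/cgroup/cpu/mesos";

  // Create the two subtrees.
  mkdir((root + "/normal").c_str(), 0755);
  mkdir((root + "/revocable").c_str(), 0755);

  // Biased (20:1) cpu.shares split across the subtree roots.
  writeControl(root + "/normal/cpu.shares", "20480");
  writeControl(root + "/revocable/cpu.shares", "1024");

  // Note: no cpu.cfs_quota_us is written on the subtree roots; each
  // container nested below still sets its own CFS quota and cpu.shares.

  return 0;
}
{code}
Containers would then live in exactly one of the two subtrees and keep setting
their own CFS quota and cpu.shares as they do today.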
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)