[
https://issues.apache.org/jira/browse/MESOS-2652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14622804#comment-14622804
]
Jie Yu commented on MESOS-2652:
-------------------------------
Re-opening this ticket because we observed that setting a process's scheduler policy
to SCHED_IDLE does not work as expected when cgroups are used.
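(For reference, applying SCHED_IDLE to a task typically looks like the following. This is a minimal sketch of the mechanism via sched_setscheduler(2), not Mesos code; targeting pid 0, i.e. the calling process itself, is just an assumption for illustration.)
{code}
// Minimal sketch: demote the calling process to SCHED_IDLE.
// Build with g++ (which defines _GNU_SOURCE, needed for SCHED_IDLE).
// Illustrative only -- not Mesos code.
#include <sched.h>
#include <cstdio>

int main() {
  struct sched_param param = {};
  param.sched_priority = 0;  // SCHED_IDLE requires a static priority of 0.
  if (sched_setscheduler(0, SCHED_IDLE, &param) != 0) {
    perror("sched_setscheduler(SCHED_IDLE)");
    return 1;
  }
  // The process is now scheduled with a near-zero CFS weight; whether that
  // weight survives cgroup (CFS group scheduling) hierarchies as expected
  // is exactly what this ticket is questioning.
  return 0;
}
{code}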
Here is my testing environment:
(1) I used a widely used open source CPU benchmark for multiprocessors, called
Parsec (http://parsec.cs.princeton.edu/), to test CPU performance. The idea is
to launch a job (using Aurora) with each instance continuously running the
Parsec benchmark and reporting statistics.
(2) Each instance of the job uses 16 threads (by configuring Parsec accordingly).
Each instance of the job is scheduled on a box with 16 cores, which means no other
regular job can land on those boxes.
(3) Use a fixed resource estimator on each slave and launch revocable tasks
using no_executor_framework. Each revocable task simply runs a 'while(true)'
loop burning one CPU (a minimal sketch of such a burner follows this list).
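As referenced in (3), the revocable task is nothing more than a single-core spin loop. A sketch of that kind of burner (illustrative only; not the exact binary launched through no_executor_framework):
{code}
// Minimal CPU burner: consumes one core until killed.
// Illustrative stand-in for the revocable tasks in this test.
#include <atomic>

int main() {
  std::atomic<unsigned long> counter{0};
  while (true) {
    counter.fetch_add(1, std::memory_order_relaxed);  // keep the CPU busy
  }
  // never returns
}
{code}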
There is one interesting observation: one instance of the benchmark job landed
on a slave that happened to have 11 revocable tasks running (each using 1
revocable CPU), while all other slaves had 8 revocable tasks running. That
instance of the benchmark job performed consistently worse than the other
instances. However, after I killed the 3 extra revocable tasks, the performance
improved immediately and matched that of the other instances. See the attached results.
To be continued...
> Update Mesos containerizer to understand revocable cpu resources
> ----------------------------------------------------------------
>
> Key: MESOS-2652
> URL: https://issues.apache.org/jira/browse/MESOS-2652
> Project: Mesos
> Issue Type: Task
> Reporter: Vinod Kone
> Assignee: Ian Downes
> Labels: twitter
> Fix For: 0.23.0
>
> Attachments: Abnormal performance with 3 additional revocable tasks
> (1).png, Abnormal performance with 3 additional revocable tasks (2).png,
> Abnormal performance with 3 additional revocable tasks (3).png, Abnormal
> performance with 3 additional revocable tasks (4).png, Abnormal performance
> with 3 additional revocable tasks (5).png, Abnormal performance with 3
> additional revocable tasks (6).png, Abnormal performance with 3 additional
> revocable tasks (7).png
>
>
> The CPU isolator needs to properly set limits for revocable and non-revocable
> containers.
> The proposed strategy is to use a two-way split of the cpu cgroup hierarchy
> -- normal (non-revocable) and low-priority (revocable) subtrees -- and to use
> a biased split of CFS cpu.shares across the subtrees, e.g., a 20:1 split
> (TBD). Containers would be present in only one of the subtrees. CFS quotas
> would *not* be set on subtree roots, only cpu.shares. Each container would set
> CFS quota and shares as it does currently.
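For illustration only, here is a rough sketch of what the proposed two-subtree layout could look like under cgroup v1. The paths, the 20:1 ratio, and the helper below are assumptions drawn from the description above, not the actual implementation:
{code}
// Sketch of the proposed split: a "normal" and a "revocable" subtree under
// the cpu cgroup, with biased cpu.shares (e.g. 20:1) and no CFS quota set on
// the subtree roots. Paths and values are illustrative assumptions; error
// handling is omitted for brevity.
#include <fstream>
#include <string>
#include <sys/stat.h>

static void write_value(const std::string& path, const std::string& value) {
  std::ofstream file(path);
  file << value;  // cgroup control files take plain text values
}

int main() {
  const std::string root = "/sys/fs/cgroup/cpu/mesos";  // hypothetical hierarchy root

  // Two-way split of the hierarchy.
  mkdir((root + "/normal").c_str(), 0755);
  mkdir((root + "/revocable").c_str(), 0755);

  // Biased cpu.shares across the subtrees, e.g. 20:1 (exact ratio TBD).
  write_value(root + "/normal/cpu.shares", "20480");    // 20 * 1024
  write_value(root + "/revocable/cpu.shares", "1024");  //  1 * 1024

  // cpu.cfs_quota_us is intentionally left at -1 (unlimited) on the subtree
  // roots; per-container quota and shares would still be set inside each
  // subtree, as is done today.
  return 0;
}
{code}
Since only cpu.shares (and not a quota) is applied at the subtree roots, the bias only takes effect under contention: revocable containers can still soak up idle cycles when the normal subtree is not using them.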