[
https://issues.apache.org/jira/browse/MESOS-2652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14622804#comment-14622804
]
Jie Yu commented on MESOS-2652:
-------------------------------
Re-opening this ticket because we observed that setting a process's scheduler policy
to SCHED_IDLE does not work as expected when cgroups are used.
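(For reference, applying SCHED_IDLE to a task typically looks like the following. This is a minimal sketch of the mechanism via sched_setscheduler(2), not Mesos code; targeting pid 0, i.e. the calling process itself, is just an assumption for illustration.)
{code}
// Minimal sketch: demote the calling process to SCHED_IDLE.
// Build with g++ (which defines _GNU_SOURCE, needed for SCHED_IDLE).
// Illustrative only -- not Mesos code.
#include <sched.h>
#include <cstdio>

int main() {
  struct sched_param param = {};
  param.sched_priority = 0;  // SCHED_IDLE requires a static priority of 0.
  if (sched_setscheduler(0, SCHED_IDLE, &param) != 0) {
    perror("sched_setscheduler(SCHED_IDLE)");
    return 1;
  }
  // The process is now scheduled with a near-zero CFS weight; whether that
  // weight survives cgroup (CFS group scheduling) hierarchies as expected
  // is exactly what this ticket is questioning.
  return 0;
}
{code}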
Here is my testing environment:
(1) I used a widely used open source CPU benchmark for multiprocessors, called
Parsec (http://parsec.cs.princeton.edu/), to test CPU performance. The idea is
to launch a job (using Aurora) with each instance continuously running the
Parsec benchmark and reporting statistics.
(2) Each instance of the job uses 16 threads (by configuring Parsec accordingly).
Each instance of the job is scheduled on a box with 16 cores, which means no other
regular job can land on those boxes.
(3) Use a fixed resource estimator on each slave and launch revocable tasks
using no_executor_framework. Each revocable task simply runs a 'while(true)'
loop burning one CPU (a minimal sketch of such a burner follows this list).
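As referenced in (3), the revocable task is nothing more than a single-core spin loop. A sketch of that kind of burner (illustrative only; not the exact binary launched through no_executor_framework):
{code}
// Minimal CPU burner: consumes one core until killed.
// Illustrative stand-in for the revocable tasks in this test.
#include <atomic>

int main() {
  std::atomic<unsigned long> counter{0};
  while (true) {
    counter.fetch_add(1, std::memory_order_relaxed);  // keep the CPU busy
  }
  // never returns
}
{code}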
There is one interesting observation: one instance of the benchmark job landed
on a slave that happened to have 11 revocable tasks running (each using 1
revocable CPU), while all other slaves had 8 revocable tasks running. That
instance of the benchmark job performed consistently worse than the other
instances. However, after I killed the 3 extra revocable tasks, the performance
improved immediately and matched that of the other instances. See the attached results.
To be continued...
> Update Mesos containerizer to understand revocable cpu resources
> ----------------------------------------------------------------
>
> Key: MESOS-2652
> URL: https://issues.apache.org/jira/browse/MESOS-2652
> Project: Mesos
> Issue Type: Task
> Reporter: Vinod Kone
> Assignee: Ian Downes
> Labels: twitter
> Fix For: 0.23.0
>
> Attachments: Abnormal performance with 3 additional revocable tasks
> (1).png, Abnormal performance with 3 additional revocable tasks (2).png,
> Abnormal performance with 3 additional revocable tasks (3).png, Abnormal
> performance with 3 additional revocable tasks (4).png, Abnormal performance
> with 3 additional revocable tasks (5).png, Abnormal performance with 3
> additional revocable tasks (6).png, Abnormal performance with 3 additional
> revocable tasks (7).png
>
>
> The CPU isolator needs to properly set limits for revocable and non-revocable
> containers.
> The proposed strategy is to use a two-way split of the cpu cgroup hierarchy
> -- normal (non-revocable) and low-priority (revocable) subtrees -- and to use
> a biased split of CFS cpu.shares across the subtrees, e.g., a 20:1 split
> (TBD). Containers would be present in only one of the subtrees. CFS quotas
> would *not* be set on subtree roots, only cpu.shares. Each container would set
> CFS quota and shares as it does currently.
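For illustration only, here is a rough sketch of what the proposed two-subtree layout could look like under cgroup v1. The paths, the 20:1 ratio, and the helper below are assumptions drawn from the description above, not the actual implementation:
{code}
// Sketch of the proposed split: a "normal" and a "revocable" subtree under
// the cpu cgroup, with biased cpu.shares (e.g. 20:1) and no CFS quota set on
// the subtree roots. Paths and values are illustrative assumptions; error
// handling is omitted for brevity.
#include <fstream>
#include <string>
#include <sys/stat.h>

static void write_value(const std::string& path, const std::string& value) {
  std::ofstream file(path);
  file << value;  // cgroup control files take plain text values
}

int main() {
  const std::string root = "/sys/fs/cgroup/cpu/mesos";  // hypothetical hierarchy root

  // Two-way split of the hierarchy.
  mkdir((root + "/normal").c_str(), 0755);
  mkdir((root + "/revocable").c_str(), 0755);

  // Biased cpu.shares across the subtrees, e.g. 20:1 (exact ratio TBD).
  write_value(root + "/normal/cpu.shares", "20480");    // 20 * 1024
  write_value(root + "/revocable/cpu.shares", "1024");  //  1 * 1024

  // cpu.cfs_quota_us is intentionally left at -1 (unlimited) on the subtree
  // roots; per-container quota and shares would still be set inside each
  // subtree, as is done today.
  return 0;
}
{code}
Since only cpu.shares (and not a quota) is applied at the subtree roots, the bias only takes effect under contention: revocable containers can still soak up idle cycles when the normal subtree is not using them.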