-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9464/#review16667
-----------------------------------------------------------


This looks pretty good, some higher level comments:

In terms of the approach to turn on cfs bandwidth control, I think it's 
important
to think of the longer term goal here. We probably don't want to force all tasks
under a slave to be run with hard cpu isolation. Some tasks, like mapreduce 
jobs,
will want to use as much cpu as possible and are great candidates for soft cpu
isolation. Other tasks, like throughput sensitive tasks, will want the 
determinism
of hard cpu isolation.

I'm convinced if we turned on hard cpu isolation for an entire production 
cluster,
we'd see a lot of problems with under-estimated cpu requirements.

This may mean pushing some optional responsibility to the scheduler. Offering
the ability to request the type of isolation (e.g. SOFT, HARD, PINNED). 
Schedulers
can then surface that complexity to their end-users however they like.

Unfortunately, we don't have the bits in place for that right now, but we should
get to it when we want to release this.

So for now, I think approach (1) is reasonable. But keep the above in mind, and
update any TODOs as appropriate.


src/slave/cgroups_isolation_module.cpp
<https://reviews.apache.org/r/9464/#comment35372>

    can you put a newline here? that way we have three logical sections of 
constants
    
    Feel free to add a title comment for these two like you did with cfs.
    
    e.g.
    
    // Memory subsystem constants.
    const size_t MIN_MEMORY_MB = 32 * Megabyte;



src/slave/cgroups_isolation_module.cpp
<https://reviews.apache.org/r/9464/#comment35373>

    were these meant to match the variable name or the subsystem control name?



src/slave/cgroups_isolation_module.cpp
<https://reviews.apache.org/r/9464/#comment35371>

    Is there any way to pull the period from a linux header? Will the bandwidth 
control contain this value initially, or -1?
    
    Do you see the default ever changing?



src/slave/cgroups_isolation_module.cpp
<https://reviews.apache.org/r/9464/#comment35370>

    I think we'll stick with the default period, so you can kill this TODO. If 
we end up needing to mess with the default we can consider that later.



src/slave/cgroups_isolation_module.cpp
<https://reviews.apache.org/r/9464/#comment35358>

    Unused, were you planning on making use of this?
    
    In general, I don't see us needing to completely disable quota once we've 
turned it on.



src/slave/cgroups_isolation_module.cpp
<https://reviews.apache.org/r/9464/#comment35369>

    Can you update the comment and TODO here to reflect my higher level 
comments?



src/slave/cgroups_isolation_module.cpp
<https://reviews.apache.org/r/9464/#comment35367>

    need to also check exists.isError()



src/slave/cgroups_isolation_module.cpp
<https://reviews.apache.org/r/9464/#comment35368>

    s/ERROR/WARNING
    
    The exists.isError case I would indeed consider an error, however.



src/slave/cgroups_isolation_module.cpp
<https://reviews.apache.org/r/9464/#comment35366>

    I think these three lines can be distilled down to:
    
    // Apply CFS bandwidth limiting if available.



src/slave/cgroups_isolation_module.cpp
<https://reviews.apache.org/r/9464/#comment35360>

    kill extra space before 'exists'



src/slave/cgroups_isolation_module.cpp
<https://reviews.apache.org/r/9464/#comment35362>

    newline and space after the 'if'
    
    you also need to check if exists is an error, otherwise get() will fail



src/slave/cgroups_isolation_module.cpp
<https://reviews.apache.org/r/9464/#comment35363>

    s/cpuCFSQuota/cfsQuota
    
    On a philosophical note: We typically try to get variable names down to 1 
word, as long as the context is enough to make it clear. I think the above 
"cpuShares" could have been sufficient as "shares" which makes this seem more 
appropriate as just "quota"



src/slave/cgroups_isolation_module.cpp
<https://reviews.apache.org/r/9464/#comment35365>

    Let's make this a NOTE and distill it:
    
    // NOTE: cfs_quota_us is dependent upon cfs_period_us,
    // so we have have to write the period first.



src/slave/cgroups_isolation_module.cpp
<https://reviews.apache.org/r/9464/#comment35364>

    newline
    
    This isn't a hard convention but I think we agreed at one point it is more 
legible to use newlines between the assignment and the Try/Option/Result 
checking.



src/slave/cgroups_isolation_module.cpp
<https://reviews.apache.org/r/9464/#comment35361>

    newline



src/slave/cgroups_isolation_module.cpp
<https://reviews.apache.org/r/9464/#comment35359>

    Can you merge the previous cfs LOG(INFO) into this one? That way, we can 
see the updated controls for the executor in a single log line.


- Ben Mahler


On Feb. 15, 2013, 6:42 a.m., David Mackey wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/9464/
> -----------------------------------------------------------
> 
> (Updated Feb. 15, 2013, 6:42 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Vinod Kone, and Ben Mahler.
> 
> 
> Description
> -------
> 
> Initial support for CPU "hard limits" via CFS Bandwidth Control in cgroups.
> 
> CFS is unique relative to existing Mesos cgroups support in that it is a 
> "subfeature" of an already supported cgroups subsystem, cpu. Also, there are 
> two "tunables" for configuring CFS bandwidth limiting.
> 
> There are 4 approaches one could take:
> 1) Use the CFS bandwidth limiting if the feature is present
> 2) Expose as separate flag, eg "cpu,cfs,memory,freezer"
> 3) Add feature flag support to subsystems via an additional delimiter, eg 
> "cpu+cfs,memory,freezer". 
> 4) Add an additional control flag via some other means
> 
> Option 2's downside breaks the 1:1 mapping between cgroups subsystems and a 
> cgroups resource flag.
> Option 3's downside is it greatly increases complexity of parsing cgroups 
> subsystem flags.
> 
> This diff takes option 1. 
> 
> 
> This addresses bug MESOS-315.
>     https://issues.apache.org/jira/browse/MESOS-315
> 
> 
> Diffs
> -----
> 
>   src/slave/cgroups_isolation_module.cpp 14f549e 
> 
> Diff: https://reviews.apache.org/r/9464/diff/
> 
> 
> Testing
> -------
> 
> make check + additional testing
> 
> 
> Thanks,
> 
> David Mackey
> 
>

Reply via email to