> On Oct. 1, 2013, 8:17 p.m., Chi Zhang wrote:
> > This is a bigger issue than I thought. The resources in the task only get 
> > accounted for when you launch the task. Copying them earlier gets the first 
> > task's resources double counted when it is actually started after the 
> > executor is launched. Whether or not a task is started and the resources 
> > associated with it need to be accounted for separately.
> > 
> > Any thoughts?
> 
> Ben Mahler wrote:
>     Is it that some resource subsystems require non-zero resources when the 
> executor is launched? If the answer is yes, can we have a minimum initial 
> resource allocation (akin to what is done in CgroupsIsolator)? See the 
> following constants:
>     
>     // CPU subsystem constants.
>     const size_t CPU_SHARES_PER_CPU = 1024;
>     const size_t MIN_CPU_SHARES = 10;
>     const Duration CPU_CFS_PERIOD = Milliseconds(100); // Linux default.
>     const Duration MIN_CPU_CFS_QUOTA = Milliseconds(1);
>     
>     // Memory subsystem constants.
>     const Bytes MIN_MEMORY = Megabytes(32);
>     
>     It's not ideal but it may be a simpler solution to your problem.
> 
> Chi Zhang wrote:
>     That would do for now since we aren't adding new features, but part of 
> our goal in the current refactoring work is to allow different combinations 
> of resource isolation modules to be used for different executors. A resource 
> such as a disk partition would require a non-zero requirement to initialize. 
> There are also other more 'optional' (than cpu, mem, disk and port) features, 
> like namespaces, that we are also trying to provide a foundation for. Namespaces 
> might not take a number to initialize, but they affect which system API is used 
> in the launcher when it comes to implementation details.
>     
>     All these would require some form of pass-down from slave, extracted out 
> from the task message, when an executor is launched. I am still thinking the 
> 'resources' argument for launchExecutor should be the candidate to pass them 
> along.
> 
> Ian Downes wrote:
>     I think what Chi is suggesting is that we'd like the necessary 
> *executor's* resources to be provided when launchExecutor() is called, e.g. 
> 0.25 CPU and 128 MB for the executor, i.e. split out the executor's resources 
> from task resources. Later, as tasks are launched by the executor, the 
> resources can be updated, e.g. 0.25 CPU + 4 CPU and 128 MB + 4 GB for a 4/4 
> task.
>     
>     If this split isn't easily done then we could wrangle something with 
> default values that are subtracted from the actual requests.
>     
>

When frameworks launch a task inside an executor, they are expected to specify 
resources for the executor. We don't check that these resources are non-zero at 
the moment, but we certainly could consider it. This came up previously when 
trying to implement cpuset support (which requires pinning to a non-zero number 
of cores).

Another source of this is the command executor, where we generate a 0 resource 
ExecutorInfo. Also not ideal.

Re your comment: what do you mean by the resources needing to be "subtracted" 
when using defaults? I was thinking more along the lines of the isolator using 
max(default_resource, actual_resource) to compute the necessary resources.

You could imagine the following:
1. Launch 0 resource executor: resourcesChanged() is called during 
   launchExecutor() with 0 resources -> use defaults in isolator (max(0.1cpus, 
   0cpus) = 0.1cpus and max(32MB, 0MB) = 32MB memory).
2. Launch 1 CPU, 1GB memory task on executor: resourcesChanged() is now called 
   by the slave with 0 + task resources -> isolation is applied with 
   max(0.1cpus, 1cpus) = 1cpus and max(32MB, 1GB) = 1GB memory.
3. If the task completes: resourcesChanged() is called with 0 once again by the 
   slave, which applies the defaults (0.1 cpu, 32MB memory).

This is how the current Isolators work in the face of 0 resources :)
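
To make the max(default, actual) idea concrete, here is a minimal sketch. It 
is not the actual isolator code: SimpleResources, MIN_RESOURCES and 
effective() are hypothetical names, and the minimums are just the example 
numbers from the steps above (0.1 cpus, 32MB).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the slave's Resources type; only the two
// quantities discussed in this thread are modeled.
struct SimpleResources {
  double cpus;
  uint64_t memMB;
};

// Assumed minimums, taken from the example numbers in this thread.
const SimpleResources MIN_RESOURCES = {0.1, 32};

// Returns what the isolator would enforce for a given request:
// max(default, actual) per resource, so a 0 resource executor still
// ends up with a usable allocation.
inline SimpleResources effective(const SimpleResources& requested) {
  assert(requested.cpus >= 0.0);  // Negative requests make no sense here.

  SimpleResources result;
  result.cpus = std::max(MIN_RESOURCES.cpus, requested.cpus);
  result.memMB = std::max(MIN_RESOURCES.memMB, requested.memMB);
  return result;
}
```

With this, effective({0.0, 0}) yields the defaults, and effective({1.0, 1024}) 
yields the task's own 1 cpu and 1024MB, so nothing needs to be subtracted when 
the task resources arrive.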


- Ben


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14414/#review26580
-----------------------------------------------------------


On Sept. 30, 2013, 9:12 p.m., Chi Zhang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/14414/
> -----------------------------------------------------------
> 
> (Updated Sept. 30, 2013, 9:12 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Ben Mahler, Ian Downes, Jie Yu, 
> David Mackey, Vinod Kone, and Jiang Yan Xu.
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
>     slave: Copy resource requirements from the first TaskInfo message to the 
>     ExecutorInfo before an executor is launched.
>                              
>     Otherwise, this leads to a null value passed to launchExecutor for the 
>     resources field. It's necessary for some resource subsystems to initialize 
>     executors with resource requirement upfront.
> 
> 
> Diffs
> -----
> 
>   src/slave/slave.cpp 0ad4576 
> 
> Diff: https://reviews.apache.org/r/14414/diff/
> 
> 
> Testing
> -------
> 
> Can't tell for sure. With or without the patch, `make -j check` fails at the 
> same place on a Mesos dev box.
> 
> [----------] Global test environment tear-down
> [==========] 263 tests from 47 test cases ran. (146351 ms total)
> [  PASSED  ] 259 tests.
> [  FAILED  ] 4 tests, listed below:
> [  FAILED  ] CgroupsIsolatorTest.ROOT_CGROUPS_BalloonFramework
> [  FAILED  ] SASL.success
> [  FAILED  ] SASL.failed1
> [  FAILED  ] SASL.failed2
> 
>  4 FAILED TESTS
> make[3]: *** [check-local] Error 1
> make[3]: Leaving directory `/home/czhang/mesos-apache/build/src'
> make[2]: *** [check-am] Error 2
> make[2]: Leaving directory `/home/czhang/mesos-apache/build/src'
> make[1]: *** [check] Error 2
> make[1]: Leaving directory `/home/czhang/mesos-apache/build/src'
> make: *** [check-recursive] Error 1
> Connection to smfd-aki-27-sr1.devel.twitter.com closed.
> 
> 
> Thanks,
> 
> Chi Zhang
> 
>
