> On Oct. 1, 2013, 8:17 p.m., Chi Zhang wrote:
> > This is a bigger issue than I thought. The resources in the task only get
> > accounted for when you launch the task. Copying them earlier gets the first
> > task's resources double-counted when it is actually started after the
> > executor is launched. Whether or not a task is started, and the resources
> > associated with it, need to be accounted for separately.
> >
> > Any thoughts?
>
> Ben Mahler wrote:
>     Is it that some resource subsystems require non-zero resources when the
>     executor is launched? If the answer is yes, can we have a minimum initial
>     resource allocation (akin to what is done in CgroupsIsolator)? See the
>     following constants:
>
>     // CPU subsystem constants.
>     const size_t CPU_SHARES_PER_CPU = 1024;
>     const size_t MIN_CPU_SHARES = 10;
>     const Duration CPU_CFS_PERIOD = Milliseconds(100); // Linux default.
>     const Duration MIN_CPU_CFS_QUOTA = Milliseconds(1);
>
>     // Memory subsystem constants.
>     const Bytes MIN_MEMORY = Megabytes(32);
>
>     It's not ideal but it may be a simpler solution to your problem.
>
> Chi Zhang wrote:
>     That would do for now since we aren't adding new features, but part of
>     our goal in the current refactoring work is to allow different combinations
>     of resource isolation modules to be used for different executors. Resources
>     such as a disk partition would require a non-zero requirement to initialize.
>     There are also other more 'optional' (than cpu, mem, disk and port) features
>     like namespaces that we are also trying to provide a foundation for. Namespaces
>     might not take a number to initialize, but they affect which system API is used
>     in the launcher when it comes to implementation details.
>
>     All these would require some form of pass-down from the slave, extracted
>     from the task message, when an executor is launched. I still think the
>     'resources' argument for launchExecutor should be the candidate to pass them
>     along.

I think what Chi is suggesting is that we'd like the necessary *executor's*
resources to be provided when launchExecutor() is called, e.g. 0.25 CPU and
128 MB for the executor, i.e. split out the executor's resources from task
resources. Later, as tasks are launched by the executor, the resources can be
updated, e.g. 0.25 CPU + 4 CPU and 128 MB + 4 GB for a 4/4 task.

If this split isn't easily done then we could wrangle something with default
values that are subtracted from the actual requests.

- Ian

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14414/#review26580
-----------------------------------------------------------

On Sept. 30, 2013, 9:12 p.m., Chi Zhang wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/14414/
> -----------------------------------------------------------
>
> (Updated Sept. 30, 2013, 9:12 p.m.)
>
>
> Review request for mesos, Benjamin Hindman, Ben Mahler, Ian Downes, Jie Yu,
> David Mackey, Vinod Kone, and Jiang Yan Xu.
>
>
> Repository: mesos-git
>
>
> Description
> -------
>
> slave: Copy resource requirements from the first TaskInfo message to the
> ExecutorInfo before an executor is launched.
>
> Otherwise, this leads to a null value passed to launchExecutor for the
> resources field. It's necessary for some resource subsystems to initialize
> executors with resource requirement upfront.
>
>
> Diffs
> -----
>
>   src/slave/slave.cpp 0ad4576
>
> Diff: https://reviews.apache.org/r/14414/diff/
>
>
> Testing
> -------
>
> Can't tell for sure. With or without the patch, `make -j check` fails at the
> same place on a Mesos dev box.
>
> [----------] Global test environment tear-down
> [==========] 263 tests from 47 test cases ran. (146351 ms total)
> [ PASSED ] 259 tests.
> [ FAILED ] 4 tests, listed below:
> [ FAILED ] CgroupsIsolatorTest.ROOT_CGROUPS_BalloonFramework
> [ FAILED ] SASL.success
> [ FAILED ] SASL.failed1
> [ FAILED ] SASL.failed2
>
> 4 FAILED TESTS
> make[3]: *** [check-local] Error 1
> make[3]: Leaving directory `/home/czhang/mesos-apache/build/src'
> make[2]: *** [check-am] Error 2
> make[2]: Leaving directory `/home/czhang/mesos-apache/build/src'
> make[1]: *** [check] Error 2
> make[1]: Leaving directory `/home/czhang/mesos-apache/build/src'
> make: *** [check-recursive] Error 1
> Connection to smfd-aki-27-sr1.devel.twitter.com closed.
>
>
> Thanks,
>
> Chi Zhang
>
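
P.S. To make the executor/task split in my reply at the top concrete, here is
a minimal sketch of the accounting I have in mind. It is not the slave.cpp code
and does not use the real Mesos Resources class; the Resources struct, the
ExecutorAccount class, and the addTask/removeTask/total names are all
hypothetical and only illustrate the idea. The executor's own resources are
recorded once at launch, and each task's resources are added only when that
task is actually started, so the first task cannot be counted twice.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical, simplified resource vector (NOT the Mesos Resources class).
struct Resources {
  double cpus;
  uint64_t memMB;

  Resources& operator+=(const Resources& that) {
    cpus += that.cpus;
    memMB += that.memMB;
    return *this;
  }
};

// Hypothetical per-executor bookkeeping kept by the slave.
class ExecutorAccount {
public:
  // At executor launch only the executor's own resources are known.
  explicit ExecutorAccount(const Resources& executor) : executor_(executor) {}

  // Called when a task is actually started. Returns false (and changes
  // nothing) if this task id was already counted.
  bool addTask(const std::string& taskId, const Resources& task) {
    return tasks_.emplace(taskId, task).second;
  }

  // Called when a task terminates; its resources stop being counted.
  void removeTask(const std::string& taskId) { tasks_.erase(taskId); }

  // Executor resources plus the resources of every started task.
  Resources total() const {
    Resources sum = executor_;
    for (const auto& entry : tasks_) {
      sum += entry.second;
    }
    return sum;
  }

private:
  Resources executor_;
  std::map<std::string, Resources> tasks_;
};
```

With an executor launched at 0.25 CPU / 128 MB, total() stays at those values
until the first task starts. After addTask("task-1", {4.0, 4096}) it reports
4.25 CPU and 4224 MB, and a repeated addTask for the same id is rejected
rather than added again.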
