> On Sept. 15, 2014, 3:23 p.m., Timothy St. Clair wrote:
> > src/master/hierarchical_allocator_process.hpp, line 837
> > <https://reviews.apache.org/r/25035/diff/7/?file=688721#file688721line837>
> >
> >     What happens in the case where all CPUs are taken but memory is
> >     available? It looks like it will return (true), but this should not be
> >     possible.
> >
> >     I think you want to give an offer in the case where there are CPU
> >     resources available, but memory is consumed by the executor.
>
> Vinod Kone wrote:
>     Giving memory-only resources is ok as long as it is used for a task and
>     not an executor. See my comments above.
Could you please add a detailed comment in the code above the modification,
as on first inspection it still leaves me feeling unsettled.

- Timothy


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25035/#review53343
-----------------------------------------------------------


On Sept. 16, 2014, 9:05 p.m., Martin Weindel wrote:

> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/25035/
> -----------------------------------------------------------
>
> (Updated Sept. 16, 2014, 9:05 p.m.)
>
>
> Review request for mesos and Vinod Kone.
>
>
> Bugs: MESOS-1688
>     https://issues.apache.org/jira/browse/MESOS-1688
>
>
> Repository: mesos-git
>
>
> Description
> -------
>
> As already explained in JIRA MESOS-1688, there are schedulers that allocate
> memory only for the executor and not for tasks; tasks then allocate only
> CPU resources.
> Such a scheduler does not get offered any idle CPUs once the slave has
> nearly used up all its memory.
> This can easily lead to a deadlock (in the application, not in Mesos).
>
> Simple example:
> 1. The scheduler allocates all memory of a slave for an executor.
> 2. The scheduler launches a task for this executor (allocating 1 CPU).
> 3. The task finishes: 1 CPU and 0 MB of memory are allocatable.
> 4. No offers are made, as no memory is left. The scheduler will wait for
>    offers forever: a deadlock in the application.
> To fix this problem, offers must be made whenever CPU resources are
> allocatable, without considering allocatable memory.
>
>
> Diffs
> -----
>
>   CHANGELOG a822cc4
>   src/common/resources.cpp edf36b1
>   src/master/constants.cpp faa1503
>   src/master/hierarchical_allocator_process.hpp 34f8cd6
>   src/master/master.cpp 18464ba
>   src/tests/allocator_tests.cpp 774528a
>
> Diff: https://reviews.apache.org/r/25035/diff/
>
>
> Testing
> -------
>
> Deployed the patched Mesos 0.19.1 on a small cluster with 3 slaves and
> tested running multiple parallel Spark jobs in "fine-grained" mode to
> saturate allocatable memory. The jobs now run fine. With unpatched Mesos,
> this load always caused a deadlock in all Spark jobs within one minute.
>
>
> Thanks,
>
> Martin Weindel
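[Editor's note] The allocatability change discussed in this review can be
illustrated with a minimal sketch. This is not the actual Mesos C++ code: the
names `MIN_CPUS` and `MIN_MEM_MB` echo the minimum-offer constants in
src/master/constants.cpp, but the predicates below are a simplification for
illustration only, assuming the patch relaxes an "both CPUs and memory must
clear their minimum" check to "either resource clearing its minimum suffices":

```python
MIN_CPUS = 0.01    # assumed minimum CPUs for an offer to be worthwhile
MIN_MEM_MB = 32    # assumed minimum memory (MB) for an offer

def allocatable_unpatched(cpus, mem_mb):
    # Sketch of the pre-patch behavior: a slave's idle resources are only
    # offered when both CPUs and memory clear their minimums. If memory is
    # exhausted, idle CPUs are never offered -- the deadlock in MESOS-1688.
    return cpus >= MIN_CPUS and mem_mb >= MIN_MEM_MB

def allocatable_patched(cpus, mem_mb):
    # Sketch of the post-patch behavior: offer as soon as either resource
    # clears its minimum, so CPU-only offers reach schedulers whose executor
    # already holds all the memory.
    return cpus >= MIN_CPUS or mem_mb >= MIN_MEM_MB

# The deadlock scenario from the description: the executor holds all memory,
# a task finishes and frees 1 CPU.
idle_cpus, idle_mem = 1.0, 0.0
print(allocatable_unpatched(idle_cpus, idle_mem))  # no offer -> deadlock
print(allocatable_patched(idle_cpus, idle_mem))    # CPU-only offer is made
```

This also shows the point raised in the review above: the relaxed check
returns true for a memory-only surplus as well, which Vinod Kone notes is
acceptable as long as the memory is used for a task rather than an executor.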