Re: Review Request 25035: Fix for MESOS-1688

Martin Weindel Tue, 16 Sep 2014 14:05:42 -0700

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25035/
-----------------------------------------------------------


(Updated Sept. 16, 2014, 9:05 nachm.)


Review request for mesos and Vinod Kone.


Changes
-------

Adjusted CHANGELOG and comments for integration with 0.21.0 instead 0.20.1.
Improved warning log.


Bugs: MESOS-1688
    https://issues.apache.org/jira/browse/MESOS-1688


Repository: mesos-git


Description
-------

As already explained in JIRA MESOS-1688, there are schedulers allocating memory 
only for the executor and not for tasks. For tasks only CPU resources are 
allocated in this case.
Such a scheduler does not get offered any idle CPUs if the slave has nearly 
used up all memory.
This can easily lead to a dead lock (in the application, not in Mesos).

Simple example:
1. Scheduler allocates all memory of a slave for an executor
2. Scheduler launches a task for this executor (allocating 1 CPU)
3. Task finishes: 1 CPU , 0 MB memory allocatable.
4. No offers are made, as no memory is left. Scheduler will wait for offers 
forever. Dead lock in the application.

To fix this problem, offers must be made if CPU resources are allocatable 
without considering allocatable memory


Diffs (updated)
-----

  CHANGELOG a822cc4 
  src/common/resources.cpp edf36b1 
  src/master/constants.cpp faa1503 
  src/master/hierarchical_allocator_process.hpp 34f8cd6 
  src/master/master.cpp 18464ba 
  src/tests/allocator_tests.cpp 774528a 

Diff: https://reviews.apache.org/r/25035/diff/


Testing
-------

Deployed patched Mesos 0.19.1 on a small cluster with 3 slaves and tested 
running multiple parallel Spark jobs in "fine-grained" mode to saturate 
allocatable memory. The jobs run fine now. This load always caused a dead lock 
in all Spark jobs within one minute with the unpatched Mesos.


Thanks,

Martin Weindel

Re: Review Request 25035: Fix for MESOS-1688

Reply via email to