[ https://issues.apache.org/jira/browse/MESOS-1688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14109078#comment-14109078 ]

ASF GitHub Bot commented on MESOS-1688:
---------------------------------------

GitHub user MartinWeindel opened a pull request:

    https://github.com/apache/mesos/pull/24

    Fix for MESOS-1688

    As already explained in
[MESOS-1688](https://issues.apache.org/jira/browse/MESOS-1688), some schedulers
allocate memory only for the executor and not for their tasks; in that case,
tasks are launched with CPU resources only.
    Such a scheduler is not offered any idle CPUs once the slave's memory is
nearly used up.
    This can easily lead to a deadlock (in the application, not in Mesos).
    
    Simple example:
    1. Scheduler allocates all memory of a slave for an executor
    2. Scheduler launches a task for this executor (allocating 1 CPU)
    3. Task finishes: 1 CPU and 0 MB of memory are now allocatable.
    4. No offers are made, as no memory is left. The scheduler waits for
offers forever: a deadlock in the application.
    
    To fix this problem, offers must be made if either CPU or memory is
allocatable, as sketched below.
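
    The following is an illustrative sketch only (not the exact patch in this
pull request), assuming the allocatable() helper quoted in the issue below: the
condition is relaxed so that a resource bundle counts as allocatable if either
the CPU minimum or the memory minimum is met.

        template <class RoleSorter, class FrameworkSorter>
        bool
        HierarchicalAllocatorProcess<RoleSorter, FrameworkSorter>::allocatable(
            const Resources& resources)
        {
          Option<double> cpus = resources.cpus();
          Option<Bytes> mem = resources.mem();

          // Offer if either resource clears its minimum, so a slave whose
          // memory is exhausted can still offer its idle CPUs (and vice versa).
          return (cpus.isSome() && cpus.get() >= MIN_CPUS) ||
                 (mem.isSome() && mem.get() > MIN_MEM);
        }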

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/MartinWeindel/mesos master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/mesos/pull/24.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #24
    
----
commit 8d816e4582208802d319071e90ef4e1810960718
Author: Martin Weindel <[email protected]>
Date:   2014-08-25T12:30:51Z

    fix for MESOS-1688

----


> No offers if no memory is allocatable
> -------------------------------------
>
>                 Key: MESOS-1688
>                 URL: https://issues.apache.org/jira/browse/MESOS-1688
>             Project: Mesos
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.18.1, 0.18.2, 0.19.0, 0.19.1
>            Reporter: Martin Weindel
>            Priority: Critical
>
> The [Spark 
> scheduler|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerBackend.scala]
>  allocates memory only for the executor and CPU only for its tasks.
> So it can happen that nearly all memory is allocated by Spark executors
> while all CPU resources are idle.
> In this case Mesos no longer makes offers, as less than MIN_MEM (= 32 MB)
> of memory is allocatable.
> This effectively deadlocks the Spark job, as it is never offered the CPU
> resources needed to launch new tasks.
> See {{HierarchicalAllocatorProcess::allocatable(const Resources&)}}, called
> from {{HierarchicalAllocatorProcess::allocate(const hashset<SlaveID>&)}}:
> {code}
> template <class RoleSorter, class FrameworkSorter>
> bool
> HierarchicalAllocatorProcess<RoleSorter, FrameworkSorter>::allocatable(
>     const Resources& resources)
> {
> ...
>   Option<double> cpus = resources.cpus();
>   Option<Bytes> mem = resources.mem();
>   if (cpus.isSome() && mem.isSome()) {
>     return cpus.get() >= MIN_CPUS && mem.get() > MIN_MEM;
>   }
>   return false;
> }
> {code}
> A possible solution may be to drop the condition on allocatable memory
> entirely.
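
A minimal sketch of that alternative, for illustration only (it is not part of
this pull request): using the same names as the snippet above, the memory
condition is dropped so that offers are gated on CPU alone.

{code}
// Illustrative sketch: drop the memory condition entirely and gate
// allocatability on CPU only.
template <class RoleSorter, class FrameworkSorter>
bool
HierarchicalAllocatorProcess<RoleSorter, FrameworkSorter>::allocatable(
    const Resources& resources)
{
  Option<double> cpus = resources.cpus();

  return cpus.isSome() && cpus.get() >= MIN_CPUS;
}
{code}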



--
This message was sent by Atlassian JIRA
(v6.2#6252)
