[jira] [Commented] (MESOS-3157) only perform batch resource allocations

Benjamin Mahler (JIRA) Fri, 04 Sep 2015 17:14:56 -0700

    [ 
https://issues.apache.org/jira/browse/MESOS-3157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14731647#comment-14731647
 ]


Benjamin Mahler commented on MESOS-3157:
----------------------------------------

{quote}
 The problem with this is knowing when to trigger the allocation pass. You want 
to trigger it once you have more than a few slave IDs ready, but before the 
batch allocation kicks in. You also want to wait as long as possible so that 
you can batch as many as possible. This seems tricky; I can't think of a way to 
know that no more addSlave or updateSlave events are going to come.
{quote}

This should not be complicated; it's just a matter of doing a deferred 
allocation (via {{defer}}), as I mentioned in (2) above. This ensures that the 
allocation occurs after all currently enqueued events. Any subsequent 
deferred allocations don't have to do any "work", since the set of 
slaves that require allocation has already been cleared (as I mentioned in (2)). 
We could track the outstanding allocation explicitly, but we already have to 
deal with the batch allocation deferral, so I'm not sure there's any value in that.
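
To illustrate the deferred approach, here is a minimal, self-contained C++ sketch. It is not Mesos code: {{EventQueue}}, {{SketchAllocator}}, {{BurstResult}} and {{simulateBurst}} are hypothetical names, and a plain FIFO of callbacks stands in for the allocator's event queue and for {{defer}}. It assumes a single-threaded queue, and only shows that a burst of addSlave events collapses into one allocation pass that does work, with the later deferred passes finding nothing to do.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an actor's event queue: single-threaded FIFO.
class EventQueue {
public:
  void enqueue(std::function<void()> event) {
    events.push_back(std::move(event));
  }

  // Drain the queue, including events enqueued while draining.
  void run() {
    while (!events.empty()) {
      std::function<void()> event = std::move(events.front());
      events.pop_front();
      event();
    }
  }

private:
  std::deque<std::function<void()>> events;
};

// Hypothetical allocator that only records which slaves each pass handled.
class SketchAllocator {
public:
  explicit SketchAllocator(EventQueue& q) : queue(q) {}

  void addSlave(const std::string& slaveId) {
    pending.insert(slaveId);
    // "Deferred" allocation: it is appended to the queue, so it runs
    // after every event that is already enqueued.
    queue.enqueue([this]() { allocate(); });
  }

  std::vector<std::set<std::string>> passes;  // passes that did work
  int emptyPasses = 0;                        // passes with nothing to do

private:
  void allocate() {
    if (pending.empty()) {
      ++emptyPasses;  // An earlier pass already covered these slaves.
      return;
    }
    passes.push_back(pending);
    pending.clear();
  }

  EventQueue& queue;
  std::set<std::string> pending;
};

struct BurstResult {
  std::size_t workPasses;
  std::size_t slavesInFirstPass;
  int emptyPasses;
};

// Enqueue `slaveCount` addSlave events back to back, then drain the queue.
BurstResult simulateBurst(int slaveCount) {
  assert(slaveCount >= 0);
  EventQueue queue;
  SketchAllocator allocator(queue);
  for (int i = 0; i < slaveCount; ++i) {
    queue.enqueue([&allocator, i]() {
      allocator.addSlave("slave-" + std::to_string(i));
    });
  }
  queue.run();
  const std::size_t first =
      allocator.passes.empty() ? 0 : allocator.passes.front().size();
  return BurstResult{allocator.passes.size(), first, allocator.emptyPasses};
}
```

In this sketch, calling {{simulateBurst(3)}} yields one pass covering all three slaves, followed by two passes that return immediately. No explicit tracking of an outstanding allocation is needed, at the cost of a few cheap no-op events on the queue.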

> only perform batch resource allocations
> ---------------------------------------
>
>                 Key: MESOS-3157
>                 URL: https://issues.apache.org/jira/browse/MESOS-3157
>             Project: Mesos
>          Issue Type: Bug
>          Components: allocation
>            Reporter: James Peach
>            Assignee: James Peach
>
> Our deployment environments have a lot of churn, with many short-lived 
> frameworks that often revive offers. Running the allocator takes a long time 
> (from seconds up to minutes).
> In this situation, event-triggered allocation causes the event queue in the 
> allocator process to get very long, and the allocator effectively becomes 
> unresponsive (e.g. a revive offers message takes too long to come to the head 
> of the queue).
> We have been running a patch to remove all the event-triggered allocations 
> and only allocate from the batch task 
> {{HierarchicalAllocatorProcess::batch}}. This works great and really improves 
> responsiveness.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
