[ 
https://issues.apache.org/jira/browse/MESOS-3157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15068653#comment-15068653
 ] 

James Peach commented on MESOS-3157:
------------------------------------

https://reviews.apache.org/r/41658/

I've posted example changes, though I'm not proposing that they be merged at 
this time. The {{queuedAllocations}} count is quite ugly and I'd like to remove 
it. This change accumulates the set of slaves to be allocated from, and defers 
the actual allocation pass whenever we know that another pass is queued behind 
it, in order to maximize the number of slaves visited in each pass. Batch 
allocations are always executed.

Consider the following message sequence:

{code}
    reviveOffers
    reviveOffers
    allocate
    removeQuota
    allocate
    allocate*
{code}

Only the {{allocate*}} would actually be executed. Note that in practice this 
is very sensitive to the order in which events are queued. The following 
sequence would still allocate every time, because the later {{allocate}} 
calls are not queued before the earlier ones execute:
{code}
    reviveOffers
    allocate*
    reviveOffers
    allocate*
{code}
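
To make the ordering sensitivity concrete, here is a rough Python model of the 
deferral logic. It is only a sketch and not the code in the review: the 
{{ToyAllocator}} class and its {{post}}, {{step}} and {{drain}} methods are 
hypothetical names, the event queue is a plain deque rather than a libprocess 
mailbox, and batch allocations are left out. It assumes every triggering event 
bumps a counter and queues one {{allocate}}, and that an {{allocate}} only 
does real work when no other {{allocate}} is queued behind it.

```python
from collections import deque


class ToyAllocator:
    """Toy model of coalescing event-triggered allocations (hypothetical)."""

    def __init__(self):
        self.queue = deque()         # stand-in for the allocator's event queue
        self.queued_allocations = 0  # allocate events queued but not yet handled
        self.pending_slaves = set()  # slaves accumulated for the next real pass
        self.log = []                # what was handled; "allocate*" = executed

    def post(self, event, slaves=()):
        """An external message arrives at the tail of the queue."""
        self.queue.append((event, frozenset(slaves)))

    def step(self):
        """Handle the event at the head of the queue."""
        event, slaves = self.queue.popleft()
        if event == "allocate":
            self.queued_allocations -= 1
            if self.queued_allocations > 0:
                # Another allocate is queued later, so defer to it.
                self.log.append("allocate")
            else:
                # Last queued allocate: visit every accumulated slave.
                self.log.append("allocate*")
                self.pending_slaves.clear()
        else:
            # Triggering event: remember the slaves, queue an allocate.
            self.pending_slaves |= slaves
            self.queued_allocations += 1
            self.queue.append(("allocate", frozenset()))
            self.log.append(event)

    def drain(self):
        while self.queue:
            self.step()


# First sequence: events pile up before the earlier allocates run.
first = ToyAllocator()
first.post("reviveOffers", {"s1"})
first.post("reviveOffers", {"s2"})
first.step()                       # first revive queues an allocate
first.post("removeQuota", {"s3"})  # arrives while that allocate is waiting
first.drain()

# Second sequence: each allocate runs before the next event arrives.
second = ToyAllocator()
second.post("reviveOffers", {"s1"})
second.drain()
second.post("reviveOffers", {"s2"})
second.drain()

print(first.log)
print(second.log)
```

In this model {{first.log}} reproduces the first sequence with a single 
executed pass, while {{second.log}} executes a pass after every revive, 
because the counter has already dropped back to zero each time.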


> only perform batch resource allocations
> ---------------------------------------
>
>                 Key: MESOS-3157
>                 URL: https://issues.apache.org/jira/browse/MESOS-3157
>             Project: Mesos
>          Issue Type: Bug
>          Components: allocation
>            Reporter: James Peach
>            Assignee: James Peach
>
> Our deployment environments have a lot of churn, with many short-lived 
> frameworks that often revive offers. Running the allocator takes a long time 
> (from seconds up to minutes).
> In this situation, event-triggered allocation causes the event queue in the 
> allocator process to get very long, and the allocator effectively becomes 
> unresponsive (e.g. a revive offers message takes too long to come to the head 
> of the queue).
> We have been running a patch to remove all the event-triggered allocations 
> and only allocate from the batch task 
> {{HierarchicalAllocatorProcess::batch}}. This works great and really improves 
> responsiveness.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
