[ 
https://issues.apache.org/jira/browse/MESOS-3157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler updated MESOS-3157:
-----------------------------------
    Shepherd: Benjamin Mahler

Having mulled over this patch, and the threads related to this, it seems like 
the issue here is that we perform an unnecessary number of full allocations (1 
allocation : 1 event), whereas ideally we perform batching (M allocations : N 
events, where M <= N). 

For example, when N calls to reviveOffers are enqueued behind an allocation, 
we'll do the following:

{noformat}
allocate
reviveOffers -> allocate
reviveOffers -> allocate
reviveOffers -> allocate
reviveOffers -> allocate
{noformat}

When ideally we could do the following:

{noformat}
allocate
reviveOffers
reviveOffers
reviveOffers
reviveOffers
allocate
{noformat}

The idea here is to ensure that allocation work that arrives while we are 
doing an allocation (in this case 4 reviveOffers) is "batched" into a 
single allocation round. This technique is used in the registrar 
(registrar.cpp) in order to avoid the performance issues from excessive 
queueing that occur when operations are done serially without "batching".

Here's how I would suggest proceeding:

(1) Add an allocator benchmark for a large number of reviveOffers requests when 
there are many slaves and frameworks, which includes the time taken for the 
implied allocations to occur.

(2) Implement batching of allocations. This will entail keeping a running set 
of SlaveIDs which require an allocation. Also, rather than allocating 
immediately during an event, we defer the allocation so that it occurs 
*after* all currently enqueued events. When the deferred allocation occurs, we 
clear the running set of SlaveIDs. Note that if an interval-based allocation 
occurs before the deferred allocation, it will also clear the running set, 
which is correct.

(3) This should avoid the need to eliminate the event-driven allocation code 
as per the original intent of this patch, since we've bounded the number of 
allocations that can be queued.
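
To make step (2) concrete, here is a minimal, self-contained sketch of the deferral idea. It is not libprocess or allocator code: EventQueue, BatchingAllocator and simulate are hypothetical names made up for illustration, the queue is a plain single-threaded FIFO standing in for the allocator process's event queue, and "allocating" is reduced to bumping a counter.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the allocator process's event queue:
// events run one at a time, in FIFO order.
class EventQueue {
public:
  void enqueue(std::function<void()> event) {
    events_.push_back(std::move(event));
  }

  void run() {
    while (!events_.empty()) {
      std::function<void()> event = std::move(events_.front());
      events_.pop_front();
      event();  // May enqueue further events at the back.
    }
  }

private:
  std::deque<std::function<void()>> events_;
};

class BatchingAllocator {
public:
  explicit BatchingAllocator(EventQueue& queue) : queue_(queue) {}

  // Event handler: record which slaves need an allocation and schedule
  // at most one deferred allocation behind the currently queued events.
  void reviveOffers(const std::vector<std::string>& slaveIds) {
    pending_.insert(slaveIds.begin(), slaveIds.end());
    if (!allocationScheduled_) {
      allocationScheduled_ = true;
      queue_.enqueue([this]() { deferredAllocate(); });
    }
  }

  // Interval-based allocation: covers all slaves, so it also clears
  // the running set.
  void batchAllocate() {
    ++allocations_;
    pending_.clear();
  }

  int allocations() const { return allocations_; }
  std::size_t lastBatchSize() const { return lastBatchSize_; }

private:
  void deferredAllocate() {
    allocationScheduled_ = false;
    if (pending_.empty()) {
      return;  // An interval-based allocation already did the work.
    }
    lastBatchSize_ = pending_.size();
    ++allocations_;  // One allocation round for the whole running set.
    pending_.clear();
  }

  EventQueue& queue_;
  std::set<std::string> pending_;  // Running set of slave IDs.
  bool allocationScheduled_ = false;
  int allocations_ = 0;
  std::size_t lastBatchSize_ = 0;
};

// Replays the scenario above: one allocation with `revives` calls to
// reviveOffers queued behind it. Returns the number of allocation rounds.
int simulate(int revives) {
  EventQueue queue;
  BatchingAllocator allocator(queue);
  queue.enqueue([&allocator]() { allocator.batchAllocate(); });
  for (int i = 0; i < revives; ++i) {
    queue.enqueue([&allocator, i]() {
      allocator.reviveOffers({"slave-" + std::to_string(i)});
    });
  }
  queue.run();
  return allocator.allocations();
}
```

In this sketch, simulate(4) performs two allocation rounds rather than five: the first reviveOffers enqueues the deferred allocation at the back of the queue, and the remaining three only add to the running set. The real change would presumably express the deferral with libprocess dispatch on the allocator process rather than a hand-rolled queue.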

[~jamespeach] sorry for the runaround! From what I've gathered from the emails 
and this ticket, this should be sufficient for keeping event-driven allocation 
without backing up the allocator in the case of expensive allocation. At the 
same time as this, we should invest effort in improving the performance of the 
allocation loop.

> only perform batch resource allocations
> ---------------------------------------
>
>                 Key: MESOS-3157
>                 URL: https://issues.apache.org/jira/browse/MESOS-3157
>             Project: Mesos
>          Issue Type: Bug
>          Components: allocation
>            Reporter: James Peach
>            Assignee: James Peach
>
> Our deployment environments have a lot of churn, with many short-lived 
> frameworks that often revive offers. Running the allocator takes a long time 
> (from seconds up to minutes).
> In this situation, event-triggered allocation causes the event queue in the 
> allocator process to get very long, and the allocator effectively becomes 
> unresponsive (eg. a revive offers message takes too long to come to the head 
> of the queue).
> We have been running a patch to remove all the event-triggered allocations 
> and only allocate from the batch task 
> {{HierarchicalAllocatorProcess::batch}}. This works great and really improves 
> responsiveness.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
