[ 
https://issues.apache.org/jira/browse/MESOS-6904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler updated MESOS-6904:
-----------------------------------
    Description: 
Per MESOS-3157:

{quote}
Our deployment environments have a lot of churn, with many short-lived 
frameworks that often revive offers. Running the allocator takes a long time 
(from seconds up to minutes).

In this situation, event-triggered allocation causes the event queue in the 
allocator process to get very long, and the allocator effectively becomes 
unresponsive (e.g. a revive offers message takes too long to come to the head of 
the queue).
{quote}

To remedy the above scenario, it is proposed to perform batching of the 
enqueued allocation operations so that a single allocation operation can 
satisfy N enqueued allocations. This should reduce the potential for 
backlogging in the allocator. See the discussion 
[here|https://issues.apache.org/jira/browse/MESOS-3157?focusedCommentId=14728377&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14728377]
 in MESOS-3157.
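
As a rough illustration of the batching idea above, the following is a minimal, single-threaded sketch and not the actual Mesos allocator code. It assumes that each triggering event records an allocation candidate and that at most one allocation is enqueued at a time. `EventQueue`, `BatchingAllocator`, `requestAllocation()` and the other names are hypothetical and chosen only for this example; the real allocator runs as a libprocess actor, which is not modeled here.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <set>
#include <string>
#include <utility>

// Hypothetical stand-in for the allocator's event queue; it is not the
// libprocess API that Mesos actually uses.
class EventQueue {
public:
  void enqueue(std::function<void()> event) {
    events_.push_back(std::move(event));
  }

  // Runs events in FIFO order until the queue is empty.
  void drain() {
    while (!events_.empty()) {
      std::function<void()> event = std::move(events_.front());
      events_.pop_front();
      event();
    }
  }

  std::size_t size() const { return events_.size(); }

private:
  std::deque<std::function<void()>> events_;
};

// Hypothetical allocator that coalesces N allocation requests into one run.
class BatchingAllocator {
public:
  explicit BatchingAllocator(EventQueue& queue) : queue_(queue) {}

  // Called for every event that would otherwise enqueue its own allocation.
  void requestAllocation(const std::string& agentId) {
    candidates_.insert(agentId);
    if (pending_) {
      return;  // An allocation is already enqueued; it will cover this agent.
    }
    pending_ = true;
    queue_.enqueue([this]() { allocate(); });
  }

  std::size_t runs() const { return runs_; }
  const std::set<std::string>& lastBatch() const { return lastBatch_; }

private:
  void allocate() {
    assert(pending_);
    // Take the batch and reset the state first, so that requests arriving
    // after this point enqueue a fresh allocation.
    lastBatch_.clear();
    lastBatch_.swap(candidates_);
    pending_ = false;
    ++runs_;
    // A real allocator would compute offers for lastBatch_ here.
  }

  EventQueue& queue_;
  std::set<std::string> candidates_;
  std::set<std::string> lastBatch_;
  bool pending_ = false;
  std::size_t runs_ = 0;
};
```

In this sketch, any number of `requestAllocation()` calls made before the queue is drained add only one event to the queue, so the backlog no longer grows with the number of triggering events.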

  was:
"Our deployment environments have a lot of churn, with many short-live 
frameworks that often revive offers. Running the allocator takes a long time 
(from seconds up to minutes).
In this situation, event-triggered allocation causes the event queue in the 
allocator process to get very long, and the allocator effectively becomes 
unresponsive (eg. a revive offers message takes too long to come to the head of 
the queue)." - MESOS-3157 

To remedy the above scenario, it is proposed to track allocation candidates and 
only dispatch allocation work if there is no pending allocation in the 
allocator queue. When an enqueued allocation is processed, the tracked set of 
candidates is cleared. 

Current behavior will trigger allocation work on cluster events (e.g. 
`addSlave()`, `addFramework()`, etc) as well as during the periodic batched 
allocation running at a defined time interval. 

This ticket tracks the new direction the work has taken since discussion in 
MESOS-3157 where a previous solution by [~jamespeach] introduced batched 
allocation only (which we currently run) as well as an approach to reduce 
redundancy of work in the queue. 

        Summary: Perform batching of allocations to reduce allocator queue 
backlogging.  (was: Track resource allocation candidates and batch allocation 
work)

> Perform batching of allocations to reduce allocator queue backlogging.
> ----------------------------------------------------------------------
>
>                 Key: MESOS-6904
>                 URL: https://issues.apache.org/jira/browse/MESOS-6904
>             Project: Mesos
>          Issue Type: Bug
>          Components: allocation
>            Reporter: Jacob Janco
>            Assignee: Jacob Janco
>              Labels: allocator
>
> Per MESOS-3157:
> {quote}
> Our deployment environments have a lot of churn, with many short-lived 
> frameworks that often revive offers. Running the allocator takes a long time 
> (from seconds up to minutes).
> In this situation, event-triggered allocation causes the event queue in the 
> allocator process to get very long, and the allocator effectively becomes 
> unresponsive (e.g. a revive offers message takes too long to come to the head 
> of the queue).
> {quote}
> To remedy the above scenario, it is proposed to perform batching of the 
> enqueued allocation operations so that a single allocation operation can 
> satisfy N enqueued allocations. This should reduce the potential for 
> backlogging in the allocator. See the discussion 
> [here|https://issues.apache.org/jira/browse/MESOS-3157?focusedCommentId=14728377&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14728377]
>  in MESOS-3157.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
