Jacob Janco created MESOS-6904:
----------------------------------
Summary: Track resource allocation candidates and batch allocation work
Key: MESOS-6904
URL: https://issues.apache.org/jira/browse/MESOS-6904
Project: Mesos
Issue Type: Bug
Components: allocation
Reporter: Jacob Janco
Assignee: Jacob Janco
"Our deployment environments have a lot of churn, with many short-lived
frameworks that often revive offers. Running the allocator takes a long time
(from seconds up to minutes).
In this situation, event-triggered allocation causes the event queue in the
allocator process to get very long, and the allocator effectively becomes
unresponsive (e.g. a revive offers message takes too long to come to the head of
the queue)." - MESOS-3157
To remedy the above scenario, the proposal is to track allocation candidates
and dispatch allocation work only if no allocation is already pending in the
allocator queue. When an enqueued allocation is processed, the tracked set of
candidates is cleared.
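
A minimal sketch of the idea above, written in Python for brevity rather than
in Mesos's C++. The `Allocator` class, its method names, and the agent IDs are
hypothetical and are not the Mesos allocator API; the sketch also assumes that
a processed allocation handles every candidate tracked so far.

```python
from collections import deque


class Allocator:
    """Toy single-threaded actor with a FIFO event queue (hypothetical)."""

    def __init__(self):
        self.queue = deque()             # pending events, processed in order
        self.candidates = set()          # agents waiting for an allocation
        self.allocation_pending = False  # True while an allocation is enqueued
        self.runs = []                   # candidates handled by each run

    def request_allocation(self, agent_id):
        # Always record the candidate, but enqueue allocation work only
        # if no allocation is already waiting in the queue.
        self.candidates.add(agent_id)
        if not self.allocation_pending:
            self.allocation_pending = True
            self.queue.append(self._allocate)

    def _allocate(self):
        # Handle the tracked candidates in one pass, then clear the set.
        batch = sorted(self.candidates)
        self.candidates.clear()
        self.allocation_pending = False
        self.runs.append(batch)

    def process_all(self):
        # Drain the event queue in FIFO order.
        while self.queue:
            self.queue.popleft()()


allocator = Allocator()
for agent in ["agent-1", "agent-2", "agent-1", "agent-3"]:
    allocator.request_allocation(agent)

allocator.process_all()
print(allocator.runs)  # [['agent-1', 'agent-2', 'agent-3']]
```

Four requests result in a single enqueued allocation, so other messages (such
as a revive) would wait behind at most one allocation run in this model.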
Currently, allocation work is triggered by cluster events (e.g. `addSlave()`,
`addFramework()`) as well as by the periodic batched allocation, which runs at
a defined time interval.
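
To illustrate why this behavior lets the queue grow, here is a rough Python
model of it. `EventTriggeredAllocator`, `add_slave()`, `add_framework()`, and
`batch()` are hypothetical stand-ins loosely named after the handlers mentioned
in this ticket; they are assumptions for illustration, not Mesos code.

```python
from collections import deque


class EventTriggeredAllocator:
    """Toy model: every cluster event enqueues its own allocation run."""

    def __init__(self):
        self.queue = deque()      # pending allocation work, FIFO
        self.allocation_runs = 0  # number of allocation passes executed

    def _allocate(self):
        # Stand-in for an allocation pass (seconds to minutes in the report).
        self.allocation_runs += 1

    def add_slave(self, slave_id):
        # Cluster event: triggers allocation work immediately.
        self.queue.append(self._allocate)

    def add_framework(self, framework_id):
        # Cluster event: triggers allocation work immediately.
        self.queue.append(self._allocate)

    def batch(self):
        # Periodic tick: also enqueues an allocation run.
        self.queue.append(self._allocate)

    def drain(self):
        # Process all queued work in FIFO order.
        while self.queue:
            self.queue.popleft()()


allocator = EventTriggeredAllocator()
for i in range(3):
    allocator.add_slave(f"slave-{i}")
for i in range(2):
    allocator.add_framework(f"framework-{i}")
allocator.batch()

print(len(allocator.queue))       # 6
allocator.drain()
print(allocator.allocation_runs)  # 6
```

In this model the queue length grows linearly with the number of events, so
any message enqueued afterwards waits behind every one of those runs.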
This ticket tracks the new direction the work has taken since the discussion in
MESOS-3157, where a previous solution by [~jamespeach] introduced batched
allocation only (which we currently run), as well as an approach to reduce
redundant work in the queue.