Joerg Schad created MESOS-4315:
----------------------------------
Summary: Improve Quota Failover Logic
Key: MESOS-4315
URL: https://issues.apache.org/jira/browse/MESOS-4315
Project: Mesos
Issue Type: Improvement
Reporter: Joerg Schad
The Quota failover logic introduced with MESOS-3865 changes the master
failover recovery significantly if at least one quota is set.
Now, if any previously set quotas are detected upon recovery, the
allocator enters recovery mode, during which it does not issue
offers. Recovery mode (and therefore offer suspension) ends when either:
* a certain number of agents reregister (by default 80% of the agents known
before the failover), or
* a timeout expires (by default 10 minutes).
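A minimal C++ sketch of the exit rule described above, assuming a simplified view of the allocator's recovery state. RecoveryState, shouldExitRecovery and the constants are hypothetical names for illustration, not the Mesos allocator API; only the defaults (80% of agents, 10 minutes) come from this issue.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical model of the allocator's state during post-failover recovery.
// These names are illustrative only; they are not the Mesos allocator API.
struct RecoveryState {
  std::size_t expectedAgents;      // agents known before the failover
  std::size_t reregisteredAgents;  // agents that have reregistered so far
  std::chrono::steady_clock::duration elapsed;  // time since recovery began
};

// Defaults quoted in the issue: 80% of known agents, 10 minute timeout.
constexpr std::size_t kAgentRecoveryPercent = 80;
constexpr std::chrono::minutes kRecoveryTimeout{10};

// Returns true once the allocator may leave recovery mode and resume offers.
bool shouldExitRecovery(const RecoveryState& s,
                        std::size_t percent = kAgentRecoveryPercent,
                        std::chrono::steady_clock::duration timeout =
                            kRecoveryTimeout) {
  assert(percent <= 100);
  // Condition 1: enough agents have reregistered. Integer arithmetic avoids
  // floating-point rounding when comparing against the percentage.
  bool enoughAgents =
      s.reregisteredAgents * 100 >= s.expectedAgents * percent;
  // Condition 2: the recovery timeout has expired.
  bool timedOut = s.elapsed >= timeout;
  return enoughAgents || timedOut;
}
```

For example, with 10 agents known before the failover, RecoveryState{10, 8, std::chrono::minutes(3)} satisfies the agent condition, while RecoveryState{10, 7, std::chrono::minutes(3)} keeps offers suspended until another agent returns or the 10 minutes elapse.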
We could also safely exit recovery mode once all quotas have been satisfied
(i.e., all agents participating in satisfying quota have reconnected).
For small clusters with a large percentage of quota'ed resources, this will not make
much difference compared to the existing rules. But for larger clusters
this condition could be fulfilled much faster than the 80% condition.
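The proposed extra condition could be sketched as follows, assuming the master knows which agents contributed resources to satisfying quota before the failover. Whether Mesos tracks such a set is an assumption, and AgentIds, quotaAgentsRecovered, shouldExitRecovery and example are hypothetical names for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>

// Hypothetical: IDs of agents whose resources satisfied quota before the
// failover. Tracking this set is an assumption, not existing Mesos behavior.
using AgentIds = std::set<std::string>;

// Proposed extra exit condition: every agent that participated in
// satisfying quota has reregistered with the new master.
// Note: an empty quotaAgents set yields true immediately.
bool quotaAgentsRecovered(const AgentIds& quotaAgents,
                          const AgentIds& reregistered) {
  // std::includes needs sorted ranges; std::set iterates in sorted order.
  return std::includes(reregistered.begin(), reregistered.end(),
                       quotaAgents.begin(), quotaAgents.end());
}

// Combined rule: the existing conditions (agent percentage or timeout),
// passed in as a precomputed flag, OR the proposed quota condition.
bool shouldExitRecovery(bool existingConditionMet,
                        const AgentIds& quotaAgents,
                        const AgentIds& reregistered) {
  return existingConditionMet ||
         quotaAgentsRecovered(quotaAgents, reregistered);
}

// Usage: a large cluster where quota was satisfied by three agents.
// Recovery can end as soon as those three reregister, long before 80% do.
void example() {
  AgentIds quota{"a1", "a2", "a3"};
  AgentIds reregistered{"a1", "a2", "a3"};  // only a few agents back so far
  assert(shouldExitRecovery(false, quota, reregistered));
}
```

This keeps the existing percentage and timeout rules as a fallback, so the quota condition can only shorten recovery, never extend it.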
We should at least consider whether such a condition is worth the added
complexity.