[
https://issues.apache.org/jira/browse/MESOS-4315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alexander Rukletsov updated MESOS-4315:
---------------------------------------
Summary: Improve quota failover logic. (was: Improve Quota Failover Logic)
> Improve quota failover logic.
> -----------------------------
>
> Key: MESOS-4315
> URL: https://issues.apache.org/jira/browse/MESOS-4315
> Project: Mesos
> Issue Type: Improvement
> Reporter: Joerg Schad
>
> The Quota failover logic introduced with MESOS-3865 changes the master
> failover recovery significantly if at least one quota is set.
> Now, if any previously set quota is detected upon recovery, the
> allocator enters recovery mode, during which it does not issue
> offers. Recovery mode, and therefore offer suspension, ends when either:
> * a certain fraction of agents reregisters (by default 80% of agents known
> before the failover), or
> * a timeout expires (by default 10 minutes).
> We could also safely exit recovery mode once all quotas have been
> satisfied (i.e. all agents participating in satisfying quota have
> reconnected). For small clusters with a large percentage of quota'ed
> resources, this will not make much difference compared to the existing
> rules. But for larger clusters this condition could be fulfilled much
> faster than the 80% condition.
> We should at least consider whether such a condition is worth the added
> complexity.
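
To make the trade-off concrete, the two existing exit rules and the proposed quota-based rule can be sketched as a single predicate. This is only an illustration under stated assumptions, not the Mesos allocator code: `RecoveryState`, `should_exit_recovery`, and the single scalar amount per role are hypothetical names and simplifications. The 80% and 10-minute defaults are the ones quoted in the description.

```python
from dataclasses import dataclass, field


@dataclass
class RecoveryState:
    """Hypothetical snapshot of allocator state during recovery (not a Mesos type)."""
    known_agents: int          # agents known before the failover
    reregistered_agents: int   # agents that have reregistered so far
    elapsed_secs: float        # time since recovery mode was entered
    # Assumption: each quota is modelled as one scalar amount per role.
    quota: dict = field(default_factory=dict)      # role -> guaranteed amount
    satisfied: dict = field(default_factory=dict)  # role -> amount currently backing the quota


def should_exit_recovery(state, agent_fraction=0.8, timeout_secs=600.0):
    """Return the reason for leaving recovery mode, or None to stay in it."""
    # Existing rule: enough of the previously known agents have reregistered.
    if state.reregistered_agents >= agent_fraction * state.known_agents:
        return "agents"
    # Existing rule: the recovery timeout has expired.
    if state.elapsed_secs >= timeout_secs:
        return "timeout"
    # Proposed rule: every previously set quota is already satisfied.
    if all(state.satisfied.get(role, 0.0) >= amount
           for role, amount in state.quota.items()):
        return "quota"
    return None
```

In a 1000-agent cluster where the agents backing all quotas are among the first 50 to reregister, this sketch would leave recovery mode on the quota rule long before the 80% threshold is reached. The extra bookkeeping needed to track `satisfied` per role is the added complexity the description refers to.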