Joerg Schad created MESOS-4315:
----------------------------------

             Summary: Improve Quota Failover Logic
                 Key: MESOS-4315
                 URL: https://issues.apache.org/jira/browse/MESOS-4315
             Project: Mesos
          Issue Type: Improvement
            Reporter: Joerg Schad


The quota failover logic introduced with MESOS-3865 changes master failover 
recovery significantly if at least one quota is set. 

Now, if any previously set quota is detected upon recovery, the allocator 
enters recovery mode, during which it does not issue offers. Recovery mode, 
and therefore offer suspension, ends when either:

* a certain percentage of agents reregisters (by default 80% of the agents 
known before the failover), or
* a timeout expires (by default 10 minutes).
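
As a rough illustration of the two exit conditions above, here is a minimal 
sketch. It is not the Mesos allocator code: the names RecoveryState and 
should_exit_recovery are hypothetical, and treating "80%" as "at least 80%" 
is an assumption.

```python
from dataclasses import dataclass


@dataclass
class RecoveryState:
    """Hypothetical snapshot of what the allocator knows during recovery."""
    known_agents: int          # agents known before the failover
    reregistered_agents: int   # agents that have reregistered so far
    elapsed_seconds: float     # time since recovery mode was entered


def should_exit_recovery(state, agent_percent=80, timeout_seconds=600.0):
    """Return True once either exit condition holds.

    Defaults mirror the values quoted in this issue (80%, 10 minutes).
    """
    # Condition 2: the timeout has expired.
    if state.elapsed_seconds >= timeout_seconds:
        return True
    # Condition 1: enough agents have reregistered (integer arithmetic
    # avoids floating-point rounding at the threshold).
    return state.reregistered_agents * 100 >= agent_percent * state.known_agents
```

With 10 known agents, the sketch leaves recovery mode once 8 have 
reregistered, or after 600 seconds regardless of how many came back.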

We could also safely exit recovery mode once all quota has been satisfied 
(i.e. all agents participating in satisfying quota have reconnected).
For small clusters with a large percentage of quota'ed resources this will not 
make much difference compared to the existing rules. But for larger clusters 
this condition could be fulfilled much faster than the 80% condition. 
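
To make the proposed condition concrete, here is a minimal sketch under 
strong assumptions: it presumes the master can tell after a failover which 
agents were backing quota, which this issue does not specify. The function 
name can_exit_recovery_early and the agent IDs are hypothetical.

```python
def can_exit_recovery_early(quota_agents, reconnected_agents):
    """Return True once every agent backing quota has reconnected.

    quota_agents: IDs of agents assumed to participate in satisfying quota.
    reconnected_agents: IDs of agents that have reregistered so far.
    """
    return set(quota_agents).issubset(reconnected_agents)


# Illustrative large cluster: 1000 agents, of which 50 back quota.
known = [f"agent-{i}" for i in range(1000)]
quota_agents = known[:50]

# After only 60 agents (6%) have reconnected, including all 50 quota agents,
# the proposed condition already holds, whereas the 80% rule would still be
# waiting for 800 agents.
reconnected = known[:60]
early_exit = can_exit_recovery_early(quota_agents, reconnected)
```

If even one quota agent is still missing, the check stays false and the 
existing percentage and timeout rules would continue to apply.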

We should at least consider whether such a condition is worth the added 
complexity.





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
