[ 
https://issues.apache.org/jira/browse/MESOS-3165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-3165:
---------------------------------------
    Shepherd: Joris Van Remoortere  (was: Benjamin Hindman)

> Persist and recover quota to/from Registry
> ------------------------------------------
>
>                 Key: MESOS-3165
>                 URL: https://issues.apache.org/jira/browse/MESOS-3165
>             Project: Mesos
>          Issue Type: Task
>          Components: master, replicated log
>            Reporter: Alexander Rukletsov
>            Assignee: Alexander Rukletsov
>              Labels: mesosphere
>
> To persist quotas across failovers, the Master should save them in the 
> registry. To support this, we shall:
> * Introduce a Quota state variable in registry.proto;
> * Extend the Operation interface so that it supports a ‘Quota’ accumulator 
> (see src/master/registrar.hpp);
> * Introduce AddQuota / RemoveQuota operations;
> * Recover quotas from the registry on failover to the Master’s 
> internal::master::Role struct;
> * Extend RegistrarTest with quota-specific tests.
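
The operations listed above could look roughly like the following sketch. It is deliberately simplified and is not the actual registrar code: `Registry` here is a plain struct rather than the protobuf message from registry.proto, the `Operation` base class is a reduced stand-in for the interface in src/master/registrar.hpp, and representing a quota as a per-role CPU guarantee is an assumption made only for illustration.

```cpp
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for the registry state. The real state would be a
// protobuf message defined in registry.proto.
struct Registry {
  std::map<std::string, double> quotas;  // role -> guaranteed CPUs (assumed)
};

// Reduced stand-in for the registrar's Operation interface.
class Operation {
public:
  virtual ~Operation() = default;

  // Returns true if the registry was mutated and has to be written back.
  virtual bool perform(Registry* registry) = 0;
};

// Stores (or updates) the quota for a role.
class AddQuota : public Operation {
public:
  AddQuota(std::string role, double cpus)
    : role_(std::move(role)), cpus_(cpus) {}

  bool perform(Registry* registry) override {
    auto it = registry->quotas.find(role_);
    if (it != registry->quotas.end() && it->second == cpus_) {
      return false;  // Identical quota already stored; no write needed.
    }
    registry->quotas[role_] = cpus_;
    return true;
  }

private:
  std::string role_;
  double cpus_;
};

// Removes the quota for a role, if one is stored.
class RemoveQuota : public Operation {
public:
  explicit RemoveQuota(std::string role) : role_(std::move(role)) {}

  bool perform(Registry* registry) override {
    return registry->quotas.erase(role_) > 0;
  }

private:
  std::string role_;
};
```

Returning a "mutated" flag lets the caller skip a write to the replicated log when an operation is a no-op. On failover, the new Master would read `quotas` back and repopulate its per-role state.
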
> NOTE: The registry variable can be rather large for production clusters (see 
> MESOS-2075). While adding quota information to the registry should be fine 
> for an MVP, we should consider storing quota separately, since it does not 
> need to be kept in sync with slave updates. However, the registrar currently 
> does not support adding more variables.
> While the Agents are reregistering (note they may fail to do so), the 
> information about what part of the quota is allocated is only partially 
> available to the Master. In other words, the state of the quota allocation is 
> reconstructed as Agents reregister. During this period, some roles may be 
> under quota from the perspective of the newly elected Master.
> The same problem exists on the allocator side: it may think the cluster is 
> under quota and eagerly try to satisfy quotas before enough Agents have 
> reregistered, which may result in resources being allocated to frameworks 
> beyond their quota. To address this issue, and to avoid raising spurious 
> under-quota alerts, the Master should allow a certain amount of time for 
> most (e.g. 80%) of the Agents to reregister before reporting any quota 
> status and notifying the allocator about granted quotas.
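
One possible reading of the grace period described above is sketched here: quota status is withheld until either a configurable fraction of the expected Agents has reregistered or a timeout expires, whichever comes first. The class name `QuotaRecoveryGate`, its parameters, and the "threshold or timeout" rule are all assumptions for illustration; the issue does not prescribe a specific mechanism.

```cpp
#include <chrono>
#include <cstddef>

using Clock = std::chrono::steady_clock;

// Hypothetical helper: decides when the Master may start reporting quota
// status and notifying the allocator after a failover.
class QuotaRecoveryGate {
public:
  QuotaRecoveryGate(std::size_t expectedAgents,
                    double threshold,          // e.g. 0.8 for 80%
                    Clock::duration timeout,
                    Clock::time_point start)
    : expected_(expectedAgents),
      threshold_(threshold),
      deadline_(start + timeout) {}

  // Called once for every Agent that successfully reregisters.
  void agentReregistered() { ++reregistered_; }

  // True once enough Agents are back or the grace period has expired.
  bool ready(Clock::time_point now) const {
    if (now >= deadline_) {
      return true;  // Stop waiting; some Agents may never come back.
    }
    if (expected_ == 0) {
      return true;  // Nothing to wait for.
    }
    return static_cast<double>(reregistered_) >= threshold_ * expected_;
  }

private:
  std::size_t expected_;
  double threshold_;
  Clock::time_point deadline_;
  std::size_t reregistered_ = 0;
};
```

Passing the current time into `ready()` rather than reading the clock inside keeps the gate easy to exercise in tests. The timeout matters because, as noted above, some Agents may fail to reregister at all.
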



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
