Guys, do you plan to persist the maintenance information under the existing 'registry' key, or to create a separate key for it? We will need to persist quotas as well. I don't expect quotas to be large, but that may tip the decision toward creating a separate key per entity.
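
To make the trade-off concrete, here is a rough sketch of the two layouts. It is only an illustration: `InMemoryState`, `make_slaves`, the JSON encoding, and the field names are all hypothetical stand-ins I made up for this mail, not the Mesos State / Registrar API or the real protobuf schema. The only point is that with one big 'registry' blob every maintenance update rewrites the slave list too, while a per-entity key rewrites only the small blob.

```python
import json


class InMemoryState:
    """Hypothetical key-value store holding one serialized blob per key.

    This is NOT the Mesos State abstraction, just a toy with the same shape.
    """

    def __init__(self):
        self._blobs = {}

    def store(self, key, value):
        # Serialize the whole value and overwrite the key; return bytes written.
        blob = json.dumps(value, sort_keys=True).encode("utf-8")
        self._blobs[key] = blob
        return len(blob)

    def fetch(self, key):
        blob = self._blobs.get(key)
        return None if blob is None else json.loads(blob.decode("utf-8"))


def make_slaves(count):
    # Made-up slave records; the real registry entries are much richer.
    return [
        {"id": "slave-%d" % i, "hostname": "host-%d.example.com" % i}
        for i in range(count)
    ]


# An arbitrary, illustrative maintenance window.
schedule = {"host-1.example.com": {"start": 1436313600, "duration_secs": 3600}}

# Layout A: everything lives under the single 'registry' key.
single = InMemoryState()
registry = {"slaves": make_slaves(1000), "maintenance": {}, "quotas": {}}
single.store("registry", registry)
registry["maintenance"] = schedule
cost_single = single.store("registry", registry)  # rewrites the slaves as well

# Layout B: one key per entity.
split = InMemoryState()
split.store("registry", {"slaves": make_slaves(1000)})
split.store("maintenance", {})
split.store("quotas", {})
cost_split = split.store("maintenance", schedule)  # rewrites only this blob

print("single key write:", cost_single, "bytes")
print("separate key write:", cost_split, "bytes")
```

Note that the sketch ignores the catch Ben listed in the ticket: if a write ever has to touch 'registry' and 'maintenance' (or 'quotas') together, separate keys need some form of transactional support that this toy store does not model.
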

On Wed, Jul 8, 2015 at 12:38 AM, Artem Harutyunyan <[email protected]> wrote:

> Hi Ben,
>
> Yes, we do plan to work on this. Thanks a lot for raising the concern!
> We will change the first task to update the design doc and revive the
> discussion around this.
>
> Cheers,
> Artem.
>
> On Tue, Jul 7, 2015 at 3:20 PM, Benjamin Mahler
> <[email protected]> wrote:
> > Hm.. are you guys planning to start working on this?
> >
> > We should revisit the design here:
> > https://docs.google.com/document/d/1CIoOnBLFiEvmhOe-h_s8M4m9Qa7BLETuj_dSNJW959U/edit
> >
> > Specifically, after getting feedback from folks working on storage
> > systems, they seem to really want the safety of explicit acceptance of
> > maintenance. This complicates things a bit, because previously we
> > simplified this by relying on a lack of things running to signal
> > implicit acceptance. Once explicit acceptance is required, we need to
> > allow frameworks to accept maintenance even if they don't have
> > anything running on the slave. Ideally, we don't ask all frameworks
> > about all slaves, for example, by only asking when they have
> > allocation rights (e.g. reservations, quota, etc).
> >
> > On Tue, Jul 7, 2015 at 3:03 PM, Artem Harutyunyan (JIRA)
> > <[email protected]> wrote:
> >
> >> [ https://issues.apache.org/jira/browse/MESOS-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
> >>
> >> Artem Harutyunyan updated MESOS-2075:
> >> -------------------------------------
> >>            Sprint: Mesosphere Sprint 14
> >>            Labels: mesosphere twitter  (was: twitter)
> >>     Fix Version/s: 0.24.0
> >>
> >> > Add maintenance information to the replicated registry.
> >> > -------------------------------------------------------
> >> >
> >> >        Key: MESOS-2075
> >> >        URL: https://issues.apache.org/jira/browse/MESOS-2075
> >> >    Project: Mesos
> >> > Issue Type: Task
> >> > Components: master
> >> >   Reporter: Benjamin Mahler
> >> >     Labels: mesosphere, twitter
> >> >    Fix For: 0.24.0
> >> >
> >> >
> >> > To achieve fault-tolerance for the maintenance primitives, we will
> >> > need to add the maintenance information to the registry.
> >> > The registry currently stores all of the slave information, which
> >> > is quite large (~ 17MB for 50,000 slaves from my testing), which
> >> > results in a protobuf object that is extremely expensive to copy.
> >> > As far as I can tell, reads / writes to maintenance information are
> >> > independent of reads / writes to the existing 'registry'
> >> > information. So there are two approaches here:
> >> > h4. Add maintenance information to 'maintenance' key:
> >> > # The advantage of this approach is that we don't further grow the
> >> > large Registry object.
> >> > # This approach assumes that writes to 'maintenance' are
> >> > independent of writes to the 'registry'. If these writes are not
> >> > independent, this approach requires that we add transactional
> >> > support to the State abstraction.
> >> > # This approach requires adding compaction to LogStorage.
> >> > # This approach likely requires some refactoring to the Registrar.
> >> > h4. Add maintenance information to 'registry' key:
> >> > # The advantage of this approach is that it's the easiest to
> >> > implement.
> >> > # This will further grow the single 'registry' object, but doesn't
> >> > preclude it being split apart in the future.
> >> > # This approach may require using the diff support in LogStorage
> >> > and/or adding compression support to LogStorage snapshots to deal
> >> > with the increased size of the registry.
> >>
> >>
> >>
> >> --
> >> This message was sent by Atlassian JIRA
> >> (v6.3.4#6332)
