Guys,

do you plan to persist maintenance information under the 'registry' key or
to create a separate one? We should persist quotas as well. I don't expect
quotas to be large, but that may help you decide in favor of creating
separate keys per entity.
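
To make the trade-off concrete, here is a rough sketch of the two layouts.
It assumes a store in which every write replaces the whole value under a key,
so the write cost tracks the size of that key's value. ToyState, its methods,
and the placeholder byte strings are all hypothetical; this is not the Mesos
State or Registrar API, just an illustration of the accounting.

```python
# Hypothetical sketch: NOT the Mesos State/Registrar API.
class ToyState:
    """In-memory stand-in for a versioned key-value store.

    Each store() rewrites the whole value under a key, so the cost of a
    write is proportional to the size of that key's value.
    """

    def __init__(self):
        self._data = {}          # key -> (version, value bytes)
        self.bytes_written = 0   # running total, kept only for comparison

    def fetch(self, key):
        # Unknown keys read as version 0 with an empty value.
        return self._data.get(key, (0, b""))

    def store(self, key, expected_version, value):
        version, _ = self.fetch(key)
        if version != expected_version:
            return False         # another writer got there first
        self._data[key] = (version + 1, value)
        self.bytes_written += len(value)
        return True


SLAVES = b"s" * 1000   # placeholder for the large slave list
QUOTA = b"q" * 10      # placeholder for a small quota record

# Layout A: everything under the single 'registry' key.
single = ToyState()
single.store("registry", 0, SLAVES + QUOTA)
single.store("registry", 1, SLAVES + QUOTA.upper())  # quota-only change

# Layout B: one key per entity.
split = ToyState()
split.store("registry", 0, SLAVES)
split.store("quota", 0, QUOTA)
split.store("quota", 1, QUOTA.upper())               # quota-only change

print(single.bytes_written, split.bytes_written)     # prints: 2020 1020
```

In this toy model a quota-only update rewrites the slave list in layout A but
not in layout B. The cost of layout B is that a change spanning two keys is no
longer a single write, which is the transactional concern the ticket raises.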


On Wed, Jul 8, 2015 at 12:38 AM, Artem Harutyunyan <[email protected]>
wrote:

> Hi Ben,
>
> Yes, we do plan to work on this. Thanks a lot for raising the concern!
> We will change the first task to update the design doc and revive the
> discussion around this.
>
> Cheers,
> Artem.
>
> On Tue, Jul 7, 2015 at 3:20 PM, Benjamin Mahler
> <[email protected]> wrote:
> > Hm.. are you guys planning to start working on this?
> >
> > We should revisit the design here:
> >
> https://docs.google.com/document/d/1CIoOnBLFiEvmhOe-h_s8M4m9Qa7BLETuj_dSNJW959U/edit
> >
> > Specifically, after getting feedback from folks working on storage systems,
> > they seem to really want the safety of explicit acceptance of maintenance.
> > This complicates things a bit, because previously we simplified this by
> > relying on a lack of things running to signal implicit acceptance. Once
> > explicit acceptance is required, we need to allow frameworks to
> > accept maintenance even if they don't have anything running on the slave.
> > Ideally, we don't ask all frameworks about all slaves, for example, by only
> > asking when they have allocation rights (e.g. reservations, quota, etc.).
> >
> > On Tue, Jul 7, 2015 at 3:03 PM, Artem Harutyunyan (JIRA) <
> [email protected]>
> > wrote:
> >
> >>
> >>      [
> >>
> https://issues.apache.org/jira/browse/MESOS-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
> >> ]
> >>
> >> Artem Harutyunyan updated MESOS-2075:
> >> -------------------------------------
> >>            Sprint: Mesosphere Sprint 14
> >>            Labels: mesosphere twitter  (was: twitter)
> >>     Fix Version/s: 0.24.0
> >>
> >> > Add maintenance information to the replicated registry.
> >> > -------------------------------------------------------
> >> >
> >> >                 Key: MESOS-2075
> >> >                 URL: https://issues.apache.org/jira/browse/MESOS-2075
> >> >             Project: Mesos
> >> >          Issue Type: Task
> >> >          Components: master
> >> >            Reporter: Benjamin Mahler
> >> >              Labels: mesosphere, twitter
> >> >             Fix For: 0.24.0
> >> >
> >> >
> >> > To achieve fault-tolerance for the maintenance primitives, we will need
> >> to add the maintenance information to the registry.
> >> > The registry currently stores all of the slave information, which is
> >> quite large (~ 17MB for 50,000 slaves from my testing), which results in a
> >> protobuf object that is extremely expensive to copy.
> >> > As far as I can tell, reads / writes to maintenance information are
> >> independent of reads / writes to the existing 'registry' information. So
> >> there are two approaches here:
> >> > h4. Add maintenance information to 'maintenance' key:
> >> > # The advantage of this approach is that we don't further grow the large
> >> Registry object.
> >> > # This approach assumes that writes to 'maintenance' are independent of
> >> writes to the 'registry'. If these writes are not independent, this
> >> approach requires that we add transactional support to the State
> >> abstraction.
> >> > # This approach requires adding compaction to LogStorage.
> >> > # This approach likely requires some refactoring to the Registrar.
> >> > h4. Add maintenance information to 'registry' key:
> >> > # The advantage of this approach is that it's the easiest to implement.
> >> > # This will further grow the single 'registry' object, but doesn't
> >> preclude it being split apart in the future.
> >> > # This approach may require using the diff support in LogStorage and/or
> >> adding compression support to LogStorage snapshots to deal with the
> >> increased size of the registry.
> >>
> >>
> >>
> >> --
> >> This message was sent by Atlassian JIRA
> >> (v6.3.4#6332)
> >>
>
