The easiest thing is to store it within the same key; however, if we don't
require atomic updates across the slave information and the maintenance
information, we can split the keys. Note that we don't have transactions for
State. Also note that we don't have compaction in the log storage
implementation; that might need to be addressed.
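
To make the trade-off concrete, here is a rough sketch of what a split
'maintenance' key could look like against the State abstraction. This is
illustrative only: the helper name, include paths, and namespaces are mine,
and it assumes the usual fetch/mutate/store check-and-set semantics. It also
shows why the write cannot be made atomic with a 'registry' write today.

// Illustrative only -- not the actual Registrar code. Include paths and
// namespaces are approximate for the 0.24 tree.
#include <string>

#include <process/future.hpp>
#include <stout/option.hpp>

#include "state/state.hpp"

using mesos::internal::state::State;
using mesos::internal::state::Variable;
using process::Future;

// Hypothetical helper: write serialized maintenance data to its own key.
// Because this is a separate check-and-set, it cannot be made atomic with
// a write to the 'registry' key without transaction support in State.
Future<bool> storeMaintenance(State* state, const std::string& serialized)
{
  return state->fetch("maintenance")
    .then([=](const Variable& variable) -> Future<bool> {
      // mutate() produces a new Variable carrying the fetched version, so
      // store() acts as a check-and-set: it returns None if someone else
      // wrote 'maintenance' in between.
      return state->store(variable.mutate(serialized))
        .then([](const Option<Variable>& stored) {
          return stored.isSome();
        });
    });
}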

We need to be careful about adding more information to the 'registry' key;
for large clusters it already approaches tens of megabytes (varying based on
attribute information).
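
For a rough sense of scale, the ~17MB / 50,000 slaves figure from the ticket
works out to roughly 350 bytes per slave before attributes are counted. A
hypothetical sizing harness (not from the tree; include path may differ)
that illustrates why copying one giant object on every update hurts:

// Hypothetical sizing harness -- just to illustrate why a single 'registry'
// value gets expensive to copy on every update.
#include <iostream>
#include <string>

#include <mesos/mesos.pb.h>

int main()
{
  size_t total = 0;

  for (int i = 0; i < 50000; ++i) {
    mesos::SlaveInfo info;
    info.set_hostname("slave-" + std::to_string(i) + ".example.com");
    // Attributes and resources are what push the real registry into the
    // tens-of-megabytes range mentioned above.
    total += info.ByteSize();
  }

  std::cout << "~" << total << " bytes serialized for 50,000 bare entries"
            << std::endl;
  return 0;
}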

On Wed, Jul 8, 2015 at 1:40 AM, Alex Rukletsov <[email protected]> wrote:

> Guys,
>
> do you plan to persist maintenance in the 'registry' key or create a
> separate one? We should persist quotas as well, and though I don't expect
> quotas to be large, maybe it will help you decide towards creating separate
> keys per entity.
>
>
> On Wed, Jul 8, 2015 at 12:38 AM, Artem Harutyunyan <[email protected]>
> wrote:
>
> > Hi Ben,
> >
> > Yes, we do plan to work on this. Thanks a lot for raising the concern!
> > We will change the first task to update the design doc and revive the
> > discussion around this.
> >
> > Cheers,
> > Artem.
> >
> > On Tue, Jul 7, 2015 at 3:20 PM, Benjamin Mahler
> > <[email protected]> wrote:
> > > Hm.. are you guys planning to start working on this?
> > >
> > > We should revisit the design here:
> > >
> > > https://docs.google.com/document/d/1CIoOnBLFiEvmhOe-h_s8M4m9Qa7BLETuj_dSNJW959U/edit
> > >
> > > Specifically, after getting feedback from folks working on storage
> > > systems, they seem to really want the safety of explicit acceptance of
> > > maintenance. This complicates things a bit, because previously we
> > > simplified this by relying on a lack of things running to signal
> > > implicit acceptance. Once explicit acceptance is required, we need to
> > > allow frameworks to accept maintenance even if they don't have anything
> > > running on the slave. Ideally, we don't ask all frameworks about all
> > > slaves, for example, by only asking when they have allocation rights
> > > (e.g. reservations, quota, etc.).
> > >
> > > On Tue, Jul 7, 2015 at 3:03 PM, Artem Harutyunyan (JIRA)
> > > <[email protected]> wrote:
> > >
> > >>
> > >>      [
> > >> https://issues.apache.org/jira/browse/MESOS-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
> > >> ]
> > >>
> > >> Artem Harutyunyan updated MESOS-2075:
> > >> -------------------------------------
> > >>            Sprint: Mesosphere Sprint 14
> > >>            Labels: mesosphere twitter  (was: twitter)
> > >>     Fix Version/s: 0.24.0
> > >>
> > >> > Add maintenance information to the replicated registry.
> > >> > -------------------------------------------------------
> > >> >
> > >> >                 Key: MESOS-2075
> > >> >                 URL: https://issues.apache.org/jira/browse/MESOS-2075
> > >> >             Project: Mesos
> > >> >          Issue Type: Task
> > >> >          Components: master
> > >> >            Reporter: Benjamin Mahler
> > >> >              Labels: mesosphere, twitter
> > >> >             Fix For: 0.24.0
> > >> >
> > >> >
> > >> > To achieve fault-tolerance for the maintenance primitives, we will
> > >> > need to add the maintenance information to the registry.
> > >> > The registry currently stores all of the slave information, which is
> > >> > quite large (~17MB for 50,000 slaves from my testing) and results in
> > >> > a protobuf object that is extremely expensive to copy.
> > >> > As far as I can tell, reads / writes to maintenance information are
> > >> > independent of reads / writes to the existing 'registry' information.
> > >> > So there are two approaches here:
> > >> > h4. Add maintenance information to 'maintenance' key:
> > >> > # The advantage of this approach is that we don't further grow the
> > >> > large Registry object.
> > >> > # This approach assumes that writes to 'maintenance' are independent
> > >> > of writes to the 'registry'. If these writes are not independent,
> > >> > this approach requires that we add transactional support to the
> > >> > State abstraction.
> > >> > # This approach requires adding compaction to LogStorage.
> > >> > # This approach likely requires some refactoring to the Registrar.
> > >> > h4. Add maintenance information to 'registry' key:
> > >> > # The advantage of this approach is that it's the easiest to
> > >> > implement.
> > >> > # This will further grow the single 'registry' object, but doesn't
> > >> > preclude it being split apart in the future.
> > >> > # This approach may require using the diff support in LogStorage
> > >> > and/or adding compression support to LogStorage snapshots to deal
> > >> > with the increased size of the registry.
> > >>
> >
>
