+1 overall

Also +1 to Sandy's suggestion of getting build maintainers as well.

On Wed, Nov 5, 2014 at 7:57 PM, Sandy Ryza <sandy.r...@cloudera.com> wrote:

> This seems like a good idea.
>
> An area that wasn't listed, but that I think could strongly benefit from
> maintainers, is the build.  Having consistent oversight over Maven, SBT,
> and dependencies would allow us to avoid subtle breakages.
>
> Component maintainers have come up several times within the Hadoop project,
> and I think one of the main reasons the proposals have been rejected is
> that, structurally, their effect is to slow down development.  As you
> mention, this is somewhat mitigated if being a maintainer leads committers
> to take on more responsibility, but it might be worthwhile to draw up more
> specific ideas on how to combat this?  E.g. do obvious changes, doc fixes,
> test fixes, etc. always require a maintainer?
>
> -Sandy
>
> On Wed, Nov 5, 2014 at 5:36 PM, Michael Armbrust <mich...@databricks.com>
> wrote:
>
> > +1 (binding)
> >
> > On Wed, Nov 5, 2014 at 5:33 PM, Matei Zaharia <matei.zaha...@gmail.com>
> > wrote:
> >
> > > BTW, my own vote is obviously +1 (binding).
> > >
> > > Matei
> > >
> > > > On Nov 5, 2014, at 5:31 PM, Matei Zaharia <matei.zaha...@gmail.com>
> > > wrote:
> > > >
> > > > Hi all,
> > > >
> > > > > I wanted to share a discussion we've been having on the PMC list, as
> > > > > well as call for an official vote on it on a public list. Basically,
> > > > > as the Spark project scales up, we need to define a model to make
> > > > > sure there is still great oversight of key components (in particular
> > > > > internal architecture and public APIs), and to this end I've proposed
> > > > > implementing a maintainer model for some of these components, similar
> > > > > to other large projects.
> > > >
> > > > > As background on this, Spark has grown a lot since joining Apache.
> > > > > We've had over 80 contributors/month for the past 3 months, which I
> > > > > believe makes us the most active project in contributors/month at
> > > > > Apache, as well as over 500 patches/month. The codebase has also
> > > > > grown significantly, with new libraries for SQL, ML, graphs and more.
> > > >
> > > > > In this kind of large project, one common way to scale development is
> > > > > to assign "maintainers" to oversee key components, where each patch
> > > > > to that component needs to get sign-off from at least one of its
> > > > > maintainers. Most existing large projects do this -- at Apache, some
> > > > > large ones with this model are CloudStack (the second-most active
> > > > > project overall), Subversion, and Kafka, and other examples include
> > > > > Linux and Python. This is also by and large how Spark operates today
> > > > > -- most components have a de facto maintainer.
> > > >
> > > > IMO, adopting this model would have two benefits:
> > > >
> > > > > 1) Consistent oversight of design for that component, especially
> > > > > regarding architecture and API. This process would ensure that the
> > > > > component's maintainers see all proposed changes and consider how
> > > > > they fit together.
> > > >
> > > > > 2) More structure for new contributors and committers -- in
> > > > > particular, it would be easy to look up who’s responsible for each
> > > > > module and ask them for reviews, etc., rather than having patches
> > > > > slip through the cracks.
> > > >
> > > > > We'd like to start in a lightweight manner, where the model only
> > > > > applies to certain key components (e.g. scheduler, shuffle) and
> > > > > user-facing APIs (MLlib, GraphX, etc.). Over time, as the project
> > > > > grows, we can expand it if we deem it useful. The specific mechanics
> > > > > would be as follows:
> > > >
> > > > > - Some components in Spark will have maintainers assigned to them,
> > > > >   where one of the maintainers needs to sign off on each patch to the
> > > > >   component.
> > > > > - Each component with maintainers will have at least 2 maintainers.
> > > > > - Maintainers will be assigned from the most active and knowledgeable
> > > > >   committers on that component by the PMC. The PMC can vote to add /
> > > > >   remove maintainers, and maintained components, through consensus.
> > > > > - Maintainers are expected to be active in responding to patches for
> > > > >   their components, though they do not need to be the main reviewers
> > > > >   for them (e.g. they might just sign off on architecture / API). To
> > > > >   prevent inactive maintainers from blocking the project, if a
> > > > >   maintainer isn't responding in a reasonable time period (say 2
> > > > >   weeks), other committers can merge the patch, and the PMC will want
> > > > >   to discuss adding another maintainer.
> > > >
> > > > > If you'd like to see examples for this model, check out the following
> > > > > projects:
> > > > > - CloudStack:
> > > > >   https://cwiki.apache.org/confluence/display/CLOUDSTACK/CloudStack+Maintainers+Guide
> > > > > - Subversion:
> > > > >   https://subversion.apache.org/docs/community-guide/roles.html
> > > >
> > > > > Finally, I wanted to list our current proposal for initial components
> > > > > and maintainers. It would be good to get feedback on other components
> > > > > we might add, but please note that personnel discussions (e.g. "I
> > > > > don't think Matei should maintain *that* component") should only
> > > > > happen on the private list. The initial components were chosen to
> > > > > include all public APIs and the main core components, and the
> > > > > maintainers were chosen from the most active contributors to those
> > > > > modules.
> > > >
> > > > - Spark core public API: Matei, Patrick, Reynold
> > > > - Job scheduler: Matei, Kay, Patrick
> > > > - Shuffle and network: Reynold, Aaron, Matei
> > > > - Block manager: Reynold, Aaron
> > > > - YARN: Tom, Andrew Or
> > > > - Python: Josh, Matei
> > > > - MLlib: Xiangrui, Matei
> > > > - SQL: Michael, Reynold
> > > > - Streaming: TD, Matei
> > > > - GraphX: Ankur, Joey, Reynold
> > > >
> > > > > I'd like to formally call a [VOTE] on this model, to last 72 hours.
> > > > > The [VOTE] will end on Nov 8, 2014 at 6 PM PST.
> > > >
> > > > Matei
> > >
> > >
> >
>
