For what it's worth, this is very close to how HBase attempts to manage the
community load. We break out components (in Jira), with a list of named
component maintainers. Actually, having components alone has given us a big
bang for the buck because, when issues are properly labeled, it is really easy
for part-timers to channel their efforts with precision.

As a Flink user, I'm +1 for this proposal as well :)

On Thursday, May 12, 2016, Aljoscha Krettek <aljos...@apache.org> wrote:

> +1
>
> The ideas seem good and the proposed number of components seems reasonable.
> With this, we should also then clean up the JIRA to make it actually usable.
>
> On Thu, 12 May 2016 at 18:09 Stephan Ewen <se...@apache.org>
> wrote:
>
> > All maintainer candidates are only proposals so far. No indication of lead
> > or anything so far.
> >
> > Let's first see if we agree on the structure proposed here, and if we take
> > the components as suggested here or if we refine the list.
> > On 12.05.2016 at 17:45, "Robert Metzger" <rmetz...@apache.org> wrote:
> >
> > > tl;dr: +1
> > >
> > > > I also like the proposal a lot. Our community is growing at quite a fast
> > > > pace and we need to have some structure in place to still keep track of
> > > > everything going on.
> > >
> > > > I'm happy to see that the proposal mentions cleaning up our JIRA. This is
> > > > something that has been annoying me for quite a while, but it's too big to
> > > > do alone. If maintainers could take care of their components, we should
> > > > already have covered a lot there.
> > >
> > > > One question regarding the "chair" or "lead" role for components: Is the
> > > > first name in the list of maintainers the lead?
> > > >
> > > > I would actually suggest waiting until all proposed maintainers have agreed
> > > > to the proposal. It doesn't make sense to make somebody a maintainer of
> > > > something if they disagree or are not aware of it.
> > >
> > >
> > >
> > >
> > > > On Thu, May 12, 2016 at 2:13 PM, Maximilian Michels <m...@apache.org>
> > > wrote:
> > >
> > > > > +1 for the initiative. With a better process we will improve the
> > > > > quality of Flink development and give ourselves more time to focus.
> > > >
> > > > Could we have another category "Infrastructure"? This would concern
> > > > things like CI, nightly deployment of snapshots/documentation, ASF
> > > > > Infra communication. Robert and I could be the initial maintainers
> > > > for that.
> > > >
> > > > > On Thu, May 12, 2016 at 1:52 PM, Stephan Ewen <se...@apache.org> wrote:
> > > > > Yes, Matthias, that was supposed to be you.
> > > > > Sorry from another guy who frequently has his name misspelled ;-)
> > > > >
> > > > > > On Thu, May 12, 2016 at 1:27 PM, Matthias J. Sax <mj...@apache.org> wrote:
> > > > >
> > > > >> +1 from my side.
> > > > >>
> > > > > >> Happy to be the maintainer for Storm-Compatibility (at least I guess
> > > > > >> it's me, even though the correct spelling would be with two 't's :P)
> > > > >>
> > > > >> -Matthias
> > > > >>
> > > > >> On 05/12/2016 12:56 PM, Till Rohrmann wrote:
> > > > >> > +1 for the proposal
> > > > > >> > On May 12, 2016 12:13 PM, "Stephan Ewen" <se...@apache.org> wrote:
> > > > >> >
> > > > >> >> Yes, Gabor Gevay, that did refer to you!
> > > > >> >>
> > > > >> >> Sorry for the ambiguity...
> > > > >> >>
> > > > > >> >> On Thu, May 12, 2016 at 10:46 AM, Márton Balassi <balassi.mar...@gmail.com> wrote:
> > > > >> >>
> > > > >> >>> +1 for the proposal
> > > > >> >>> @ggevay: I do think that it refers to you. :)
> > > > >> >>>
> > > > > >> >>> On Thu, May 12, 2016 at 10:40 AM, Gábor Gévay <gga...@gmail.com> wrote:
> > > > >> >>>
> > > > >> >>>> Hello,
> > > > >> >>>>
> > > > > >> >>>> There are at least three Gábors in the Flink community :), so
> > > > > >> >>>> assuming that the Gábor in the list of maintainers of the DataSet API
> > > > > >> >>>> is referring to me, I'll be happy to do it. :)
> > > > >> >>>>
> > > > >> >>>> Best,
> > > > >> >>>> Gábor G.
> > > > >> >>>>
> > > > >> >>>>
> > > > >> >>>>
> > > > > >> >>>> 2016-05-10 11:24 GMT+02:00 Stephan Ewen <se...@apache.org>:
> > > > >> >>>>> Hi everyone!
> > > > >> >>>>>
> > > > > >> >>>>> We propose to establish some lightweight structures in the Flink open
> > > > > >> >>>>> source community and development process, to help us better handle the
> > > > > >> >>>>> increased interest in Flink (mailing list and pull requests), while not
> > > > > >> >>>>> overwhelming the committers, and giving users and contributors a good
> > > > > >> >>>>> experience.
> > > > >> >>>>>
> > > > > >> >>>>> This proposal is triggered by the observation that we are reaching the
> > > > > >> >>>>> limits of where the current community can support users and guide new
> > > > > >> >>>>> contributors. The proposal below is based on observations and ideas
> > > > > >> >>>>> from Till, Robert, and me.
> > > > >> >>>>>
> > > > >> >>>>> ========
> > > > >> >>>>> Goals
> > > > >> >>>>> ========
> > > > >> >>>>>
> > > > > >> >>>>> We try to achieve the following:
> > > > > >> >>>>>
> > > > > >> >>>>>   - Pull requests get handled in a timely fashion
> > > > > >> >>>>>   - New contributors are better integrated into the community
> > > > > >> >>>>>   - The community feels empowered on the mailing list.
> > > > > >> >>>>>     But questions that need the attention of someone who has deep
> > > > > >> >>>>>     knowledge of a certain part of Flink get that attention.
> > > > > >> >>>>>   - At the same time, the committers that are knowledgeable about many
> > > > > >> >>>>>     core parts do not get completely overwhelmed.
> > > > > >> >>>>>   - We don't overlook threads that report critical issues.
> > > > > >> >>>>>   - We always have a pretty good overview of what the status of certain
> > > > > >> >>>>>     parts of the system is.
> > > > > >> >>>>>       -> What are the often-encountered known issues
> > > > > >> >>>>>       -> What are the most frequently requested features
> > > > >> >>>>>
> > > > >> >>>>>
> > > > >> >>>>> ========
> > > > >> >>>>> Problems
> > > > >> >>>>> ========
> > > > >> >>>>>
> > > > >> >>>>> Looking into the process, there are two big issues:
> > > > >> >>>>>
> > > > > >> >>>>> (1) Up to now, we have been relying on the fact that everything just
> > > > > >> >>>>> "organizes itself", driven by best effort. That assumes that everyone
> > > > > >> >>>>> feels equally responsible for every part, question, and contribution.
> > > > > >> >>>>> At the current state, this is impossible to maintain; it overwhelms
> > > > > >> >>>>> the committers and contributors.
> > > > >> >>>>>
> > > > > >> >>>>> Example: Pull requests are picked up by whoever wants to pick them up.
> > > > > >> >>>>> Pull requests that are a lot of work, have little chance of getting in,
> > > > > >> >>>>> or relate to less active components are sometimes not picked up. When
> > > > > >> >>>>> contributors are pretty loaded already, it may happen that no one
> > > > > >> >>>>> eventually feels responsible for picking up a pull request, and it
> > > > > >> >>>>> falls through the cracks.
> > > > >> >>>>>
> > > > > >> >>>>> (2) There is no good overview of the known shortcomings, efforts, and
> > > > > >> >>>>> requested features for different parts of the system. This information
> > > > > >> >>>>> exists in various people's heads, but is not easily accessible for new
> > > > > >> >>>>> people. The Flink JIRA is not well maintained, so it is not easy to
> > > > > >> >>>>> draw insights from it.
> > > > >> >>>>>
> > > > >> >>>>>
> > > > >> >>>>> ===========
> > > > >> >>>>> The Proposal
> > > > >> >>>>> ===========
> > > > >> >>>>>
> > > > > >> >>>>> Since we are building a parallel system, the natural solution seems to
> > > > > >> >>>>> be: partition the workload ;-)
> > > > >> >>>>>
> > > > > >> >>>>> We propose to define a set of components for Flink. Each component is
> > > > > >> >>>>> maintained or tracked by one or more people - let's call them
> > > > > >> >>>>> maintainers. It is important to note that we don't suggest the
> > > > > >> >>>>> maintainers as an authoritative role, but simply as committers or
> > > > > >> >>>>> contributors that visibly step up for a certain component, and mainly
> > > > > >> >>>>> track and drive the efforts pertaining to that component.
> > > > >> >>>>>
> > > > > >> >>>>> It is also important to realize that we do not want to suggest that
> > > > > >> >>>>> people get less involved with certain parts and components because
> > > > > >> >>>>> they are not the maintainers. We simply want to make sure that each
> > > > > >> >>>>> pull request or question or contribution has in the end one person (or
> > > > > >> >>>>> a small set of people) responsible for catching and tracking it, if it
> > > > > >> >>>>> was not worked on by the pro-active community.
> > > > >> >>>>>
> > > > > >> >>>>> For some components, having multiple maintainers will be helpful. In
> > > > > >> >>>>> that case, one maintainer should be the "chair" or "lead" and make
> > > > > >> >>>>> sure that no issue of that component gets lost between the multiple
> > > > > >> >>>>> maintainers.
> > > > >> >>>>>
> > > > >> >>>>>
> > > > > >> >>>>> A maintainer's role is:
> > > > >> >>>>> -----------------------------
> > > > >> >>>>>
> > > > > >> >>>>>   - Have an overview of which of the open pull requests relate to their
> > > > > >> >>>>>     component
> > > > > >> >>>>>   - Drive the pull requests relating to the component to resolution
> > > > > >> >>>>>       => Moderate the decision whether the feature should be merged
> > > > > >> >>>>>       => Make sure the pull request gets a shepherd.
> > > > > >> >>>>>            In many cases, the maintainers would shepherd themselves.
> > > > > >> >>>>>       => In case the shepherd becomes inactive, the maintainers need to
> > > > > >> >>>>>          find a new shepherd.
> > > > > >> >>>>>
> > > > > >> >>>>>   - Have an overview of the known issues of their component
> > > > > >> >>>>>   - Have an overview of the frequently requested features of their
> > > > > >> >>>>>     component
> > > > > >> >>>>>
> > > > > >> >>>>>   - Have an overview of which contributors are doing very good work in
> > > > > >> >>>>>     their component, would be candidates for committers, and should be
> > > > > >> >>>>>     mentored towards that.
> > > > > >> >>>>>
> > > > > >> >>>>>   - Resolve email threads that have been brought to their attention
> > > > > >> >>>>>     because deeper component knowledge is required for that thread.
> > > > >> >>>>>
> > > > > >> >>>>> A maintainer's role is NOT:
> > > > >> >>>>> ----------------------------------
> > > > >> >>>>>
> > > > >> >>>>>   - Review all pull requests of that component
> > > > >> >>>>>   - Answer every mail with questions about that component
> > > > > >> >>>>>   - Fix all bugs and implement all features of that component
> > > > >> >>>>>
> > > > >> >>>>>
> > > > > >> >>>>> We imagine the following way that the community and the maintainers
> > > > > >> >>>>> interact:
> > > > > >> >>>>> ---------------------------------------------------------------------------------------------------------
> > > > >> >>>>>
> > > > > >> >>>>>   - Pull requests should be tagged by component. Since we cannot add
> > > > > >> >>>>>     labels at this point, we need to rely on the following:
> > > > > >> >>>>>      => The pull request opener should name the pull request like
> > > > > >> >>>>>         "[FLINK-XXX] [component] Title"
> > > > > >> >>>>>      => Components can be (re) tagged by adding special comments in the
> > > > > >> >>>>>         pull request ("==> component client")
> > > > > >> >>>>>      => With some luck, GitHub and Apache Infra will allow us to use
> > > > > >> >>>>>         labels at some point
> > > > > >> >>>>>
> > > > > >> >>>>>   - When pull requests are associated with a component, the maintainers
> > > > > >> >>>>>     will manage them
> > > > > >> >>>>>     (decision whether to add, find shepherd, catch dropped pull requests)
> > > > > >> >>>>>
> > > > > >> >>>>>   - We assume that maintainers frequently reach out to other community
> > > > > >> >>>>>     members and ask them if they want to shepherd a pull request.
> > > > >> >>>>>
> > > > > >> >>>>>   - On the mailing list, everyone should feel equally empowered to
> > > > > >> >>>>>     answer and discuss.
> > > > > >> >>>>>     If at some point in the discussion, some deep technical knowledge
> > > > > >> >>>>>     about a component is required,
> > > > > >> >>>>>     the maintainer(s) should be drawn into the discussion.
> > > > > >> >>>>>     Because the mailing list infrastructure has no support for tagging
> > > > > >> >>>>>     threads, here are some simple workarounds:
> > > > > >> >>>>>
> > > > > >> >>>>>     => One possibility is to put the maintainers' mail addresses on cc
> > > > > >> >>>>>        for the thread, so they get the mail not just via the mailing
> > > > > >> >>>>>        list
> > > > > >> >>>>>     => Another way would be to post something like "+maintainer runtime"
> > > > > >> >>>>>        in the thread and the "runtime" maintainers would have a
> > > > > >> >>>>>        filter/alert on these keywords in their mail program.
> > > > > >> >>>>>
> > > > > >> >>>>>   - We assume that maintainers will reach out to community members that
> > > > > >> >>>>>     are very active and helpful in a component, and will ask them if
> > > > > >> >>>>>     they want to be added as maintainers.
> > > > > >> >>>>>     That will make it visible that those people are experts for that
> > > > > >> >>>>>     part of Flink.
> > > > >> >>>>>
> > > > >> >>>>>
> > > > >> >>>>> ======================================
> > > > >> >>>>> Maintainers: Committers and Contributors
> > > > >> >>>>> ======================================
> > > > >> >>>>>
> > > > > >> >>>>> It helps if maintainers are committers (since we want them to resolve
> > > > > >> >>>>> pull requests, which often involves merging them).
> > > > > >> >>>>>
> > > > > >> >>>>> Components with multiple maintainers can easily have non-committer
> > > > > >> >>>>> contributors in addition to committer contributors.
> > > > >> >>>>>
> > > > >> >>>>>
> > > > >> >>>>> ======
> > > > >> >>>>> JIRA
> > > > >> >>>>> ======
> > > > >> >>>>>
> > > > > >> >>>>> Ideally, JIRA can be used to get an overview of the known issues of
> > > > > >> >>>>> each component, and of the common feature requests. Unfortunately, the
> > > > > >> >>>>> Flink JIRA is quite unorganized right now.
> > > > > >> >>>>>
> > > > > >> >>>>> A natural followup effort of this proposal would be to define in JIRA
> > > > > >> >>>>> the same components as we defined here, and have the maintainers keep
> > > > > >> >>>>> JIRA meaningful for that particular component. That would allow us to
> > > > > >> >>>>> easily generate some tables out of JIRA (like top known issues per
> > > > > >> >>>>> component, most requested features) and post them on the dev list once
> > > > > >> >>>>> in a while as a "state of the union" report.
> > > > > >> >>>>>
> > > > > >> >>>>> Initial assignment of issues to components should be made by the
> > > > > >> >>>>> people opening the issue. The maintainer of that tagged component
> > > > > >> >>>>> needs to change the tag if the component was classified incorrectly.
> > > > >> >>>>>
> > > > >> >>>>>
> > > > >> >>>>> ======================================
> > > > >> >>>>> Initial Components and Maintainers Suggestion
> > > > >> >>>>> ======================================
> > > > >> >>>>>
> > > > > >> >>>>> Below is a suggestion of how to define components for Flink. One goal
> > > > > >> >>>>> of the division was to make it obvious for the majority of questions
> > > > > >> >>>>> and contributions to which component they would relate. Otherwise, if
> > > > > >> >>>>> many contributions had fuzzy component associations, we would again
> > > > > >> >>>>> not solve the issue of having clear responsibilities for who would
> > > > > >> >>>>> track the progress and resolution.
> > > > > >> >>>>>
> > > > > >> >>>>> We also looked at each component and wrote the names of some people
> > > > > >> >>>>> who we thought were natural experts for the components, and thus
> > > > > >> >>>>> natural candidates for maintainers.
> > > > > >> >>>>>
> > > > > >> >>>>> **These names are only a starting point for discussion.**
> > > > > >> >>>>>
> > > > > >> >>>>> Once agreed upon, the components and names of maintainers should be
> > > > > >> >>>>> kept in the wiki and updated as components change and people step up
> > > > > >> >>>>> or down.
> > > > >> >>>>>
> > > > >> >>>>>
> > > > >> >>>>> *DataSet API* (*Fabian, Greg, Gabor*)
> > > > > >> >>>>>   - Including Hadoop compat. parts
> > > > >> >>>>>
> > > > >> >>>>> *DataStream API* (*Aljoscha, Max, Stephan*)
> > > > >> >>>>>
> > > > >> >>>>> *Runtime*
> > > > > >> >>>>>   - Distributed Coordination (JobManager/TaskManager, Akka) (*Till*)
> > > > > >> >>>>>   - Local Runtime (Memory Management, State Backends, Tasks/Operators)
> > > > > >> >>>>>     (*Stephan*)
> > > > > >> >>>>>   - Network (*Ufuk*)
> > > > >> >>>>>
> > > > >> >>>>> *Client/Optimizer* (*Fabian*)
> > > > >> >>>>>
> > > > >> >>>>> *Type system / Type extractor* (Timo)
> > > > >> >>>>>
> > > > > >> >>>>> *Cluster Management* (Yarn, Mesos, Docker, ...) (*Max, Robert*)
> > > > >> >>>>>
> > > > >> >>>>> *Libraries*
> > > > >> >>>>>   - Gelly (*Vasia, Greg*)
> > > > >> >>>>>   - ML (*Till, Theo*)
> > > > >> >>>>>   - CEP (*Till*)
> > > > >> >>>>>   - Python (*Chesnay*)
> > > > >> >>>>>
> > > > >> >>>>> *Table API & SQL* (*Fabian, Vasia, Timo, Chengxiang*)
> > > > >> >>>>>
> > > > >> >>>>> *Streaming Connectors* (*Robert*, *Aljoscha*)
> > > > >> >>>>>
> > > > >> >>>>> *Batch Connectors and Input/Output Formats* (*Chesnay*)
> > > > >> >>>>>
> > > > >> >>>>> *Storm Compatibility Layer* (*Mathias*)
> > > > >> >>>>>
> > > > >> >>>>> *Scala shell* (*Till*)
> > > > >> >>>>>
> > > > >> >>>>> *Startup Shell Scripts* (Ufuk)
> > > > >> >>>>>
> > > > >> >>>>> *Flink Build System, Maven Files* (*Robert*)
> > > > >> >>>>>
> > > > >> >>>>> *Documentation* (Ufuk)
> > > > >> >>>>>
> > > > >> >>>>>
> > > > >> >>>>> Please let us know what you think about this proposal.
> > > > >> >>>>> Happy discussing!
> > > > >> >>>>>
> > > > >> >>>>> Greetings,
> > > > >> >>>>> Stephan
> > > > >> >>>>
> > > > >> >>>
> > > > >> >>
> > > > >> >
> > > > >>
> > > > >>
> > > >
> > >
> >
>
