For what it's worth, this is very close to how HBase attempts to manage the community load. We break out components (in Jira), each with a list of named component maintainers. Actually, having components alone has given a big bang for the buck: when issues are properly labeled, it is really easy for part-timers to channel their efforts with precision.
As a Flink user, I'm +1 for this proposal as well :)

On Thursday, May 12, 2016, Aljoscha Krettek <aljos...@apache.org> wrote:

> +1
>
> The ideas seem good and the proposed number of components seems reasonable.
> With this, we should also then clean up the JIRA to make it actually usable.
>
> On Thu, 12 May 2016 at 18:09 Stephan Ewen <se...@apache.org> wrote:
>
> > All maintainer candidates are only proposals so far. No indication of lead
> > or anything so far.
> >
> > Let's first see if we agree on the structure proposed here, and if we take
> > the components as suggested here or if we refine the list.
> >
> > On 12.05.2016 at 17:45, "Robert Metzger" <rmetz...@apache.org> wrote:
> >
> > > tl;dr: +1
> > >
> > > I also like the proposal a lot. Our community is growing at quite a fast
> > > pace and we need to have some structure in place to still keep track of
> > > everything going on.
> > >
> > > I'm happy to see that the proposal mentions cleaning up our JIRA. This is
> > > something that has been annoying me for quite a while, but it's too big to
> > > do alone. If maintainers could take care of their components, we should
> > > already have covered a lot there.
> > >
> > > One question regarding the "chair" or "lead" role for components: Is the
> > > first name in the list of maintainers the lead?
> > >
> > > I would actually suggest waiting until all proposed maintainers have agreed
> > > to the proposal. It doesn't make sense to make somebody a maintainer of
> > > something if they disagree or are not aware of it.
> > >
> > > On Thu, May 12, 2016 at 2:13 PM, Maximilian Michels <m...@apache.org> wrote:
> > >
> > > > +1 for the initiative. With a better process we will improve the
> > > > quality of the Flink development and give us more time to focus.
> > > >
> > > > Could we have another category "Infrastructure"?
> > > > This would concern things like CI, nightly deployment of
> > > > snapshots/documentation, and ASF Infra communication. Robert and I could
> > > > be the initial maintainers for that.
> > > >
> > > > On Thu, May 12, 2016 at 1:52 PM, Stephan Ewen <se...@apache.org> wrote:
> > > > > Yes, Matthias, that was supposed to be you.
> > > > > Sorry from another guy who frequently has his name misspelled ;-)
> > > > >
> > > > > On Thu, May 12, 2016 at 1:27 PM, Matthias J. Sax <mj...@apache.org> wrote:
> > > > >
> > > > >> +1 from my side.
> > > > >>
> > > > >> Happy to be the maintainer for Storm-Compatibility (at least I guess
> > > > >> it's me, even though the correct spelling would be with two 't' :P)
> > > > >>
> > > > >> -Matthias
> > > > >>
> > > > >> On 05/12/2016 12:56 PM, Till Rohrmann wrote:
> > > > >> > +1 for the proposal
> > > > >> > On May 12, 2016 12:13 PM, "Stephan Ewen" <se...@apache.org> wrote:
> > > > >> >
> > > > >> >> Yes, Gabor Gevay, that did refer to you!
> > > > >> >>
> > > > >> >> Sorry for the ambiguity...
> > > > >> >>
> > > > >> >> On Thu, May 12, 2016 at 10:46 AM, Márton Balassi <balassi.mar...@gmail.com> wrote:
> > > > >> >>
> > > > >> >>> +1 for the proposal
> > > > >> >>> @ggevay: I do think that it refers to you. :)
> > > > >> >>>
> > > > >> >>> On Thu, May 12, 2016 at 10:40 AM, Gábor Gévay <gga...@gmail.com> wrote:
> > > > >> >>>
> > > > >> >>>> Hello,
> > > > >> >>>>
> > > > >> >>>> There are at least three Gábors in the Flink community, :) so
> > > > >> >>>> assuming that the Gábor in the list of maintainers of the DataSet API
> > > > >> >>>> is referring to me, I'll be happy to do it. :)
> > > > >> >>>>
> > > > >> >>>> Best,
> > > > >> >>>> Gábor G.
> > > > >> >>>>
> > > > >> >>>> 2016-05-10 11:24 GMT+02:00 Stephan Ewen <se...@apache.org>:
> > > > >> >>>>> Hi everyone!
> > > > >> >>>>>
> > > > >> >>>>> We propose to establish some lightweight structures in the Flink open
> > > > >> >>>>> source community and development process, to help us better handle the
> > > > >> >>>>> increased interest in Flink (mailing list and pull requests), while not
> > > > >> >>>>> overwhelming the committers, and giving users and contributors a good
> > > > >> >>>>> experience.
> > > > >> >>>>>
> > > > >> >>>>> This proposal is triggered by the observation that we are reaching the
> > > > >> >>>>> limits of where the current community can support users and guide new
> > > > >> >>>>> contributors. The below proposal is based on observations and ideas
> > > > >> >>>>> from Till, Robert, and me.
> > > > >> >>>>>
> > > > >> >>>>> ========
> > > > >> >>>>> Goals
> > > > >> >>>>> ========
> > > > >> >>>>>
> > > > >> >>>>> We try to achieve the following
> > > > >> >>>>>
> > > > >> >>>>>   - Pull requests get handled in a timely fashion
> > > > >> >>>>>   - New contributors are better integrated into the community
> > > > >> >>>>>   - The community feels empowered on the mailing list.
> > > > >> >>>>>     But questions that need the attention of someone that has deep
> > > > >> >>>>>     knowledge of a certain part of Flink get their attention.
> > > > >> >>>>>   - At the same time, the committers that are knowledgeable about many
> > > > >> >>>>>     core parts do not get completely overwhelmed.
> > > > >> >>>>>   - We don't overlook threads that report critical issues.
> > > > >> >>>>>   - We always have a pretty good overview of what the status of certain
> > > > >> >>>>>     parts of the system is.
> > > > >> >>>>>       -> What are often encountered known issues
> > > > >> >>>>>       -> What are the most frequently requested features
> > > > >> >>>>>
> > > > >> >>>>>
> > > > >> >>>>> ========
> > > > >> >>>>> Problems
> > > > >> >>>>> ========
> > > > >> >>>>>
> > > > >> >>>>> Looking into the process, there are two big issues:
> > > > >> >>>>>
> > > > >> >>>>> (1) Up to now, we have been relying on the fact that everything just
> > > > >> >>>>> "organizes itself", driven by best effort. That assumes that everyone
> > > > >> >>>>> feels equally responsible for every part, question, and contribution.
> > > > >> >>>>> At the current state, this is impossible to maintain, it overwhelms the
> > > > >> >>>>> committers and contributors.
> > > > >> >>>>>
> > > > >> >>>>> Example: Pull requests are picked up by whoever wants to pick them up.
> > > > >> >>>>> Pull requests that are a lot of work, have little chance of getting in,
> > > > >> >>>>> or relate to less active components are sometimes not picked up. When
> > > > >> >>>>> contributors are pretty loaded already, it may happen that no one
> > > > >> >>>>> eventually feels responsible to pick up a pull request, and it falls
> > > > >> >>>>> through the cracks.
> > > > >> >>>>>
> > > > >> >>>>> (2) There is no good overview of what are known shortcomings, efforts,
> > > > >> >>>>> and requested features for different parts of the system. This
> > > > >> >>>>> information exists in various peoples' heads, but is not easily
> > > > >> >>>>> accessible for new people. The Flink JIRA is not well maintained, it is
> > > > >> >>>>> not easy to draw insights from that.
> > > > >> >>>>>
> > > > >> >>>>>
> > > > >> >>>>> ===========
> > > > >> >>>>> The Proposal
> > > > >> >>>>> ===========
> > > > >> >>>>>
> > > > >> >>>>> Since we are building a parallel system, the natural solution seems to
> > > > >> >>>>> be: partition the workload ;-)
> > > > >> >>>>>
> > > > >> >>>>> We propose to define a set of components for Flink. Each component is
> > > > >> >>>>> maintained or tracked by one or more people - let's call them
> > > > >> >>>>> maintainers. It is important to note that we don't suggest the
> > > > >> >>>>> maintainers as an authoritative role, but simply as committers or
> > > > >> >>>>> contributors that visibly step up for a certain component, and mainly
> > > > >> >>>>> track and drive the efforts pertaining to that component.
> > > > >> >>>>>
> > > > >> >>>>> It is also important to realize that we do not want to suggest that
> > > > >> >>>>> people get less involved with certain parts and components, because
> > > > >> >>>>> they are not the maintainers. We simply want to make sure that each
> > > > >> >>>>> pull request or question or contribution has in the end one person (or
> > > > >> >>>>> a small set of people) responsible for catching and tracking it, if it
> > > > >> >>>>> was not worked on by the pro-active community.
> > > > >> >>>>>
> > > > >> >>>>> For some components, having multiple maintainers will be helpful. In
> > > > >> >>>>> that case, one maintainer should be the "chair" or "lead" and make sure
> > > > >> >>>>> that no issue of that component gets lost between the multiple
> > > > >> >>>>> maintainers.
> > > > >> >>>>>
> > > > >> >>>>>
> > > > >> >>>>> A maintainer's role is:
> > > > >> >>>>> -----------------------------
> > > > >> >>>>>
> > > > >> >>>>>   - Have an overview of which of the open pull requests relate to their
> > > > >> >>>>>     component
> > > > >> >>>>>   - Drive the pull requests relating to the component to resolution
> > > > >> >>>>>       => Moderate the decision whether the feature should be merged
> > > > >> >>>>>       => Make sure the pull request gets a shepherd.
> > > > >> >>>>>          In many cases, the maintainers would shepherd themselves.
> > > > >> >>>>>       => In case the shepherd becomes inactive, the maintainers need to
> > > > >> >>>>>          find a new shepherd.
> > > > >> >>>>>
> > > > >> >>>>>   - Have an overview of what are the known issues of their component
> > > > >> >>>>>   - Have an overview of what are the frequently requested features of
> > > > >> >>>>>     their component
> > > > >> >>>>>
> > > > >> >>>>>   - Have an overview of which contributors are doing very good work in
> > > > >> >>>>>     their component, would be candidates for committers, and should be
> > > > >> >>>>>     mentored towards that.
> > > > >> >>>>>
> > > > >> >>>>>   - Resolve email threads that have been brought to their attention,
> > > > >> >>>>>     because deeper component knowledge is required for that thread.
> > > > >> >>>>>
> > > > >> >>>>> A maintainer's role is NOT:
> > > > >> >>>>> ----------------------------------
> > > > >> >>>>>
> > > > >> >>>>>   - Review all pull requests of that component
> > > > >> >>>>>   - Answer every mail with questions about that component
> > > > >> >>>>>   - Fix all bugs and implement all features of that component
> > > > >> >>>>>
> > > > >> >>>>>
> > > > >> >>>>> We imagine the following way that the community and the maintainers
> > > > >> >>>>> interact:
> > > > >> >>>>> ---------------------------------------------------------------------
> > > > >> >>>>>
> > > > >> >>>>>   - Pull requests should be tagged by component. Since we cannot add
> > > > >> >>>>>     labels at this point, we need to rely on the following:
> > > > >> >>>>>       => The pull request opener should name the pull request like
> > > > >> >>>>>          "[FLINK-XXX] [component] Title"
> > > > >> >>>>>       => Components can be (re) tagged by adding special comments in
> > > > >> >>>>>          the pull request ("==> component client")
> > > > >> >>>>>       => With some luck, GitHub and Apache Infra will allow us to use
> > > > >> >>>>>          labels at some point
> > > > >> >>>>>
> > > > >> >>>>>   - When pull requests are associated with a component, the maintainers
> > > > >> >>>>>     will manage them (decision whether to add, find shepherd, catch
> > > > >> >>>>>     dropped pull requests)
> > > > >> >>>>>
> > > > >> >>>>>   - We assume that maintainers frequently reach out to other community
> > > > >> >>>>>     members and ask them if they want to shepherd a pull request.
> > > > >> >>>>>
> > > > >> >>>>>   - On the mailing list, everyone should feel equally empowered to
> > > > >> >>>>>     answer and discuss.
> > > > >> >>>>>     If at some point in the discussion, some deep technical knowledge
> > > > >> >>>>>     about a component is required, the maintainer(s) should be drawn
> > > > >> >>>>>     into the discussion.
> > > > >> >>>>>     Because the mailing list infrastructure has no support to tag
> > > > >> >>>>>     threads, here are some simple workarounds:
> > > > >> >>>>>
> > > > >> >>>>>       => One possibility is to put the maintainers' mail addresses on
> > > > >> >>>>>          cc for the thread, so they get the mail not just via the
> > > > >> >>>>>          mailing list
> > > > >> >>>>>       => Another way would be to post something like
> > > > >> >>>>>          "+maintainer runtime" in the thread and the "runtime"
> > > > >> >>>>>          maintainers would have a filter/alert on these keywords in
> > > > >> >>>>>          their mail program.
> > > > >> >>>>>
> > > > >> >>>>>   - We assume that maintainers will reach out to community members that
> > > > >> >>>>>     are very active and helpful in a component, and will ask them if
> > > > >> >>>>>     they want to be added as maintainers.
> > > > >> >>>>>     That will make it visible that those people are experts for that
> > > > >> >>>>>     part of Flink.
> > > > >> >>>>>
> > > > >> >>>>>
> > > > >> >>>>> ======================================
> > > > >> >>>>> Maintainers: Committers and Contributors
> > > > >> >>>>> ======================================
> > > > >> >>>>>
> > > > >> >>>>> It helps if maintainers are committers (since we want them to resolve
> > > > >> >>>>> pull requests which often involves merging them).
> > > > >> >>>>>
> > > > >> >>>>> Components with multiple maintainers can easily have non-committer
> > > > >> >>>>> contributors in addition to committer contributors.
> > > > >> >>>>>
> > > > >> >>>>>
> > > > >> >>>>> ======
> > > > >> >>>>> JIRA
> > > > >> >>>>> ======
> > > > >> >>>>>
> > > > >> >>>>> Ideally, JIRA can be used to get an overview of what are the known
> > > > >> >>>>> issues of each component, and what are common feature requests.
> > > > >> >>>>> Unfortunately, the Flink JIRA is quite unorganized right now.
> > > > >> >>>>>
> > > > >> >>>>> A natural followup effort of this proposal would be to define in JIRA
> > > > >> >>>>> the same components as we defined here, and have the maintainers keep
> > > > >> >>>>> JIRA meaningful for that particular component. That would allow us to
> > > > >> >>>>> easily generate some tables out of JIRA (like top known issues per
> > > > >> >>>>> component, most requested features) and post them on the dev list once
> > > > >> >>>>> in a while as a "state of the union" report.
> > > > >> >>>>>
> > > > >> >>>>> Initial assignment of issues to components should be made by those
> > > > >> >>>>> people opening the issue. The maintainer of that tagged component needs
> > > > >> >>>>> to change the tag, if the component was classified incorrectly.
> > > > >> >>>>>
> > > > >> >>>>>
> > > > >> >>>>> ======================================
> > > > >> >>>>> Initial Components and Maintainers Suggestion
> > > > >> >>>>> ======================================
> > > > >> >>>>>
> > > > >> >>>>> Below is a suggestion of how to define components for Flink. One goal
> > > > >> >>>>> of the division was to make it obvious for the majority of questions
> > > > >> >>>>> and contributions to which component they would relate.
> > > > >> >>>>> Otherwise, if many contributions had fuzzy component associations, we
> > > > >> >>>>> would again not solve the issue of having clear responsibilities for
> > > > >> >>>>> who would track the progress and resolution.
> > > > >> >>>>>
> > > > >> >>>>> We also looked at each component and wrote the names of some people who
> > > > >> >>>>> we thought were natural experts for the components, and thus natural
> > > > >> >>>>> candidates for maintainers.
> > > > >> >>>>>
> > > > >> >>>>> **These names are only a starting point for discussion.**
> > > > >> >>>>>
> > > > >> >>>>> Once agreed upon, the components and names of maintainers should be
> > > > >> >>>>> kept in the wiki and updated as components change and people step up or
> > > > >> >>>>> down.
> > > > >> >>>>>
> > > > >> >>>>>
> > > > >> >>>>> *DataSet API* (*Fabian, Greg, Gabor*)
> > > > >> >>>>>   - Including Hadoop compat. parts
> > > > >> >>>>>
> > > > >> >>>>> *DataStream API* (*Aljoscha, Max, Stephan*)
> > > > >> >>>>>
> > > > >> >>>>> *Runtime*
> > > > >> >>>>>   - Distributed Coordination (JobManager/TaskManager, Akka) (*Till*)
> > > > >> >>>>>   - Local Runtime (Memory Management, State Backends, Tasks/Operators)
> > > > >> >>>>>     (*Stephan*)
> > > > >> >>>>>   - Network (*Ufuk*)
> > > > >> >>>>>
> > > > >> >>>>> *Client/Optimizer* (*Fabian*)
> > > > >> >>>>>
> > > > >> >>>>> *Type system / Type extractor* (Timo)
> > > > >> >>>>>
> > > > >> >>>>> *Cluster Management* (Yarn, Mesos, Docker, ...)
> > > > >> >>>>> (*Max, Robert*)
> > > > >> >>>>>
> > > > >> >>>>> *Libraries*
> > > > >> >>>>>   - Gelly (*Vasia, Greg*)
> > > > >> >>>>>   - ML (*Till, Theo*)
> > > > >> >>>>>   - CEP (*Till*)
> > > > >> >>>>>   - Python (*Chesnay*)
> > > > >> >>>>>
> > > > >> >>>>> *Table API & SQL* (*Fabian, Vasia, Timo, Chengxiang*)
> > > > >> >>>>>
> > > > >> >>>>> *Streaming Connectors* (*Robert*, *Aljoscha*)
> > > > >> >>>>>
> > > > >> >>>>> *Batch Connectors and Input/Output Formats* (*Chesnay*)
> > > > >> >>>>>
> > > > >> >>>>> *Storm Compatibility Layer* (*Mathias*)
> > > > >> >>>>>
> > > > >> >>>>> *Scala shell* (*Till*)
> > > > >> >>>>>
> > > > >> >>>>> *Startup Shell Scripts* (Ufuk)
> > > > >> >>>>>
> > > > >> >>>>> *Flink Build System, Maven Files* (*Robert*)
> > > > >> >>>>>
> > > > >> >>>>> *Documentation* (Ufuk)
> > > > >> >>>>>
> > > > >> >>>>>
> > > > >> >>>>> Please let us know what you think about this proposal.
> > > > >> >>>>> Happy discussing!
> > > > >> >>>>>
> > > > >> >>>>> Greetings,
> > > > >> >>>>> Stephan
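
For anyone who wants to script against the tagging convention in the proposal, here is a minimal sketch of how the "[FLINK-XXX] [component] Title" pull request titles and the "==> component client" retag comments could be read. It is only an illustration under assumptions: the names `TITLE_RE`, `RETAG_RE`, and `component_of` are hypothetical and not part of any Flink or Apache Infra tooling, the issue number is assumed to be numeric, and comments are assumed to arrive in chronological order so that the latest retag wins.

```python
import re
from typing import List, Optional

# Assumed shape of a title: "[FLINK-1234] [component] Title".
TITLE_RE = re.compile(
    r"^\[FLINK-(?P<issue>\d+)\]\s*\[(?P<component>[^\]]+)\]\s*(?P<title>.+)$"
)

# Assumed shape of a retag comment line: "==> component client".
RETAG_RE = re.compile(r"^==>\s*component\s+(?P<component>.+?)\s*$", re.MULTILINE)


def component_of(title: str, comments: List[str]) -> Optional[str]:
    """Return the component of a pull request, or None if it is untagged.

    The title tag is the starting point; any later retag comment overrides it.
    """
    component = None
    match = TITLE_RE.match(title.strip())
    if match:
        component = match.group("component").strip().lower()
    for comment in comments:  # assumed to be in chronological order
        for retag in RETAG_RE.finditer(comment):
            component = retag.group("component").strip().lower()
    return component


if __name__ == "__main__":
    print(component_of("[FLINK-1234] [runtime] Fix buffer leak", []))
    print(component_of("[FLINK-1234] [runtime] Fix buffer leak",
                       ["Looks good to me", "==> component client"]))
```

Pull requests for which `component_of` returns `None` would be the ones at risk of falling through the cracks, so a periodic report could simply list those.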