Thanks for the great suggestion. +1 for this proposal.

Regards,
Chiwan Park

> On May 13, 2016, at 1:44 AM, Nick Dimiduk <ndimi...@apache.org> wrote:
>
> For what it's worth, this is very close to how HBase attempts to manage the
> community load. We break out components (in Jira), with a list of named
> component maintainers. Actually, having components alone has given a big
> bang for the buck because when properly labeled, it makes it really easy
> for part-timers to channel their efforts with precision.
>
> As a Flink user, I'm +1 for this proposal as well :)
>
> On Thursday, May 12, 2016, Aljoscha Krettek <aljos...@apache.org> wrote:
>
>> +1
>>
>> The ideas seem good and the proposed number of components seems reasonable.
>> With this, we should also then clean up the JIRA to make it actually usable.
>>
>> On Thu, 12 May 2016 at 18:09 Stephan Ewen <se...@apache.org> wrote:
>>
>>> All maintainer candidates are only proposals so far. No indication of lead
>>> or anything so far.
>>>
>>> Let's first see if we agree on the structure proposed here, and if we take
>>> the components as suggested here or if we refine the list.
>>> On 12.05.2016 at 17:45, "Robert Metzger" <rmetz...@apache.org> wrote:
>>>
>>>> tl;dr: +1
>>>>
>>>> I also like the proposal a lot. Our community is growing at quite a fast
>>>> pace and we need to have some structure in place to still keep track of
>>>> everything going on.
>>>>
>>>> I'm happy to see that the proposal mentions cleaning up our JIRA. This is
>>>> something that has been annoying me for quite a while, but it's too big to
>>>> do alone. If maintainers could take care of their components, we would
>>>> already have covered a lot there.
>>>>
>>>> One question regarding the "chair" or "lead" role for components: Is the
>>>> first name in the list of maintainers the lead?
>>>>
>>>> I would actually suggest waiting until all proposed maintainers have agreed
>>>> to the proposal.
>>>> It doesn't make sense to make somebody a maintainer of something if they
>>>> disagree or are not aware of it.
>>>>
>>>> On Thu, May 12, 2016 at 2:13 PM, Maximilian Michels <m...@apache.org>
>>>> wrote:
>>>>
>>>>> +1 for the initiative. With a better process we will improve the
>>>>> quality of the Flink development and give us more time to focus.
>>>>>
>>>>> Could we have another category "Infrastructure"? This would concern
>>>>> things like CI, nightly deployment of snapshots/documentation, ASF
>>>>> Infra communication. Robert and I could be the initial maintainers
>>>>> for that.
>>>>>
>>>>> On Thu, May 12, 2016 at 1:52 PM, Stephan Ewen <se...@apache.org> wrote:
>>>>>> Yes, Matthias, that was supposed to be you.
>>>>>> Sorry from another guy who frequently has his name misspelled ;-)
>>>>>>
>>>>>> On Thu, May 12, 2016 at 1:27 PM, Matthias J. Sax <mj...@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>>> +1 from my side.
>>>>>>>
>>>>>>> Happy to be the maintainer for Storm-Compatibility (at least I guess
>>>>>>> it's me, even though the correct spelling would be with two 't' :P)
>>>>>>>
>>>>>>> -Matthias
>>>>>>>
>>>>>>> On 05/12/2016 12:56 PM, Till Rohrmann wrote:
>>>>>>>> +1 for the proposal
>>>>>>>> On May 12, 2016 12:13 PM, "Stephan Ewen" <se...@apache.org> wrote:
>>>>>>>>
>>>>>>>>> Yes, Gabor Gevay, that did refer to you!
>>>>>>>>>
>>>>>>>>> Sorry for the ambiguity...
>>>>>>>>>
>>>>>>>>> On Thu, May 12, 2016 at 10:46 AM, Márton Balassi
>>>>>>>>> <balassi.mar...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> +1 for the proposal
>>>>>>>>>> @ggevay: I do think that it refers to you. :)
>>>>>>>>>>
>>>>>>>>>> On Thu, May 12, 2016 at 10:40 AM, Gábor Gévay <gga...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hello,
>>>>>>>>>>>
>>>>>>>>>>> There are at least three Gábors in the Flink community, :) so
>>>>>>>>>>> assuming that the Gábor in the list of maintainers of the DataSet
>>>>>>>>>>> API is referring to me, I'll be happy to do it. :)
>>>>>>>>>>>
>>>>>>>>>>> Best,
>>>>>>>>>>> Gábor G.
>>>>>>>>>>>
>>>>>>>>>>> 2016-05-10 11:24 GMT+02:00 Stephan Ewen <se...@apache.org>:
>>>>>>>>>>>> Hi everyone!
>>>>>>>>>>>>
>>>>>>>>>>>> We propose to establish some lightweight structures in the Flink
>>>>>>>>>>>> open source community and development process, to help us better
>>>>>>>>>>>> handle the increased interest in Flink (mailing list and pull
>>>>>>>>>>>> requests), while not overwhelming the committers, and giving users
>>>>>>>>>>>> and contributors a good experience.
>>>>>>>>>>>>
>>>>>>>>>>>> This proposal is triggered by the observation that we are reaching
>>>>>>>>>>>> the limits of where the current community can support users and
>>>>>>>>>>>> guide new contributors. The below proposal is based on
>>>>>>>>>>>> observations and ideas from Till, Robert, and me.
>>>>>>>>>>>>
>>>>>>>>>>>> ========
>>>>>>>>>>>> Goals
>>>>>>>>>>>> ========
>>>>>>>>>>>>
>>>>>>>>>>>> We try to achieve the following:
>>>>>>>>>>>>
>>>>>>>>>>>>   - Pull requests get handled in a timely fashion
>>>>>>>>>>>>   - New contributors are better integrated into the community
>>>>>>>>>>>>   - The community feels empowered on the mailing list.
>>>>>>>>>>>>     But questions that need the attention of someone that has deep
>>>>>>>>>>>>     knowledge of a certain part of Flink get their attention.
>>>>>>>>>>>>   - At the same time, the committers that are knowledgeable about
>>>>>>>>>>>>     many core parts do not get completely overwhelmed.
>>>>>>>>>>>>   - We don't overlook threads that report critical issues.
>>>>>>>>>>>>   - We always have a pretty good overview of what the status of
>>>>>>>>>>>>     certain parts of the system is.
>>>>>>>>>>>>       -> What are frequently encountered known issues
>>>>>>>>>>>>       -> What are the most frequently requested features
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> ========
>>>>>>>>>>>> Problems
>>>>>>>>>>>> ========
>>>>>>>>>>>>
>>>>>>>>>>>> Looking into the process, there are two big issues:
>>>>>>>>>>>>
>>>>>>>>>>>> (1) Up to now, we have been relying on the fact that everything
>>>>>>>>>>>> just "organizes itself", driven by best effort. That assumes that
>>>>>>>>>>>> everyone feels equally responsible for every part, question, and
>>>>>>>>>>>> contribution. At the current state, this is impossible to
>>>>>>>>>>>> maintain; it overwhelms the committers and contributors.
>>>>>>>>>>>>
>>>>>>>>>>>> Example: Pull requests are picked up by whoever wants to pick them
>>>>>>>>>>>> up. Pull requests that are a lot of work, have little chance of
>>>>>>>>>>>> getting in, or relate to less active components are sometimes not
>>>>>>>>>>>> picked up. When contributors are pretty loaded already, it may
>>>>>>>>>>>> happen that in the end no one feels responsible for picking up a
>>>>>>>>>>>> pull request, and it falls through the cracks.
>>>>>>>>>>>>
>>>>>>>>>>>> (2) There is no good overview of what are known shortcomings,
>>>>>>>>>>>> efforts, and requested features for different parts of the system.
>>>>>>>>>>>> This information exists in various people's heads, but is not
>>>>>>>>>>>> easily accessible for new people.
>>>>>>>>>>>> The Flink JIRA is not well maintained, so it is not easy to draw
>>>>>>>>>>>> insights from it.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> ===========
>>>>>>>>>>>> The Proposal
>>>>>>>>>>>> ===========
>>>>>>>>>>>>
>>>>>>>>>>>> Since we are building a parallel system, the natural solution
>>>>>>>>>>>> seems to be: partition the workload ;-)
>>>>>>>>>>>>
>>>>>>>>>>>> We propose to define a set of components for Flink. Each component
>>>>>>>>>>>> is maintained or tracked by one or more people - let's call them
>>>>>>>>>>>> maintainers. It is important to note that we don't suggest the
>>>>>>>>>>>> maintainers as an authoritative role, but simply as committers or
>>>>>>>>>>>> contributors that visibly step up for a certain component, and
>>>>>>>>>>>> mainly track and drive the efforts pertaining to that component.
>>>>>>>>>>>>
>>>>>>>>>>>> It is also important to realize that we do not want to suggest
>>>>>>>>>>>> that people get less involved with certain parts and components,
>>>>>>>>>>>> because they are not the maintainers. We simply want to make sure
>>>>>>>>>>>> that each pull request or question or contribution has in the end
>>>>>>>>>>>> one person (or a small set of people) responsible for catching and
>>>>>>>>>>>> tracking it, if it was not worked on by the pro-active community.
>>>>>>>>>>>>
>>>>>>>>>>>> For some components, having multiple maintainers will be helpful.
>>>>>>>>>>>> In that case, one maintainer should be the "chair" or "lead" and
>>>>>>>>>>>> make sure that no issue of that component gets lost between the
>>>>>>>>>>>> multiple maintainers.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> A maintainer's role is:
>>>>>>>>>>>> -----------------------------
>>>>>>>>>>>>
>>>>>>>>>>>>   - Have an overview of which of the open pull requests relate to
>>>>>>>>>>>>     their component
>>>>>>>>>>>>   - Drive the pull requests relating to the component to resolution
>>>>>>>>>>>>       => Moderate the decision whether the feature should be merged
>>>>>>>>>>>>       => Make sure the pull request gets a shepherd.
>>>>>>>>>>>>          In many cases, the maintainers would shepherd themselves.
>>>>>>>>>>>>       => In case the shepherd becomes inactive, the maintainers
>>>>>>>>>>>>          need to find a new shepherd.
>>>>>>>>>>>>
>>>>>>>>>>>>   - Have an overview of what are the known issues of their component
>>>>>>>>>>>>   - Have an overview of what are the frequently requested features
>>>>>>>>>>>>     of their component
>>>>>>>>>>>>
>>>>>>>>>>>>   - Have an overview of which contributors are doing very good work
>>>>>>>>>>>>     in their component, would be candidates for committers, and
>>>>>>>>>>>>     should be mentored towards that.
>>>>>>>>>>>>
>>>>>>>>>>>>   - Resolve email threads that have been brought to their
>>>>>>>>>>>>     attention, because deeper component knowledge is required for
>>>>>>>>>>>>     that thread.
>>>>>>>>>>>>
>>>>>>>>>>>> A maintainer's role is NOT:
>>>>>>>>>>>> ----------------------------------
>>>>>>>>>>>>
>>>>>>>>>>>>   - Review all pull requests of that component
>>>>>>>>>>>>   - Answer every mail with questions about that component
>>>>>>>>>>>>   - Fix all bugs and implement all features of that component
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> We imagine the following way that the community and the
>>>>>>>>>>>> maintainers interact:
>>>>>>>>>>>> ----------------------------------------------------------------
>>>>>>>>>>>>
>>>>>>>>>>>>   - Pull requests should be tagged by component. Since we cannot
>>>>>>>>>>>>     add labels at this point, we need to rely on the following:
>>>>>>>>>>>>       => The pull request opener should name the pull request like
>>>>>>>>>>>>          "[FLINK-XXX] [component] Title"
>>>>>>>>>>>>       => Components can be (re) tagged by adding special comments
>>>>>>>>>>>>          in the pull request ("==> component client")
>>>>>>>>>>>>       => With some luck, GitHub and Apache Infra will allow us to
>>>>>>>>>>>>          use labels at some point
>>>>>>>>>>>>
>>>>>>>>>>>>   - When pull requests are associated with a component, the
>>>>>>>>>>>>     maintainers will manage them (decision whether to add, find
>>>>>>>>>>>>     shepherd, catch dropped pull requests)
>>>>>>>>>>>>
>>>>>>>>>>>>   - We assume that maintainers frequently reach out to other
>>>>>>>>>>>>     community members and ask them if they want to shepherd a pull
>>>>>>>>>>>>     request.
>>>>>>>>>>>>
>>>>>>>>>>>>   - On the mailing list, everyone should feel equally empowered to
>>>>>>>>>>>>     answer and discuss.
>>>>>>>>>>>>     If at some point in the discussion, some deep technical
>>>>>>>>>>>>     knowledge about a component is required, the maintainer(s)
>>>>>>>>>>>>     should be drawn into the discussion.
>>>>>>>>>>>>     Because the mailing list infrastructure has no support for
>>>>>>>>>>>>     tagging threads, here are some simple workarounds:
>>>>>>>>>>>>
>>>>>>>>>>>>       => One possibility is to put the maintainers' mail addresses
>>>>>>>>>>>>          on cc for the thread, so they get the mail not just via
>>>>>>>>>>>>          the mailing list
>>>>>>>>>>>>       => Another way would be to post something like
>>>>>>>>>>>>          "+maintainer runtime" in the thread and the "runtime"
>>>>>>>>>>>>          maintainers would have a filter/alert on these keywords
>>>>>>>>>>>>          in their mail program.
>>>>>>>>>>>>
>>>>>>>>>>>>   - We assume that maintainers will reach out to community members
>>>>>>>>>>>>     that are very active and helpful in a component, and will ask
>>>>>>>>>>>>     them if they want to be added as maintainers.
>>>>>>>>>>>>     That will make it visible that those people are experts for
>>>>>>>>>>>>     that part of Flink.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> ======================================
>>>>>>>>>>>> Maintainers: Committers and Contributors
>>>>>>>>>>>> ======================================
>>>>>>>>>>>>
>>>>>>>>>>>> It helps if maintainers are committers (since we want them to
>>>>>>>>>>>> resolve pull requests, which often involves merging them).
>>>>>>>>>>>>
>>>>>>>>>>>> Components with multiple maintainers can easily have non-committer
>>>>>>>>>>>> contributors in addition to committer contributors.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> ======
>>>>>>>>>>>> JIRA
>>>>>>>>>>>> ======
>>>>>>>>>>>>
>>>>>>>>>>>> Ideally, JIRA can be used to get an overview of what are the known
>>>>>>>>>>>> issues of each component, and what are common feature requests.
>>>>>>>>>>>> Unfortunately, the Flink JIRA is quite unorganized right now.
>>>>>>>>>>>>
>>>>>>>>>>>> A natural followup effort of this proposal would be to define in
>>>>>>>>>>>> JIRA the same components as we defined here, and have the
>>>>>>>>>>>> maintainers keep JIRA meaningful for that particular component.
>>>>>>>>>>>> That would allow us to easily generate some tables out of JIRA
>>>>>>>>>>>> (like top known issues per component, most requested features)
>>>>>>>>>>>> and post them on the dev list once in a while as a "state of the
>>>>>>>>>>>> union" report.
>>>>>>>>>>>>
>>>>>>>>>>>> Initial assignment of issues to components should be made by those
>>>>>>>>>>>> people opening the issue. The maintainer of that tagged component
>>>>>>>>>>>> needs to change the tag, if the component was classified
>>>>>>>>>>>> incorrectly.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> ======================================
>>>>>>>>>>>> Initial Components and Maintainers Suggestion
>>>>>>>>>>>> ======================================
>>>>>>>>>>>>
>>>>>>>>>>>> Below is a suggestion of how to define components for Flink. One
>>>>>>>>>>>> goal of the division was to make it obvious for the majority of
>>>>>>>>>>>> questions and contributions to which component they would relate.
>>>>>>>>>>>> Otherwise, if many contributions had fuzzy component associations,
>>>>>>>>>>>> we would again not solve the issue of having clear
>>>>>>>>>>>> responsibilities for who would track the progress and resolution.
>>>>>>>>>>>>
>>>>>>>>>>>> We also looked at each component and wrote the names of some
>>>>>>>>>>>> people who we thought were natural experts for the components, and
>>>>>>>>>>>> thus natural candidates for maintainers.
>>>>>>>>>>>>
>>>>>>>>>>>> **These names are only a starting point for discussion.**
>>>>>>>>>>>>
>>>>>>>>>>>> Once agreed upon, the components and names of maintainers should
>>>>>>>>>>>> be kept in the wiki and updated as components change and people
>>>>>>>>>>>> step up or down.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> *DataSet API* (*Fabian, Greg, Gabor*)
>>>>>>>>>>>>   - Including Hadoop compat. parts
>>>>>>>>>>>>
>>>>>>>>>>>> *DataStream API* (*Aljoscha, Max, Stephan*)
>>>>>>>>>>>>
>>>>>>>>>>>> *Runtime*
>>>>>>>>>>>>   - Distributed Coordination (JobManager/TaskManager, Akka)
>>>>>>>>>>>>     (*Till*)
>>>>>>>>>>>>   - Local Runtime (Memory Management, State Backends,
>>>>>>>>>>>>     Tasks/Operators) (*Stephan*)
>>>>>>>>>>>>   - Network (*Ufuk*)
>>>>>>>>>>>>
>>>>>>>>>>>> *Client/Optimizer* (*Fabian*)
>>>>>>>>>>>>
>>>>>>>>>>>> *Type system / Type extractor* (Timo)
>>>>>>>>>>>>
>>>>>>>>>>>> *Cluster Management* (Yarn, Mesos, Docker, ...) (*Max, Robert*)
>>>>>>>>>>>>
>>>>>>>>>>>> *Libraries*
>>>>>>>>>>>>   - Gelly (*Vasia, Greg*)
>>>>>>>>>>>>   - ML (*Till, Theo*)
>>>>>>>>>>>>   - CEP (*Till*)
>>>>>>>>>>>>   - Python (*Chesnay*)
>>>>>>>>>>>>
>>>>>>>>>>>> *Table API & SQL* (*Fabian, Vasia, Timo, Chengxiang*)
>>>>>>>>>>>>
>>>>>>>>>>>> *Streaming Connectors* (*Robert*, *Aljoscha*)
>>>>>>>>>>>>
>>>>>>>>>>>> *Batch Connectors and Input/Output Formats* (*Chesnay*)
>>>>>>>>>>>>
>>>>>>>>>>>> *Storm Compatibility Layer* (*Mathias*)
>>>>>>>>>>>>
>>>>>>>>>>>> *Scala shell* (*Till*)
>>>>>>>>>>>>
>>>>>>>>>>>> *Startup Shell Scripts* (Ufuk)
>>>>>>>>>>>>
>>>>>>>>>>>> *Flink Build System, Maven Files* (*Robert*)
>>>>>>>>>>>>
>>>>>>>>>>>> *Documentation* (Ufuk)
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Please let us know what you think about this proposal.
>>>>>>>>>>>> Happy discussing!
>>>>>>>>>>>>
>>>>>>>>>>>> Greetings,
>>>>>>>>>>>> Stephan