I am glad I was not the only one thinking this. I also agree with Holden, Sean and Cody. Everything I wanted to say has already been said.

2016-10-08 1:16 GMT+09:00 Holden Karau <hol...@pigscanfly.ca>:

> First off, thanks Cody for taking the time to put together these proposals - I think it has kicked off some wonderful discussion.
>
> I think dismissing people's complaints with Spark as largely trolls does us a disservice, it’s important for us to recognize our own shortcomings - otherwise we are blind to the weak spots where we need to improve and instead focus on new features. Parts of the Python community seem to be actively looking for alternatives, and I’d obviously like Spark to continue to be the place where we come together and collaborate from different languages.
>
> I’d be more than happy to do a review of the outstanding Python PRs (I’ve been keeping on top of the new ones but largely haven’t looked at the older ones) and if there is a committer (maybe Davies or Sean?) who would be able to help out with merging them once they are ready that would be awesome. I’m at PyData DC this weekend but I’ll also start going through some of the older Python JIRAs and seeing if they are still relevant, already fixed, or something we are unlikely to be interested in bringing into Spark.
>
> I’m giving a talk later on this month on how to get started contributing to Apache Spark at OSCON London, and when I’ve given this talk before I’ve had to include a fair number of warnings about the challenges that can face a new contributor. I’d love to be able to drop those in future versions :)
>
> P.S.
>
> As one of the non-committers who has been working on Spark for several years (see http://bit.ly/hkspmg ) I have strong feelings around the current process being used for committers - but since I’m not on the PMC (catch-22 style) it's difficult to have any visibility into the process, so someone who does will have to weigh in on that :)
>
> On Fri, Oct 7, 2016 at 8:00 AM, Cody Koeninger <c...@koeninger.org> wrote:
>
>> Sean, that was very eloquently put, and I 100% agree. If I ever meet you in person, I'll buy you multiple rounds of beverages of your choice ;)
>> This is probably reiterating some of what you said in a less clear manner, but I'll throw more of my 2 cents in.
>>
>> - Design.
>> Yes, design by committee doesn't work. The best designs are when a person who understands the problem builds something that works for them, shares with others, and most importantly iterates when it doesn't work for others. This iteration only works if you're willing to change interfaces, but committer and user goals are not aligned here. Users want something that is clearly documented and helps them get their job done. Committers (not all) want to minimize interface change, even at the expense of users being able to do their jobs. In this situation, it is critical that you understand early what users need to be able to do. This is what the improvement proposal process should focus on: Goals, non-goals, possible solutions, rejected solutions. Not class-level design. Most importantly, it needs a clear, unambiguous outcome that is visible to the public.
>>
>> - Trolling
>> It's not just trolling. Event time and kafka are technically important and should not be ignored. I've been banging this drum for years. These concerns haven't been fully heard and understood by committers. This is one example of why diversity of enfranchised users is important and governance concerns shouldn't be ignored.
>>
>> - Jira
>> Concretely, automate closing stale jiras after X amount of time. It's really surprising to me how much reluctance a community of programmers have shown towards automating their own processes around stuff like this (not to mention automatic code formatting of modified files). I understand the arguments against, but the current alternative doesn't work.
>> Concretely, clearly reject and close jiras. I have a backlog of 50+ kafka jiras, many of which are irrelevant at this point, but I do not feel that I have the political power to close them.
>> Concretely, make it clear who is working on something. This can be as simple as just "I'm working on this", assign it to me, if I don't follow up in X amount of time, close it or reassign. That doesn't mean there can't be competing work, but it does mean those people should talk to each other. Conversely, if committers currently don't have time to work on something that is important, make that clear in the ticket.
>>
>> On Fri, Oct 7, 2016 at 5:34 AM, Sean Owen <so...@cloudera.com> wrote:
>> > Suggested actions way at the bottom.
>> >
>> > On Fri, Oct 7, 2016 at 5:14 AM Matei Zaharia <matei.zaha...@gmail.com> wrote:
>> >>
>> >> since March. But it's true that other things such as the Kafka source for it didn't have as much design on JIRA. Nonetheless, this component is still early on and there's still a lot of time to change it, which is happening.
>> >
>> > It's hard to drive design discussions in OSS. Even when diligently publishing design docs, the doc happens after brainstorming, and that happens inside someone's head or in chats.
>> >
>> > The lazy consensus model that works for small changes doesn't work well here. If a committer wants a change, that change will basically be made modulo small edits; vetoes are for dire disagreement. (Otherwise we'd get nothing done.) However this model means it's hard to significantly change a design after draft 1.
>> >
>> > I've heard this complaint a few times, and it has never been down to bad faith. We should err further towards over-including early and often. I've seen some great discussions start more with a problem statement and an RFC, not a design doc. Keeping regular contributors enfranchised is essential, so that they're willing and able to participate when design time comes. (See below.)
>> >
>> >>
>> >> 2) About what people say at Reactive Summit -- there will always be trolls, but just ignore them and build a great project. Those of us involved in the project for a while have long seen similar stuff, e.g. a
>> >
>> > The hype cycle may be turning against Spark, as is normal for this stage of maturity. People idealize technologies they don't really use as greener grass; it's the things they use and need to work that they love to hate.
>> >
>> > I would not dismiss this as just trolling. Customer anecdotes I see suggest that Spark underperforms their (inflated) expectations, and generally does not Just Work. It takes expertise, tuning, patience, workarounds. And then it gets great things done. I do see a gap between how the group here talks about the technology, and how the users I see talk about it. The gap manifests in attention given to making yet more things, and attention given to fixing and project mechanics.
>> >
>> > I would also not dismiss criticism of governance. We can recognize some big problems that were resolved over even the past 3 months. Usually I hear, well, we do better than most projects, right? and that is true. But, Spark is bigger and busier than most any other project. Exceptional projects need exceptional governance and we have merely "good". See next.
>> >
>> >> 3) About number and diversity of committers -- the PMC is always working to expand these, and you should email people on the PMC (or even the whole list) if you have people you'd like to propose. In
>> >
>> > If you're suggesting that it's mostly a matter of asking, then this doesn't match my experience. I have seen a few people consistently soft-reject most proposals. The reasons given usually sound like "concerns about quality", which is probably the right answer to a somewhat wrong question.
>> >
>> > We should probably be asking primarily who will net-net add efficiency to some part of the project's mechanics. Per above, it wouldn't hurt to ask who would expand coverage and add diversity of perspective too.
>> >
>> > I disagree that committers are being added at a sufficient rate. The overall committer-attention hours is dropping as the project grows -- am I the only one that perceives many regular committers aren't working nearly as much as before on the project?
>> >
>> > I call it a problem because we have IMHO people who 'qualify', and not giving them some stake is going to cost the project down the road. Always Be Recruiting. This is what I would worry about, since the governance and enfranchisement issues above kind of stem from this.
>> >
>> >>
>> >> 4) Finally, about better organizing JIRA, marking dead issues, etc, this would be great and I think we just need a concrete proposal for how to do it. It would be best to point to an existing process that someone else has used here BTW so that we can see it in action.
>> >
>> > I don't think we're wanting for proposals. I went on and on about it last year, and don't think anyone disagreed about actions. I wouldn't suggest that clearing out dead issues is more complex than just putting in time to do it. It's just grunt work and understandably not appealing. (Thank you Xiao for your recent run at SQL JIRAs.)
>> >
>> > It requires saying 'no', which is hard, because it requires some conviction. I have encountered reluctance to do this in Spark and think that culture should change. Is it weird to say that a broader group of gatekeepers can actually with more confidence and efficiency tackle the triage issue? that pushing back on 'bad' contribution actually increases the rate of 'good'?
>> >
>> > FWIW I also find the project unpleasant to deal with day to day, mostly because of the scale of the triage, and think we could use all the qualified help we can get. I am looking to do less with the project over time, which is no big deal in itself, but is a big deal if these several factors are adding up to discourage fresh blood from joining the fray. Cody makes me think there are, at least, 2 of us.
>> >
>> > Concrete steps?
>> >
>> > Go to spark-prs.com. Look at "Users". Look at your open PRs. Are any stale? can you close them or advance them?
>> >
>> > Look at the Stale PRs tab and sort by last updated. Do any look dead? can you ask the author to update or close? does the parent JIRA look like it's not otherwise relevant?
>> >
>> > Go download JIRA Client at http://almworks.com/jiraclient/download.html
>> > Go look at all open JIRAs sorted by last update. Are any pretty obviously obsolete?
>> >
>> > If you don't feel comfortable acting, feel free to at least propose a list to dev@ for a look.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>
>
> --
> Cell : 425-233-8271
> Twitter: https://twitter.com/holdenkarau