If I correctly understand the kind of voting you are talking about, then to be accurate, only PMC members have a vote, not all committers: https://www.apache.org/foundation/how-it-works.html#pmc-members
On Mon, Oct 10, 2016 at 12:02 PM, Cody Koeninger <c...@koeninger.org> wrote: > I think the main value is in being honest about what's going on. No > one other than committers can cast a meaningful vote, that's the > reality. Beyond that, if people think it's more open to allow formal > proposals from anyone, I'm not necessarily against it, but my main > question would be this: > > If anyone can submit a proposal, are committers actually going to > clearly reject and close proposals that don't meet the requirements? > > Right now we have a serious problem with lack of clarity regarding > contributions, and that cannot spill over into goal-setting. > > On Mon, Oct 10, 2016 at 1:54 PM, Ryan Blue <rb...@netflix.com> wrote: > > +1 to votes to approve proposals. I agree that proposals should have an > > official mechanism to be accepted, and a vote is an established means of > > doing that well. I like that it includes a period to review the proposal > and > > I think proposals should have been discussed enough ahead of a vote to > > survive the possibility of a veto. > > > > I also like the names that are short and (mostly) unique, like SEP. > > > > Where I disagree is with the requirement that a committer must formally > > propose an enhancement. I don't see the value of restricting this: if > > someone has the will to write up a proposal then they should be > encouraged > > to do so and start a discussion about it. Even if there is a political > > reality as Cody says, what is the value of codifying that in our > process? I > > think restricting who can submit proposals would only undermine them by > > pushing contributors out. Maybe I'm missing something here? > > > > rb > > > > > > > > On Mon, Oct 10, 2016 at 7:41 AM, Cody Koeninger <c...@koeninger.org> > wrote: > >> > >> Yes, users suggesting SIPs is a good thing and is explicitly called > >> out in the linked document under the Who? section. 
Formally proposing > >> them, not so much, because of the political realities. > >> > >> Yes, implementation strategy definitely affects goals. There are all > >> kinds of examples of this, I'll pick one that's my fault so as to > >> avoid sounding like I'm blaming: > >> > >> When I implemented the Kafka DStream, one of my (not explicitly agreed > >> upon by the community) goals was to make sure people could use the > >> Dstream with however they were already using Kafka at work. The lack > >> of explicit agreement on that goal led to all kinds of fighting with > >> committers, that could have been avoided. The lack of explicit > >> up-front strategy discussion led to the DStream not really working > >> with compacted topics. I knew about compacted topics, but don't have > >> a use for them, so had a blind spot there. If there was explicit > >> up-front discussion that my strategy was "assume that batches can be > >> defined on the driver solely by beginning and ending offsets", there's > >> a greater chance that a user would have seen that and said, "hey, what > >> about non-contiguous offsets in a compacted topic". > >> > >> This kind of thing is only going to happen smoothly if we have a > >> lightweight user-visible process with clear outcomes. > >> > >> On Mon, Oct 10, 2016 at 1:34 AM, assaf.mendelson > >> <assaf.mendel...@rsa.com> wrote: > >> > I agree with most of what Cody said. > >> > > >> > Two things: > >> > > >> > First we can always have other people suggest SIPs but mark them as > >> > “unreviewed” and have committers basically move them forward. The > >> > problem is > >> > that writing a good document takes time. This way we can leverage non > >> > committers to do some of this work (it is just another way to > >> > contribute). > >> > > >> > > >> > > >> > As for strategy, in many cases implementation strategy can affect the > >> > goals. 
> >> > I will give a small example: In the current structured streaming > >> > strategy, > >> > we group by the time to achieve a sliding window. This is definitely > an > >> > implementation decision and not a goal. However, I can think of > several > >> > aggregation functions which have the time inside their calculation > >> > buffer. > >> > For example, let’s say we want to return a set of all distinct values. > >> > One > >> > way to implement this would be to make the set into a map and have the > >> > value > >> > contain the last time seen. Multiplying it across the groupby would > cost > >> > a > >> > lot in performance. So adding such a strategy would have a great > effect > >> > on > >> > the type of aggregations and their performance which does affect the > >> > goal. > >> > Without adding the strategy, it is easy for whoever goes to the design > >> > document to not think about these cases. Furthermore, it might be > >> > decided > >> > that these cases are rare enough so that the strategy is still good > >> > enough > >> > but how would we know it without user feedback? > >> > > >> > I believe this example is exactly what Cody was talking about. Since > >> > many > >> > times implementation strategies have a large effect on the goal, we > >> > should > >> > have it discussed when discussing the goals. In addition, while it is > >> > often > >> > easy to throw out completely infeasible goals, it is often much harder > >> > to > >> > figure out that the goals are infeasible without fine tuning. > >> > > >> > > >> > > >> > > >> > > >> > Assaf. > >> > > >> > > >> > > >> > From: Cody Koeninger-2 [via Apache Spark Developers List] > >> > [mailto:ml-node+[hidden email]] > >> > Sent: Monday, October 10, 2016 2:25 AM > >> > To: Mendelson, Assaf > >> > Subject: Re: Spark Improvement Proposals > >> > > >> > > >> > > >> > Only committers should formally submit SIPs because in an Apache > >> > project only committers have explicit political power. 
If a user can't > >> > find a committer willing to sponsor an SIP idea, they have no way to > >> > get the idea passed in any case. If I can't find a committer to > >> > sponsor this meta-SIP idea, I'm out of luck. > >> > > >> > I do not believe unrealistic goals can be found solely by inspection. > >> > We've managed to ignore unrealistic goals even after implementation! > >> > Focusing on APIs can allow people to think they've solved something, > >> > when there's really no way of implementing that API while meeting the > >> > goals. Rapid iteration is clearly the best way to address this, but > >> > we've already talked about why that hasn't really worked. If adding a > >> > non-binding API section to the template is important to you, I'm not > >> > against it, but I don't think it's sufficient. > >> > > >> > On your PRD vs design doc spectrum, I'm saying this is closer to a > >> > PRD. Clear agreement on goals is the most important thing and that's > >> > why it's the thing I want binding agreement on. But I cannot agree to > >> > goals unless I have enough minimal technical info to judge whether the > >> > goals are likely to actually be accomplished. > >> > > >> > > >> > > >> > On Sun, Oct 9, 2016 at 5:35 PM, Matei Zaharia <[hidden email]> wrote: > >> > > >> > > >> >> Well, I think there are a few things here that don't make sense. > First, > >> >> why > >> >> should only committers submit SIPs? Development in the project should > >> >> be > >> >> open to all contributors, whether they're committers or not. Second, > I > >> >> think > >> >> unrealistic goals can be found just by inspecting the goals, and I'm > >> >> not > >> >> super worried that we'll accept a lot of SIPs that are then > infeasible > >> >> -- > >> >> we > >> >> can then submit new ones. But this depends on whether you want this > >> >> process > >> >> to be a "design doc lite", where people also agree on implementation > >> >> strategy, or just a way to agree on goals. 
This is what I asked > earlier > >> >> about PRDs vs design docs (and I'm open to either one but I'd just > like > >> >> clarity). Finally, both as a user and designer of software, I always > >> >> want > >> >> to > >> >> give feedback on APIs, so I'd really like a culture of having those > >> >> early. > >> >> People don't argue about prettiness when they discuss APIs, they > argue > >> >> about > >> >> the core concepts to expose in order to meet various goals, and then > >> >> they're > >> >> stuck maintaining those for a long time. > >> >> > >> >> Matei > >> >> > >> >> On Oct 9, 2016, at 3:10 PM, Cody Koeninger <[hidden email]> wrote: > >> >> > >> >> Users instead of people, sure. Committers and contributors are (or at > >> >> least > >> >> should be) a subset of users. > >> >> > >> >> Non goals, sure. I don't care what the name is, but we need to > clearly > >> >> say > >> >> e.g. 'no we are not maintaining compatibility with XYZ right now'. > >> >> > >> >> API, what I care most about is whether it allows me to accomplish the > >> >> goals. > >> >> Arguing about how ugly or pretty it is can be saved for design/ > >> >> implementation imho. > >> >> > >> >> Strategy, this is necessary because otherwise goals can be out of > line > >> >> with > >> >> reality. Don't propose goals you don't have at least some idea of > how > >> >> to > >> >> implement. > >> >> > >> >> Rejected strategies, given that committers are the only ones I'm > saying > >> >> should formally submit SPARKLIs or SIPs, if they put junk in a > required > >> >> section then slap them down for it and tell them to fix it. > >> >> > >> >> > >> >> On Oct 9, 2016 4:36 PM, "Matei Zaharia" <[hidden email]> wrote: > >> >>> > >> >>> Yup, this is the stuff that I found unclear. Thanks for clarifying > >> >>> here, > >> >>> but we should also clarify it in the writeup. 
In particular: > >> >>> > >> >>> - Goals needs to be about user-facing behavior ("people" is broad) > >> >>> > >> >>> - I'd rename Rejected Goals to Non-Goals. Otherwise someone will dig > >> >>> up > >> >>> one of these and say "Spark's developers have officially rejected X, > >> >>> which > >> >>> our awesome system has". > >> >>> > >> >>> - For user-facing stuff, I think you need a section on API. > Virtually > >> >>> all > >> >>> other *IPs I've seen have that. > >> >>> > >> >>> - I'm still not sure why the strategy section is needed if the > purpose > >> >>> is > >> >>> to define user-facing behavior -- unless this is the strategy for > >> >>> setting > >> >>> the goals or for defining the API. That sounds squarely like a > design > >> >>> doc > >> >>> issue. In some sense, who cares whether the proposal is technically > >> >>> feasible > >> >>> right now? If it's infeasible, that will be discovered later during > >> >>> design > >> >>> and implementation. Same thing with rejected strategies -- listing > >> >>> some > >> >>> of > >> >>> those is definitely useful sometimes, but if you make this a > >> >>> *required* > >> >>> section, people are just going to fill it in with bogus stuff (I've > >> >>> seen > >> >>> this happen before). > >> >>> > >> >>> Matei > >> >>> > >> > > >> >>> > On Oct 9, 2016, at 2:14 PM, Cody Koeninger <[hidden email]> wrote: > >> >>> > > >> >>> > So to focus the discussion on the specific strategy I'm > suggesting, > >> >>> > documented at > >> >>> > > >> >>> > > >> >>> > > >> >>> > > >> >>> > https://github.com/koeninger/spark-1/blob/SIP-0/docs/spark- > improvement-proposals.md > >> >>> > > >> >>> > "Goals: What must this allow people to do, that they can't > >> >>> > currently?" > >> >>> > > >> >>> > Is it unclear that this is focusing specifically on people-visible > >> >>> > behavior? > >> >>> > > >> >>> > Rejected goals - are important because otherwise people keep > trying > >> >>> > to argue about scope. 
Of course you can change things later with > a > >> >>> > different SIP and different vote, the point is to focus. > >> >>> > > >> >>> > Use cases - are something that people are going to bring up in > >> >>> > discussion. If they aren't clearly documented as a goal ("This > must > >> >>> > allow me to connect using SSL"), they should be added. > >> >>> > > >> >>> > Internal architecture - if the people who need specific behavior > are > >> >>> > implementers of other parts of the system, that's fine. > >> >>> > > >> >>> > Rejected strategies - If you have none of these, you have no > >> >>> > evidence > >> >>> > that the proponent didn't just go with the first thing they had in > >> >>> > mind (or have already implemented), which is a big problem > >> >>> > currently. > >> >>> > Approval isn't binding as to specifics of implementation, so these > >> >>> > aren't handcuffs. The goals are the contract, the strategy is > >> >>> > evidence that contract can actually be met. > >> >>> > > >> >>> > Design docs - I'm not touching design docs. The markdown file I > >> >>> > linked specifically says of the strategy section "This is not a > full > >> >>> > design document." Is this unclear? Design docs can be worked on > >> >>> > obviously, but that's not what I'm concerned with here. > >> >>> > > >> >>> > > >> >>> > > >> >>> > > >> >>> > On Sun, Oct 9, 2016 at 2:34 PM, Matei Zaharia <[hidden email]> > >> >>> > wrote: > >> >>> >> Hi Cody, > >> >>> >> > >> >>> >> I think this would be a lot more concrete if we had a more > detailed > >> >>> >> template > >> >>> >> for SIPs. Right now, it's not super clear what's in scope -- e.g. > >> >>> >> are > >> >>> >> they > >> >>> >> a way to solicit feedback on the user-facing behavior or on the > >> >>> >> internals? > >> >>> >> "Goals" can cover both things. 
I've been thinking of SIPs more as > >> >>> >> Product > >> >>> >> Requirements Docs (PRDs), which focus on *what* a code change > >> >>> >> should > >> >>> >> do > >> >>> >> as > >> >>> >> opposed to how. > >> >>> >> > >> >>> >> In particular, here are some things that you may or may not > >> >>> >> consider > >> >>> >> in > >> >>> >> scope for SIPs: > >> >>> >> > >> >>> >> - Goals and non-goals: This is definitely in scope, and IMO > should > >> >>> >> focus on > >> >>> >> user-visible behavior (e.g. "system supports SQL window > functions" > >> >>> >> or > >> >>> >> "system continues working if one node fails"). BTW I wouldn't say > >> >>> >> "rejected > >> >>> >> goals" because some of them might become goals later, so we're > not > >> >>> >> definitively rejecting them. > >> >>> >> > >> >>> >> - Public API: Probably should be included in most SIPs unless > it's > >> >>> >> too > >> >>> >> large > >> >>> >> to fully specify then (e.g. "let's add an ML library"). > >> >>> >> > >> >>> >> - Use cases: I usually find this very useful in PRDs to better > >> >>> >> communicate > >> >>> >> the goals. > >> >>> >> > >> >>> >> - Internal architecture: This is usually *not* a thing users can > >> >>> >> easily > >> >>> >> comment on and it sounds more like a design doc item. Of course > >> >>> >> it's > >> >>> >> important to show that the SIP is feasible to implement. One > >> >>> >> exception, > >> >>> >> however, is that I think we'll have some SIPs primarily on > >> >>> >> internals > >> >>> >> (e.g. > >> >>> >> if somebody wants to refactor Spark's query optimizer or > >> >>> >> something). > >> >>> >> > >> >>> >> - Rejected strategies: I personally wouldn't put this, because > >> >>> >> what's > >> >>> >> the > >> >>> >> point of voting to reject a strategy before you've really begun > >> >>> >> designing > >> >>> >> and implementing something? What if you discover that the > strategy > >> >>> >> is > >> >>> >> actually better when you start doing stuff? 
> >> >>> >> > >> >>> >> At a super high level, it depends on whether you want the SIPs to > >> >>> >> be > >> >>> >> PRDs > >> >>> >> for getting some quick feedback on the goals of a feature before > it > >> >>> >> is > >> >>> >> designed, or something more like full-fledged design docs (just a > >> >>> >> more > >> >>> >> visible design doc for bigger changes). I looked at Kafka's KIPs, > >> >>> >> and > >> >>> >> they > >> >>> >> actually seem to be more like design docs. This can work too but > it > >> >>> >> does > >> >>> >> require more work from the proposer and it can lead to the same > >> >>> >> problems you > >> >>> >> mentioned with people already having a design and implementation > in > >> >>> >> mind. > >> >>> >> > >> >>> >> Basically, the question is, are you trying to iterate faster on > >> >>> >> design > >> >>> >> by > >> >>> >> adding a step for user feedback earlier? Or are you just trying > to > >> >>> >> make > >> >>> >> design docs for key features more visible (and their approval > more > >> >>> >> formal)? > >> >>> >> > >> >>> >> BTW note that in either case, I'd like to have a template for > >> >>> >> design > >> >>> >> docs > >> >>> >> too, which should also include goals. I think that would've > avoided > >> >>> >> some of > >> >>> >> the issues you brought up. > >> >>> >> > >> >>> >> Matei > >> >>> >> > >> >>> >> On Oct 9, 2016, at 10:40 AM, Cody Koeninger <[hidden email]> > wrote: > >> >>> >> > >> >>> >> Here's my specific proposal (meta-proposal?) > >> >>> >> > >> >>> >> Spark Improvement Proposals (SIP) > >> >>> >> > >> >>> >> > >> >>> >> Background: > >> >>> >> > >> >>> >> The current problem is that design and implementation of large > >> >>> >> features > >> >>> >> are > >> >>> >> often done in private, before soliciting user feedback. > >> >>> >> > >> >>> >> When feedback is solicited, it is often as to detailed design > >> >>> >> specifics, not > >> >>> >> focused on goals. 
> >> >>> >> > >> >>> >> When implementation does take place after design, there is often > >> >>> >> disagreement as to what goals are or are not in scope. > >> >>> >> > >> >>> >> This results in commits that don't fully meet user needs. > >> >>> >> > >> >>> >> > >> >>> >> Goals: > >> >>> >> > >> >>> >> - Ensure user, contributor, and committer goals are clearly > >> >>> >> identified > >> >>> >> and > >> >>> >> agreed upon, before implementation takes place. > >> >>> >> > >> >>> >> - Ensure that a technically feasible strategy is chosen that is > >> >>> >> likely > >> >>> >> to > >> >>> >> meet the goals. > >> >>> >> > >> >>> >> > >> >>> >> Rejected Goals: > >> >>> >> > >> >>> >> - SIPs are not for detailed design. Design by committee doesn't > >> >>> >> work. > >> >>> >> > >> >>> >> - SIPs are not for every change. We don't need that much process. > >> >>> >> > >> >>> >> > >> >>> >> Strategy: > >> >>> >> > >> >>> >> My suggestion is outlined as a Spark Improvement Proposal process > >> >>> >> documented > >> >>> >> at > >> >>> >> > >> >>> >> > >> >>> >> > >> >>> >> > >> >>> >> https://github.com/koeninger/spark-1/blob/SIP-0/docs/spark-improvement-proposals.md > >> >>> >> > >> >>> >> Specifics of Jira manipulation are an implementation detail we > can > >> >>> >> figure > >> >>> >> out. > >> >>> >> > >> >>> >> I'm suggesting voting; the need here is for a _clear_ outcome. > >> >>> >> > >> >>> >> > >> >>> >> Rejected Strategies: > >> >>> >> > >> >>> >> Having someone who understands the problem implement it first > >> >>> >> works, > >> >>> >> but > >> >>> >> only if significant iteration after user feedback is allowed. > >> >>> >> > >> >>> >> Historically this has been problematic due to pressure to limit > >> >>> >> public > >> >>> >> API > >> >>> >> changes. > >> >>> >> > >> >>> >> > >> >>> >> On Fri, Oct 7, 2016 at 5:16 PM, Reynold Xin <[hidden email]> > >> >>> >> wrote: > >> >>> >>> > >> >>> >>> Alright, looks like there is quite a bit of support. 
We should > >> >>> >>> wait > >> >>> >>> to > >> >>> >>> hear from more people too. > >> >>> >>> > >> >>> >>> To push this forward, Cody and I will be working together in the > >> >>> >>> next > >> >>> >>> couple of weeks to come up with a concrete, detailed proposal on > >> >>> >>> what > >> >>> >>> this > >> >>> >>> entails, and then we can discuss the specific proposal as > >> >>> >>> well. > >> >>> >>> > >> >>> >>> > >> >>> >>> On Fri, Oct 7, 2016 at 2:29 PM, Cody Koeninger <[hidden email]> > >> >>> >>> wrote: > >> >>> >>>> > >> >>> >>>> Yeah, in case it wasn't clear, I was talking about SIPs for > major > >> >>> >>>> user-facing or cross-cutting changes, not minor feature adds. > >> >>> >>>> > >> >>> >>>> On Fri, Oct 7, 2016 at 3:58 PM, Stavros Kontopoulos > >> >>> >>>> <[hidden email]> wrote: > >> >>> >>>>> > >> >>> >>>>> +1 to the SIP label as long as it does not slow down things > and > >> >>> >>>>> it > >> >>> >>>>> targets optimizing efforts, coordination etc. For example, > really > >> >>> >>>>> small > >> >>> >>>>> features should not need to go through this process (assuming > >> >>> >>>>> they > >> >>> >>>>> don't > >> >>> >>>>> touch public interfaces) or re-factorings and hope it will be > >> >>> >>>>> kept > >> >>> >>>>> this > >> >>> >>>>> way. So a guideline doc should be provided, like in the KIP > >> >>> >>>>> case. > >> >>> >>>>> > >> >>> >>>>> IMHO so far aside from tagging things and linking them > elsewhere > >> >>> >>>>> simply > >> >>> >>>>> having design docs and prototype implementations in PRs is > not > >> >>> >>>>> something > >> >>> >>>>> that has not worked so far. What is really a pain in many > >> >>> >>>>> projects > >> >>> >>>>> out there > >> >>> >>>>> is discontinuity in progress of PRs, missing features, slow > >> >>> >>>>> reviews > >> >>> >>>>> which is > >> >>> >>>>> understandable to some extent... 
it is not only about Spark > but > >> >>> >>>>> things can > >> >>> >>>>> be improved for sure for this project in particular as already > >> >>> >>>>> stated. > >> >>> >>>>> > >> >>> >>>>> On Fri, Oct 7, 2016 at 11:14 PM, Cody Koeninger <[hidden > email]> > >> >>> >>>>> wrote: > >> >>> >>>>>> > >> >>> >>>>>> +1 to adding an SIP label and linking it from the website. I > >> >>> >>>>>> think > >> >>> >>>>>> it > >> >>> >>>>>> needs > >> >>> >>>>>> > >> >>> >>>>>> - template that focuses it towards soliciting user goals / > non > >> >>> >>>>>> goals > >> >>> >>>>>> - clear resolution as to which strategy was chosen to pursue. > >> >>> >>>>>> I'd > >> >>> >>>>>> recommend a vote. > >> >>> >>>>>> > >> >>> >>>>>> Matei asked me to clarify what I meant by changing > interfaces, > >> >>> >>>>>> I > >> >>> >>>>>> think > >> >>> >>>>>> it's directly relevant to the SIP idea so I'll clarify here, > >> >>> >>>>>> and > >> >>> >>>>>> split > >> >>> >>>>>> a thread for the other discussion per Nicholas' request. > >> >>> >>>>>> > >> >>> >>>>>> I meant changing public user interfaces. I think the first > >> >>> >>>>>> design > >> >>> >>>>>> is > >> >>> >>>>>> unlikely to be right, because it's done at a time when you > have > >> >>> >>>>>> the > >> >>> >>>>>> least information. As a user, I find it considerably more > >> >>> >>>>>> frustrating > >> >>> >>>>>> to be unable to use a tool to get my job done, than I do > having > >> >>> >>>>>> to > >> >>> >>>>>> make minor changes to my code in order to take advantage of > >> >>> >>>>>> features. > >> >>> >>>>>> I've seen committers be seriously reluctant to allow changes > to > >> >>> >>>>>> @experimental code that are needed in order for it to really > >> >>> >>>>>> work > >> >>> >>>>>> right. 
You need to be able to iterate, and if people on both > >> >>> >>>>>> sides > >> >>> >>>>>> of > >> >>> >>>>>> the fence aren't going to respect that some newer APIs are > >> >>> >>>>>> subject > >> >>> >>>>>> to > >> >>> >>>>>> change, then why even mark them as such? > >> >>> >>>>>> > >> >>> >>>>>> Ideally a finished SIP should give me a checklist of things > >> >>> >>>>>> that > >> >>> >>>>>> an > >> >>> >>>>>> implementation must do, and things that it doesn't need to > do. > >> >>> >>>>>> Contributors/committers should be seriously discouraged from > >> >>> >>>>>> putting > >> >>> >>>>>> out a version 0.1 that doesn't have at least a prototype > >> >>> >>>>>> implementation of all those things, especially if they're > then > >> >>> >>>>>> going > >> >>> >>>>>> to argue against interface changes necessary to get the > >> >>> >>>>>> rest > >> >>> >>>>>> of > >> >>> >>>>>> the things done in the 0.2 version. > >> >>> >>>>>> > >> >>> >>>>>> > >> >>> >>>>>> On Fri, Oct 7, 2016 at 2:18 PM, Reynold Xin <[hidden email]> > >> >>> >>>>>> wrote: > >> >>> >>>>>>> I like the lightweight proposal to add a SIP label. > >> >>> >>>>>>> > >> >>> >>>>>>> During Spark 2.0 development, Tom (Graves) and I suggested > >> >>> >>>>>>> using > >> >>> >>>>>>> the wiki > >> >>> >>>>>>> to > >> >>> >>>>>>> track the list of major changes, but that never really > >> >>> >>>>>>> materialized > >> >>> >>>>>>> due to > >> >>> >>>>>>> the overhead. Adding a SIP label on major JIRAs and then > linking > >> >>> >>>>>>> to > >> >>> >>>>>>> them > >> >>> >>>>>>> prominently on the Spark website makes a lot of sense. 
> >> >>> >>>>>>> > >> >>> >>>>>>> > >> >>> >>>>>>> On Fri, Oct 7, 2016 at 10:50 AM, Matei Zaharia > >> >>> >>>>>>> <[hidden email]> > >> >>> >>>>>>> wrote: > >> >>> >>>>>>>> > >> >>> >>>>>>>> For the improvement proposals, I think one major point was > to > >> >>> >>>>>>>> make > >> >>> >>>>>>>> them > >> >>> >>>>>>>> really visible to users who are not contributors, so we > >> >>> >>>>>>>> should > >> >>> >>>>>>>> do > >> >>> >>>>>>>> more than > >> >>> >>>>>>>> sending stuff to dev@. One very lightweight idea is to > have a > >> >>> >>>>>>>> new > >> >>> >>>>>>>> type of > >> >>> >>>>>>>> JIRA called a SIP and have a link to a filter that shows > all > >> >>> >>>>>>>> such > >> >>> >>>>>>>> JIRAs from > >> >>> >>>>>>>> http://spark.apache.org. I also like the idea of SIP and > >> >>> >>>>>>>> design > >> >>> >>>>>>>> doc > >> >>> >>>>>>>> templates (in fact many projects have them). > >> >>> >>>>>>>> > >> >>> >>>>>>>> Matei > >> >>> >>>>>>>> > >> >>> >>>>>>>> On Oct 7, 2016, at 10:38 AM, Reynold Xin <[hidden email]> > >> >>> >>>>>>>> wrote: > >> >>> >>>>>>>> > >> >>> >>>>>>>> I called Cody last night and talked about some of the > topics > >> >>> >>>>>>>> in > >> >>> >>>>>>>> his > >> >>> >>>>>>>> email. > >> >>> >>>>>>>> It became clear to me Cody genuinely cares about the > project. > >> >>> >>>>>>>> > >> >>> >>>>>>>> Some of the frustrations come from the success of the > project > >> >>> >>>>>>>> itself > >> >>> >>>>>>>> becoming very "hot", and it is difficult to get clarity > from > >> >>> >>>>>>>> people > >> >>> >>>>>>>> who > >> >>> >>>>>>>> don't dedicate all their time to Spark. 
In fact, it is in > >> >>> >>>>>>>> some > >> >>> >>>>>>>> ways > >> >>> >>>>>>>> similar > >> >>> >>>>>>>> to scaling an engineering team in a successful startup: old > >> >>> >>>>>>>> processes that > >> >>> >>>>>>>> worked well might not work so well when it gets to a > certain > >> >>> >>>>>>>> size, > >> >>> >>>>>>>> cultures > >> >>> >>>>>>>> can get diluted, building culture vs building process, etc. > >> >>> >>>>>>>> > >> >>> >>>>>>>> I would also really like to have a more visible process for > larger > >> >>> >>>>>>>> changes, > >> >>> >>>>>>>> especially major user-facing API changes. Historically we > >> >>> >>>>>>>> upload > >> >>> >>>>>>>> design docs > >> >>> >>>>>>>> for major changes, but it is not always consistent and it is > >> >>> >>>>>>>> difficult > >> >>> >>>>>>>> to > >> >>> >>>>>>>> ensure the quality > >> >>> >>>>>>>> of the docs, due to the volunteering nature of the > >> >>> >>>>>>>> organization. > >> >>> >>>>>>>> > >> >>> >>>>>>>> Some of the more concrete ideas we discussed focus on > >> >>> >>>>>>>> building a > >> >>> >>>>>>>> culture > >> >>> >>>>>>>> to improve clarity: > >> >>> >>>>>>>> > >> >>> >>>>>>>> - Process: Large changes should have design docs posted on > >> >>> >>>>>>>> JIRA. > >> >>> >>>>>>>> One > >> >>> >>>>>>>> thing > >> >>> >>>>>>>> Cody and I didn't discuss but an idea that just came to me > is > >> >>> >>>>>>>> we > >> >>> >>>>>>>> should > >> >>> >>>>>>>> create a design doc template for the project and ask > >> >>> >>>>>>>> everybody > >> >>> >>>>>>>> to > >> >>> >>>>>>>> follow it. > >> >>> >>>>>>>> The design doc template should also explicitly list goals > and > >> >>> >>>>>>>> non-goals, to > >> >>> >>>>>>>> make design docs more consistent. > >> >>> >>>>>>>> > >> >>> >>>>>>>> - Process: Email dev@ to solicit feedback. We have done > this > >> >>> >>>>>>>> with > >> >>> >>>>>>>> some > >> >>> >>>>>>>> changes, but again very inconsistently. 
Just posting > something > >> >>> >>>>>>>> on > >> >>> >>>>>>>> JIRA > >> >>> >>>>>>>> isn't > >> >>> >>>>>>>> sufficient, because there are simply too many JIRAs and the > >> >>> >>>>>>>> signal > >> >>> >>>>>>>> gets lost > >> >>> >>>>>>>> in the noise. While this is generally impossible to enforce > >> >>> >>>>>>>> because > >> >>> >>>>>>>> we can't > >> >>> >>>>>>>> force all volunteers to conform to a process (or they might > >> >>> >>>>>>>> not > >> >>> >>>>>>>> even > >> >>> >>>>>>>> be > >> >>> >>>>>>>> aware of this), those who are more familiar with the > project > >> >>> >>>>>>>> can > >> >>> >>>>>>>> help by > >> >>> >>>>>>>> emailing dev@ when they see something that hasn't > been posted there. > >> >>> >>>>>>>> > >> >>> >>>>>>>> - Culture: The design doc author(s) should be open to > >> >>> >>>>>>>> feedback. > >> >>> >>>>>>>> A > >> >>> >>>>>>>> design > >> >>> >>>>>>>> doc should serve as the base for discussion and is by no > >> >>> >>>>>>>> means > >> >>> >>>>>>>> the > >> >>> >>>>>>>> final > >> >>> >>>>>>>> design. Of course, this does not mean the author has to > >> >>> >>>>>>>> accept > >> >>> >>>>>>>> every > >> >>> >>>>>>>> piece of feedback. They should also be comfortable accepting / > >> >>> >>>>>>>> rejecting > >> >>> >>>>>>>> ideas on > >> >>> >>>>>>>> technical grounds. > >> >>> >>>>>>>> > >> >>> >>>>>>>> - Process / Culture: For major ongoing projects, it can be > >> >>> >>>>>>>> useful > >> >>> >>>>>>>> to > >> >>> >>>>>>>> have > >> >>> >>>>>>>> some monthly Google Hangouts that are open to the world. I > am > >> >>> >>>>>>>> actually not > >> >>> >>>>>>>> sure how well this will work, because of the volunteering > >> >>> >>>>>>>> nature > >> >>> >>>>>>>> and > >> >>> >>>>>>>> because we need > >> >>> >>>>>>>> to adjust for timezones for people across the globe, but it > >> >>> >>>>>>>> seems > >> >>> >>>>>>>> worth > >> >>> >>>>>>>> trying. 
> >> >>> >>>>>>>> > >> >>> >>>>>>>> - Culture: Contributors (including committers) should be > more > >> >>> >>>>>>>> direct > >> >>> >>>>>>>> in > >> >>> >>>>>>>> setting expectations, including whether they are working > on a > >> >>> >>>>>>>> specific > >> >>> >>>>>>>> issue, whether they will be working on a specific issue, > and > >> >>> >>>>>>>> whether > >> >>> >>>>>>>> an > >> >>> >>>>>>>> issue or pr or jira should be rejected. Most people I know > in > >> >>> >>>>>>>> this > >> >>> >>>>>>>> community > >> >>> >>>>>>>> are nice and don't enjoy telling other people no, but it is > >> >>> >>>>>>>> often > >> >>> >>>>>>>> more > >> >>> >>>>>>>> annoying to a contributor to not know anything than > getting a > >> >>> >>>>>>>> no. > >> >>> >>>>>>>> > >> >>> >>>>>>>> > >> >>> >>>>>>>> On Fri, Oct 7, 2016 at 10:03 AM, Matei Zaharia > >> >>> >>>>>>>> <[hidden email]> > >> >>> >>>>>>>> wrote: > >> >>> >>>>>>>>> > >> >>> >>>>>>>>> > >> >>> >>>>>>>>> Love the idea of a more visible "Spark Improvement > Proposal" > >> >>> >>>>>>>>> process that > >> >>> >>>>>>>>> solicits user input on new APIs. For what it's worth, I > >> >>> >>>>>>>>> don't > >> >>> >>>>>>>>> think > >> >>> >>>>>>>>> committers are trying to minimize their own work -- every > >> >>> >>>>>>>>> committer > >> >>> >>>>>>>>> cares > >> >>> >>>>>>>>> about making the software useful for users. However, it is > >> >>> >>>>>>>>> always > >> >>> >>>>>>>>> hard to > >> >>> >>>>>>>>> get user input and so it helps to have this kind of > process. > >> >>> >>>>>>>>> I've > >> >>> >>>>>>>>> certainly > >> >>> >>>>>>>>> looked at the *IPs a lot in other software I use just to > see > >> >>> >>>>>>>>> the > >> >>> >>>>>>>>> biggest > >> >>> >>>>>>>>> things on the roadmap. > >> >>> >>>>>>>>> > >> >>> >>>>>>>>> When you're talking about "changing interfaces", are you > >> >>> >>>>>>>>> talking > >> >>> >>>>>>>>> about > >> >>> >>>>>>>>> public or internal APIs? 
I do think many people hate > >> >>> >>>>>>>>> changing > >> >>> >>>>>>>>> public APIs > >> >>> >>>>>>>>> and I actually think that's for the best of the project. > >> >>> >>>>>>>>> That's > >> >>> >>>>>>>>> a > >> >>> >>>>>>>>> technical > >> >>> >>>>>>>>> debate, but basically, the worst thing when you're using a > >> >>> >>>>>>>>> piece > >> >>> >>>>>>>>> of > >> >>> >>>>>>>>> software > >> >>> >>>>>>>>> is that the developers constantly ask you to rewrite your > >> >>> >>>>>>>>> app > >> >>> >>>>>>>>> to > >> >>> >>>>>>>>> update to a > >> >>> >>>>>>>>> new version (and thus benefit from bug fixes, etc). Cue > >> >>> >>>>>>>>> anyone > >> >>> >>>>>>>>> who's used > >> >>> >>>>>>>>> Protobuf, or Guava. The "let's get everyone to change > their > >> >>> >>>>>>>>> code > >> >>> >>>>>>>>> this > >> >>> >>>>>>>>> release" model works well within a single large company, > but > >> >>> >>>>>>>>> doesn't work > >> >>> >>>>>>>>> well for a community, which is why nearly all *very* > widely > >> >>> >>>>>>>>> used > >> >>> >>>>>>>>> programming > >> >>> >>>>>>>>> interfaces (I'm talking things like Java standard library, > >> >>> >>>>>>>>> Windows > >> >>> >>>>>>>>> API, etc) > >> >>> >>>>>>>>> almost *never* break backwards compatibility. All this is > >> >>> >>>>>>>>> done > >> >>> >>>>>>>>> within reason > >> >>> >>>>>>>>> though, e.g. we do change things in major releases (2.x, > >> >>> >>>>>>>>> 3.x, > >> >>> >>>>>>>>> etc). > >> >>> >>>>>>>> > >> >>> >>>>>>>> > >> >>> >>>>>>>> > >> >>> >>>>>>>> > >> >>> >>>>>>> > >> >>> >>>>>> > >> >>> >>>>>> > >> >>> >>>>>> > >> >>> >>>>>> > >> >>> >>>>>> ------------------------------------------------------------ > --------- > >> >>> >>>>>> To unsubscribe e-mail: [hidden email] > >> >>> >>>>>> > >> >>> >>>>> > >> >>> >>>>> > >> >>> >>>>> > >> >>> >>>>> -- > >> >>> >>>>> Stavros Kontopoulos > >> >>> >>>>> Senior Software Engineer > >> >>> >>>>> Lightbend, Inc. 
> >> >>> >>>>> p: +30 6977967274 > >> >>> >>>>> e: [hidden email] > > > > -- > > Ryan Blue > > Software Engineer > > Netflix