I think this is closer to a procedural issue than a code modification issue, hence majority rather than consensus. If everyone thinks consensus is better, I don't care. Again, I don't feel strongly about how we achieve clarity, just that we achieve it.
On Mon, Oct 10, 2016 at 2:02 PM, Ryan Blue <rb...@netflix.com> wrote:
> Sorry, I missed that the proposal includes majority approval. Why majority instead of consensus? I think we want to build consensus around these proposals and it makes sense to discuss until no one would veto.
>
> rb
>
> On Mon, Oct 10, 2016 at 11:54 AM, Ryan Blue <rb...@netflix.com> wrote:
>>
>> +1 to votes to approve proposals. I agree that proposals should have an official mechanism to be accepted, and a vote is an established means of doing that well. I like that it includes a period to review the proposal and I think proposals should have been discussed enough ahead of a vote to survive the possibility of a veto.
>>
>> I also like the names that are short and (mostly) unique, like SEP.
>>
>> Where I disagree is with the requirement that a committer must formally propose an enhancement. I don't see the value of restricting this: if someone has the will to write up a proposal then they should be encouraged to do so and start a discussion about it. Even if there is a political reality as Cody says, what is the value of codifying that in our process? I think restricting who can submit proposals would only undermine them by pushing contributors out. Maybe I'm missing something here?
>>
>> rb
>>
>> On Mon, Oct 10, 2016 at 7:41 AM, Cody Koeninger <c...@koeninger.org> wrote:
>>>
>>> Yes, users suggesting SIPs is a good thing and is explicitly called out in the linked document under the Who? section. Formally proposing them, not so much, because of the political realities.
>>>
>>> Yes, implementation strategy definitely affects goals.
>>> There are all kinds of examples of this; I'll pick one that's my fault so as to avoid sounding like I'm blaming anyone:
>>>
>>> When I implemented the Kafka DStream, one of my goals (not explicitly agreed upon by the community) was to make sure people could use the DStream with however they were already using Kafka at work. The lack of explicit agreement on that goal led to all kinds of fighting with committers that could have been avoided. The lack of explicit up-front strategy discussion led to the DStream not really working with compacted topics. I knew about compacted topics, but don't have a use for them, so I had a blind spot there. If there had been explicit up-front discussion that my strategy was "assume that batches can be defined on the driver solely by beginning and ending offsets", there's a greater chance that a user would have seen that and said, "hey, what about non-contiguous offsets in a compacted topic?"
>>>
>>> This kind of thing is only going to happen smoothly if we have a lightweight, user-visible process with clear outcomes.
>>>
>>> On Mon, Oct 10, 2016 at 1:34 AM, assaf.mendelson <assaf.mendel...@rsa.com> wrote:
>>> > I agree with most of what Cody said.
>>> >
>>> > Two things:
>>> >
>>> > First, we can always have other people suggest SIPs but mark them as “unreviewed” and have committers basically move them forward. The problem is that writing a good document takes time. This way we can leverage non-committers to do some of this work (it is just another way to contribute).
>>> >
>>> > As for strategy, in many cases implementation strategy can affect the goals. I will give a small example: in the current Structured Streaming strategy, we group by time to achieve a sliding window. This is definitely an implementation decision and not a goal.
>>> > However, I can think of several aggregation functions which have the time inside their calculation buffer. For example, let’s say we want to return a set of all distinct values. One way to implement this would be to make the set into a map and have the value contain the last time seen. Multiplying it across the group-by would cost a lot in performance. So adding such a strategy would have a great effect on the type of aggregations and their performance, which does affect the goal. Without adding the strategy, it is easy for whoever reads the design document not to think about these cases. Furthermore, it might be decided that these cases are rare enough that the strategy is still good enough, but how would we know that without user feedback?
>>> >
>>> > I believe this example is exactly what Cody was talking about. Since implementation strategies often have a large effect on the goal, we should discuss them when discussing the goals. In addition, while it is often easy to throw out completely infeasible goals, it is often much harder to figure out that goals are infeasible without fine-tuning.
>>> >
>>> > Assaf.
>>> >
>>> > From: Cody Koeninger-2 [via Apache Spark Developers List] [mailto:ml-node+[hidden email]]
>>> > Sent: Monday, October 10, 2016 2:25 AM
>>> > To: Mendelson, Assaf
>>> > Subject: Re: Spark Improvement Proposals
>>> >
>>> > Only committers should formally submit SIPs because in an Apache project only committers have explicit political power. If a user can't find a committer willing to sponsor an SIP idea, they have no way to get the idea passed in any case. If I can't find a committer to sponsor this meta-SIP idea, I'm out of luck.
>>> >
>>> > I do not believe unrealistic goals can be found solely by inspection. We've managed to ignore unrealistic goals even after implementation! Focusing on APIs can allow people to think they've solved something, when there's really no way of implementing that API while meeting the goals. Rapid iteration is clearly the best way to address this, but we've already talked about why that hasn't really worked. If adding a non-binding API section to the template is important to you, I'm not against it, but I don't think it's sufficient.
>>> >
>>> > On your PRD vs. design doc spectrum, I'm saying this is closer to a PRD. Clear agreement on goals is the most important thing, and that's why it's the thing I want binding agreement on. But I cannot agree to goals unless I have enough minimal technical info to judge whether the goals are likely to actually be accomplished.
>>> >
>>> > On Sun, Oct 9, 2016 at 5:35 PM, Matei Zaharia <[hidden email]> wrote:
>>> >
>>> >> Well, I think there are a few things here that don't make sense. First, why should only committers submit SIPs? Development in the project should be open to all contributors, whether they're committers or not. Second, I think unrealistic goals can be found just by inspecting the goals, and I'm not super worried that we'll accept a lot of SIPs that are then infeasible -- we can then submit new ones. But this depends on whether you want this process to be a "design doc lite", where people also agree on implementation strategy, or just a way to agree on goals. This is what I asked earlier about PRDs vs. design docs (and I'm open to either one but I'd just like clarity).
>>> >> Finally, both as a user and designer of software, I always want to give feedback on APIs, so I'd really like a culture of having those early. People don't argue about prettiness when they discuss APIs; they argue about the core concepts to expose in order to meet various goals, and then they're stuck maintaining those for a long time.
>>> >>
>>> >> Matei
>>> >>
>>> >> On Oct 9, 2016, at 3:10 PM, Cody Koeninger <[hidden email]> wrote:
>>> >>
>>> >> Users instead of people, sure. Committers and contributors are (or at least should be) a subset of users.
>>> >>
>>> >> Non-goals, sure. I don't care what the name is, but we need to clearly say e.g. 'no, we are not maintaining compatibility with XYZ right now'.
>>> >>
>>> >> API: what I care most about is whether it allows me to accomplish the goals. Arguing about how ugly or pretty it is can be saved for design/implementation, IMHO.
>>> >>
>>> >> Strategy: this is necessary because otherwise goals can be out of line with reality. Don't propose goals you don't have at least some idea of how to implement.
>>> >>
>>> >> Rejected strategies: given that committers are the only ones I'm saying should formally submit SPARKLIs or SIPs, if they put junk in a required section, then slap them down for it and tell them to fix it.
>>> >>
>>> >> On Oct 9, 2016 4:36 PM, "Matei Zaharia" <[hidden email]> wrote:
>>> >>>
>>> >>> Yup, this is the stuff that I found unclear. Thanks for clarifying here, but we should also clarify it in the writeup. In particular:
>>> >>>
>>> >>> - Goals need to be about user-facing behavior ("people" is broad)
>>> >>>
>>> >>> - I'd rename Rejected Goals to Non-Goals. Otherwise someone will dig up one of these and say "Spark's developers have officially rejected X, which our awesome system has".
>>> >>>
>>> >>> - For user-facing stuff, I think you need a section on API. Virtually all other *IPs I've seen have that.
>>> >>>
>>> >>> - I'm still not sure why the strategy section is needed if the purpose is to define user-facing behavior -- unless this is the strategy for setting the goals or for defining the API. That sounds squarely like a design doc issue. In some sense, who cares whether the proposal is technically feasible right now? If it's infeasible, that will be discovered later during design and implementation. Same thing with rejected strategies -- listing some of those is definitely useful sometimes, but if you make this a *required* section, people are just going to fill it in with bogus stuff (I've seen this happen before).
>>> >>>
>>> >>> Matei
>>> >>>
>>> >>> > On Oct 9, 2016, at 2:14 PM, Cody Koeninger <[hidden email]> wrote:
>>> >>> >
>>> >>> > So to focus the discussion on the specific strategy I'm suggesting, documented at
>>> >>> >
>>> >>> > https://github.com/koeninger/spark-1/blob/SIP-0/docs/spark-improvement-proposals.md
>>> >>> >
>>> >>> > "Goals: What must this allow people to do, that they can't currently?"
>>> >>> >
>>> >>> > Is it unclear that this is focusing specifically on people-visible behavior?
>>> >>> >
>>> >>> > Rejected goals - these are important because otherwise people keep trying to argue about scope. Of course you can change things later with a different SIP and a different vote; the point is to focus.
>>> >>> >
>>> >>> > Use cases - these are something that people are going to bring up in discussion. If they aren't clearly documented as a goal ("This must allow me to connect using SSL"), they should be added.
>>> >>> >
>>> >>> > Internal architecture - if the people who need specific behavior are implementers of other parts of the system, that's fine.
>>> >>> >
>>> >>> > Rejected strategies - If you have none of these, you have no evidence that the proponent didn't just go with the first thing they had in mind (or have already implemented), which is a big problem currently. Approval isn't binding as to specifics of implementation, so these aren't handcuffs. The goals are the contract; the strategy is evidence that the contract can actually be met.
>>> >>> >
>>> >>> > Design docs - I'm not touching design docs. The markdown file I linked specifically says of the strategy section "This is not a full design document." Is this unclear? Design docs can obviously be worked on, but that's not what I'm concerned with here.
>>> >>> >
>>> >>> > On Sun, Oct 9, 2016 at 2:34 PM, Matei Zaharia <[hidden email]> wrote:
>>> >>> >> Hi Cody,
>>> >>> >>
>>> >>> >> I think this would be a lot more concrete if we had a more detailed template for SIPs. Right now, it's not super clear what's in scope -- e.g. are they a way to solicit feedback on the user-facing behavior or on the internals? "Goals" can cover both things. I've been thinking of SIPs more as Product Requirements Docs (PRDs), which focus on *what* a code change should do as opposed to how.
>>> >>> >>
>>> >>> >> In particular, here are some things that you may or may not consider in scope for SIPs:
>>> >>> >>
>>> >>> >> - Goals and non-goals: This is definitely in scope, and IMO should focus on user-visible behavior (e.g. "system supports SQL window functions" or "system continues working if one node fails"). BTW, I wouldn't say "rejected goals" because some of them might become goals later, so we're not definitively rejecting them.
>>> >>> >>
>>> >>> >> - Public API: Probably should be included in most SIPs unless it's too large to fully specify then (e.g. "let's add an ML library").
>>> >>> >>
>>> >>> >> - Use cases: I usually find this very useful in PRDs to better communicate the goals.
>>> >>> >>
>>> >>> >> - Internal architecture: This is usually *not* a thing users can easily comment on and it sounds more like a design doc item. Of course it's important to show that the SIP is feasible to implement. One exception, however, is that I think we'll have some SIPs primarily on internals (e.g. if somebody wants to refactor Spark's query optimizer or something).
>>> >>> >>
>>> >>> >> - Rejected strategies: I personally wouldn't put this, because what's the point of voting to reject a strategy before you've really begun designing and implementing something? What if you discover that the strategy is actually better when you start doing stuff?
>>> >>> >>
>>> >>> >> At a super high level, it depends on whether you want the SIPs to be PRDs for getting some quick feedback on the goals of a feature before it is designed, or something more like full-fledged design docs (just a more visible design doc for bigger changes). I looked at Kafka's KIPs, and they actually seem to be more like design docs.
>>> >>> >> This can work too, but it does require more work from the proposer, and it can lead to the same problems you mentioned with people already having a design and implementation in mind.
>>> >>> >>
>>> >>> >> Basically, the question is: are you trying to iterate faster on design by adding a step for user feedback earlier? Or are you just trying to make design docs for key features more visible (and their approval more formal)?
>>> >>> >>
>>> >>> >> BTW, note that in either case, I'd like to have a template for design docs too, which should also include goals. I think that would've avoided some of the issues you brought up.
>>> >>> >>
>>> >>> >> Matei
>>> >>> >>
>>> >>> >> On Oct 9, 2016, at 10:40 AM, Cody Koeninger <[hidden email]> wrote:
>>> >>> >>
>>> >>> >> Here's my specific proposal (meta-proposal?)
>>> >>> >>
>>> >>> >> Spark Improvement Proposals (SIP)
>>> >>> >>
>>> >>> >> Background:
>>> >>> >>
>>> >>> >> The current problem is that design and implementation of large features are often done in private, before soliciting user feedback.
>>> >>> >>
>>> >>> >> When feedback is solicited, it is often about detailed design specifics, not focused on goals.
>>> >>> >>
>>> >>> >> When implementation does take place after design, there is often disagreement as to what goals are or are not in scope.
>>> >>> >>
>>> >>> >> This results in commits that don't fully meet user needs.
>>> >>> >>
>>> >>> >> Goals:
>>> >>> >>
>>> >>> >> - Ensure user, contributor, and committer goals are clearly identified and agreed upon, before implementation takes place.
>>> >>> >>
>>> >>> >> - Ensure that a technically feasible strategy is chosen that is likely to meet the goals.
>>> >>> >>
>>> >>> >> Rejected Goals:
>>> >>> >>
>>> >>> >> - SIPs are not for detailed design. Design by committee doesn't work.
>>> >>> >>
>>> >>> >> - SIPs are not for every change. We don't need that much process.
>>> >>> >>
>>> >>> >> Strategy:
>>> >>> >>
>>> >>> >> My suggestion is outlined as a Spark Improvement Proposal process documented at
>>> >>> >>
>>> >>> >> https://github.com/koeninger/spark-1/blob/SIP-0/docs/spark-improvement-proposals.md
>>> >>> >>
>>> >>> >> Specifics of JIRA manipulation are an implementation detail we can figure out.
>>> >>> >>
>>> >>> >> I'm suggesting voting; the need here is for a _clear_ outcome.
>>> >>> >>
>>> >>> >> Rejected Strategies:
>>> >>> >>
>>> >>> >> Having someone who understands the problem implement it first works, but only if significant iteration after user feedback is allowed.
>>> >>> >>
>>> >>> >> Historically this has been problematic due to pressure to limit public API changes.
>>> >>> >>
>>> >>> >> On Fri, Oct 7, 2016 at 5:16 PM, Reynold Xin <[hidden email]> wrote:
>>> >>> >>>
>>> >>> >>> Alright, it looks like there is quite a bit of support. We should wait to hear from more people too.
>>> >>> >>>
>>> >>> >>> To push this forward, Cody and I will be working together in the next couple of weeks to come up with a concrete, detailed proposal on what this entails, and then we can discuss the specific proposal as well.
>>> >>> >>>
>>> >>> >>> On Fri, Oct 7, 2016 at 2:29 PM, Cody Koeninger <[hidden email]> wrote:
>>> >>> >>>>
>>> >>> >>>> Yeah, in case it wasn't clear, I was talking about SIPs for major user-facing or cross-cutting changes, not minor feature adds.
>>> >>> >>>>
>>> >>> >>>> On Fri, Oct 7, 2016 at 3:58 PM, Stavros Kontopoulos <[hidden email]> wrote:
>>> >>> >>>>>
>>> >>> >>>>> +1 to the SIP label as long as it does not slow things down and it targets optimizing efforts, coordination, etc. For example, really small features (assuming they don't touch public interfaces) or refactorings should not need to go through this process, and I hope it will be kept this way. So a guideline doc should be provided, as in the KIP case.
>>> >>> >>>>>
>>> >>> >>>>> IMHO, aside from tagging things and linking them elsewhere, simply having design docs and prototype implementations in PRs has worked so far. What is really a pain in many projects out there is discontinuity in the progress of PRs, missing features, and slow reviews, which is understandable to some extent... It is not only about Spark, but things can certainly be improved for this project in particular, as already stated.
>>> >>> >>>>>
>>> >>> >>>>> On Fri, Oct 7, 2016 at 11:14 PM, Cody Koeninger <[hidden email]> wrote:
>>> >>> >>>>>>
>>> >>> >>>>>> +1 to adding an SIP label and linking it from the website.
>>> >>> >>>>>> I think it needs:
>>> >>> >>>>>>
>>> >>> >>>>>> - a template that focuses it towards soliciting user goals / non-goals
>>> >>> >>>>>> - a clear resolution as to which strategy was chosen to pursue. I'd recommend a vote.
>>> >>> >>>>>>
>>> >>> >>>>>> Matei asked me to clarify what I meant by changing interfaces. I think it's directly relevant to the SIP idea, so I'll clarify here and split a thread for the other discussion per Nicholas' request.
>>> >>> >>>>>>
>>> >>> >>>>>> I meant changing public user interfaces. I think the first design is unlikely to be right, because it's done at a time when you have the least information. As a user, I find it considerably more frustrating to be unable to use a tool to get my job done than to have to make minor changes to my code in order to take advantage of features. I've seen committers be seriously reluctant to allow changes to @Experimental code that are needed in order for it to really work right. You need to be able to iterate, and if people on both sides of the fence aren't going to respect that some newer APIs are subject to change, then why even mark them as such?
>>> >>> >>>>>>
>>> >>> >>>>>> Ideally, a finished SIP should give me a checklist of things that an implementation must do, and things that it doesn't need to do.
>>> >>> >>>>>> Contributors/committers should be seriously discouraged from putting out a version 0.1 that doesn't have at least a prototype implementation of all those things, especially if they're then going to argue against interface changes necessary to get the rest of the things done in the 0.2 version.
>>> >>> >>>>>>
>>> >>> >>>>>> On Fri, Oct 7, 2016 at 2:18 PM, Reynold Xin <[hidden email]> wrote:
>>> >>> >>>>>>> I like the lightweight proposal to add a SIP label.
>>> >>> >>>>>>>
>>> >>> >>>>>>> During Spark 2.0 development, Tom (Graves) and I suggested using the wiki to track the list of major changes, but that never really materialized due to the overhead. Adding a SIP label on major JIRAs and then linking to them prominently on the Spark website makes a lot of sense.
>>> >>> >>>>>>>
>>> >>> >>>>>>> On Fri, Oct 7, 2016 at 10:50 AM, Matei Zaharia <[hidden email]> wrote:
>>> >>> >>>>>>>>
>>> >>> >>>>>>>> For the improvement proposals, I think one major point was to make them really visible to users who are not contributors, so we should do more than send stuff to dev@. One very lightweight idea is to have a new type of JIRA called a SIP and have a link to a filter that shows all such JIRAs from http://spark.apache.org. I also like the idea of SIP and design doc templates (in fact many projects have them).
>>> >>> >>>>>>>>
>>> >>> >>>>>>>> Matei
>>> >>> >>>>>>>>
>>> >>> >>>>>>>> On Oct 7, 2016, at 10:38 AM, Reynold Xin <[hidden email]> wrote:
>>> >>> >>>>>>>>
>>> >>> >>>>>>>> I called Cody last night and talked about some of the topics in his email. It became clear to me that Cody genuinely cares about the project.
>>> >>> >>>>>>>>
>>> >>> >>>>>>>> Some of the frustrations come from the success of the project itself becoming very "hot", and it is difficult to get clarity from people who don't dedicate all their time to Spark. In fact, it is in some ways similar to scaling an engineering team in a successful startup: old processes that worked well might not work so well when it gets to a certain size, cultures can get diluted, building culture vs. building process, etc.
>>> >>> >>>>>>>>
>>> >>> >>>>>>>> I would also really like to have a more visible process for larger changes, especially major user-facing API changes. Historically we upload design docs for major changes, but this is not always consistent, and it is difficult to ensure the quality of the docs, due to the volunteer nature of the organization.
>>> >>> >>>>>>>>
>>> >>> >>>>>>>> Some of the more concrete ideas we discussed focus on building a culture to improve clarity:
>>> >>> >>>>>>>>
>>> >>> >>>>>>>> - Process: Large changes should have design docs posted on JIRA.
>>> >>> >>>>>>>> One thing Cody and I didn't discuss, but an idea that just came to me, is that we should create a design doc template for the project and ask everybody to follow it. The design doc template should also explicitly list goals and non-goals, to make design docs more consistent.
>>> >>> >>>>>>>>
>>> >>> >>>>>>>> - Process: Email dev@ to solicit feedback. We do this with some changes, but again very inconsistently. Just posting something on JIRA isn't sufficient, because there are simply too many JIRAs and the signal gets lost in the noise. While this is generally impossible to enforce because we can't force all volunteers to conform to a process (or they might not even be aware of it), those who are more familiar with the project can help by emailing dev@ when they see something that hasn't been posted there.
>>> >>> >>>>>>>>
>>> >>> >>>>>>>> - Culture: The design doc author(s) should be open to feedback. A design doc should serve as the base for discussion and is by no means the final design. Of course, this does not mean the author has to accept every piece of feedback. They should also be comfortable accepting / rejecting ideas on technical grounds.
>>> >>> >>>>>>>>
>>> >>> >>>>>>>> - Process / Culture: For major ongoing projects, it can be useful to have some monthly Google Hangouts that are open to the world. I am actually not sure how well this will work, because of the volunteer nature and the need to adjust for time zones for people across the globe, but it seems worth trying.
>>> >>> >>>>>>>>
>>> >>> >>>>>>>> - Culture: Contributors (including committers) should be more direct in setting expectations, including whether they are working on a specific issue, whether they will be working on a specific issue, and whether an issue, PR, or JIRA should be rejected. Most people I know in this community are nice and don't enjoy telling other people no, but it is often more annoying to a contributor to not know anything than to get a no.
>>> >>> >>>>>>>>
>>> >>> >>>>>>>> On Fri, Oct 7, 2016 at 10:03 AM, Matei Zaharia <[hidden email]> wrote:
>>> >>> >>>>>>>>>
>>> >>> >>>>>>>>> Love the idea of a more visible "Spark Improvement Proposal" process that solicits user input on new APIs. For what it's worth, I don't think committers are trying to minimize their own work -- every committer cares about making the software useful for users.
>>> >>> >>>>>>>>> However, it is always hard to get user input, and so it helps to have this kind of process. I've certainly looked at the *IPs a lot in other software I use just to see the biggest things on the roadmap.
>>> >>> >>>>>>>>>
>>> >>> >>>>>>>>> When you're talking about "changing interfaces", are you talking about public or internal APIs? I do think many people hate changing public APIs, and I actually think that's for the best of the project. That's a technical debate, but basically, the worst thing when you're using a piece of software is that the developers constantly ask you to rewrite your app to update to a new version (and thus benefit from bug fixes, etc.). Cue anyone who's used Protobuf, or Guava. The "let's get everyone to change their code this release" model works well within a single large company, but doesn't work well for a community, which is why nearly all *very* widely used programming interfaces (I'm talking things like the Java standard library, the Windows API, etc.) almost *never* break backwards compatibility. All this is done within reason though, e.g.
>>> >>> >>>>>>>>> we do change things in major releases (2.x, 3.x, etc.).
>>> >>> >>>>>>
>>> >>> >>>>>> ---------------------------------------------------------------------
>>> >>> >>>>>> To unsubscribe e-mail: [hidden email]
>>> >>> >>>>>
>>> >>> >>>>> --
>>> >>> >>>>> Stavros Kontopoulos
>>> >>> >>>>> Senior Software Engineer
>>> >>> >>>>> Lightbend, Inc.
>>> >>> >>>>> p: +30 6977967274
>>> >>> >>>>> e: [hidden email]
>>>
>>
>> --
>> Ryan Blue
>> Software Engineer
>> Netflix
>
> --
> Ryan Blue
> Software Engineer
> Netflix