Some of you guys may have already seen this, but in case you haven't, you may want to check it out.
http://www.slideshare.net/sbaltagi/flink-vs-spark

On Tue, Oct 11, 2016 at 1:57 PM, Ryan Blue <rb...@netflix.com.invalid> wrote:

> I don't think we will have trouble with whatever rule that is adopted for
> accepting proposals. Considering committers' votes binding (if that is what
> we choose) is an established practice as long as it isn't for specific
> votes, like a release vote. From the Apache docs: "Who is permitted to vote
> is, to some extent, a community-specific thing." [1] And, I also don't see
> why it would be a problem to choose consensus, as long as we have an open
> discussion and vote about these rules.
>
> rb
>
> On Mon, Oct 10, 2016 at 4:15 PM, Cody Koeninger <c...@koeninger.org> wrote:
>
>> If someone wants to tell me that it's OK and "The Apache Way" for
>> Kafka and Flink to have a proposal process that ends in a lazy
>> majority, but it's not OK for Spark to have a proposal process that
>> ends in a non-lazy consensus...
>>
>> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals#KafkaImprovementProposals-Process
>>
>> In practice any PMC member can stop a proposal they don't like, so I'm
>> not sure how much it matters.
>>
>> On Mon, Oct 10, 2016 at 5:59 PM, Mark Hamstra <m...@clearstorydata.com> wrote:
>> > There is a larger issue to keep in mind, and that is that what you are
>> > proposing is a procedure that, as far as I am aware, hasn't previously been
>> > adopted in an Apache project, and thus is not an easy or exact fit with
>> > established practices that have been blessed as "The Apache Way". As such,
>> > we need to be careful, because we have run into some trouble in the past
>> > with some inside the ASF but essentially outside the Spark community who
>> > didn't like the way we were doing things.
>> >
>> > On Mon, Oct 10, 2016 at 3:53 PM, Cody Koeninger <c...@koeninger.org> wrote:
>> >>
>> >> Apache documents say lots of confusing stuff, including that committers are
>> >> in practice given a vote.
>> >>
>> >> https://www.apache.org/foundation/voting.html
>> >>
>> >> I don't care either way, if someone wants me to sub committer for PMC in
>> >> the voting section, fine, we just need a clear outcome.
>> >>
>> >> On Oct 10, 2016 17:36, "Mark Hamstra" <m...@clearstorydata.com> wrote:
>> >>>
>> >>> If I'm correctly understanding the kind of voting that you are talking
>> >>> about, then to be accurate, it is only the PMC members that have a vote, not
>> >>> all committers:
>> >>> https://www.apache.org/foundation/how-it-works.html#pmc-members
>> >>>
>> >>> On Mon, Oct 10, 2016 at 12:02 PM, Cody Koeninger <c...@koeninger.org> wrote:
>> >>>>
>> >>>> I think the main value is in being honest about what's going on. No
>> >>>> one other than committers can cast a meaningful vote, that's the
>> >>>> reality. Beyond that, if people think it's more open to allow formal
>> >>>> proposals from anyone, I'm not necessarily against it, but my main
>> >>>> question would be this:
>> >>>>
>> >>>> If anyone can submit a proposal, are committers actually going to
>> >>>> clearly reject and close proposals that don't meet the requirements?
>> >>>>
>> >>>> Right now we have a serious problem with lack of clarity regarding
>> >>>> contributions, and that cannot spill over into goal-setting.
>> >>>>
>> >>>> On Mon, Oct 10, 2016 at 1:54 PM, Ryan Blue <rb...@netflix.com> wrote:
>> >>>> > +1 to votes to approve proposals. I agree that proposals should have an
>> >>>> > official mechanism to be accepted, and a vote is an established means of
>> >>>> > doing that well.
>> >>>> > I like that it includes a period to review the proposal and
>> >>>> > I think proposals should have been discussed enough ahead of a vote to
>> >>>> > survive the possibility of a veto.
>> >>>> >
>> >>>> > I also like the names that are short and (mostly) unique, like SEP.
>> >>>> >
>> >>>> > Where I disagree is with the requirement that a committer must formally
>> >>>> > propose an enhancement. I don't see the value of restricting this: if
>> >>>> > someone has the will to write up a proposal then they should be encouraged
>> >>>> > to do so and start a discussion about it. Even if there is a political
>> >>>> > reality as Cody says, what is the value of codifying that in our process? I
>> >>>> > think restricting who can submit proposals would only undermine them by
>> >>>> > pushing contributors out. Maybe I'm missing something here?
>> >>>> >
>> >>>> > rb
>> >>>> >
>> >>>> > On Mon, Oct 10, 2016 at 7:41 AM, Cody Koeninger <c...@koeninger.org> wrote:
>> >>>> >>
>> >>>> >> Yes, users suggesting SIPs is a good thing and is explicitly called
>> >>>> >> out in the linked document under the Who? section. Formally proposing
>> >>>> >> them, not so much, because of the political realities.
>> >>>> >>
>> >>>> >> Yes, implementation strategy definitely affects goals. There are all
>> >>>> >> kinds of examples of this, I'll pick one that's my fault so as to
>> >>>> >> avoid sounding like I'm blaming:
>> >>>> >>
>> >>>> >> When I implemented the Kafka DStream, one of my (not explicitly agreed
>> >>>> >> upon by the community) goals was to make sure people could use the
>> >>>> >> DStream with however they were already using Kafka at work. The lack
>> >>>> >> of explicit agreement on that goal led to all kinds of fighting with
>> >>>> >> committers, that could have been avoided.
>> >>>> >> The lack of explicit
>> >>>> >> up-front strategy discussion led to the DStream not really working
>> >>>> >> with compacted topics. I knew about compacted topics, but don't have
>> >>>> >> a use for them, so had a blind spot there. If there was explicit
>> >>>> >> up-front discussion that my strategy was "assume that batches can be
>> >>>> >> defined on the driver solely by beginning and ending offsets", there's
>> >>>> >> a greater chance that a user would have seen that and said, "hey, what
>> >>>> >> about non-contiguous offsets in a compacted topic".
>> >>>> >>
>> >>>> >> This kind of thing is only going to happen smoothly if we have a
>> >>>> >> lightweight user-visible process with clear outcomes.
>> >>>> >>
>> >>>> >> On Mon, Oct 10, 2016 at 1:34 AM, assaf.mendelson
>> >>>> >> <assaf.mendel...@rsa.com> wrote:
>> >>>> >> > I agree with most of what Cody said.
>> >>>> >> >
>> >>>> >> > Two things:
>> >>>> >> >
>> >>>> >> > First we can always have other people suggest SIPs but mark them as
>> >>>> >> > “unreviewed” and have committers basically move them forward. The problem is
>> >>>> >> > that writing a good document takes time. This way we can leverage non
>> >>>> >> > committers to do some of this work (it is just another way to contribute).
>> >>>> >> >
>> >>>> >> > As for strategy, in many cases implementation strategy can affect the goals.
>> >>>> >> > I will give a small example: In the current structured streaming strategy,
>> >>>> >> > we group by the time to achieve a sliding window. This is definitely an
>> >>>> >> > implementation decision and not a goal. However, I can think of several
>> >>>> >> > aggregation functions which have the time inside their calculation buffer.
>> >>>> >> > For example, let’s say we want to return a set of all distinct values. One
>> >>>> >> > way to implement this would be to make the set into a map and have the value
>> >>>> >> > contain the last time seen. Multiplying it across the groupby would cost a
>> >>>> >> > lot in performance. So adding such a strategy would have a great effect on
>> >>>> >> > the type of aggregations and their performance which does affect the goal.
>> >>>> >> > Without adding the strategy, it is easy for whoever goes to the design
>> >>>> >> > document to not think about these cases. Furthermore, it might be decided
>> >>>> >> > that these cases are rare enough so that the strategy is still good enough
>> >>>> >> > but how would we know it without user feedback?
>> >>>> >> >
>> >>>> >> > I believe this example is exactly what Cody was talking about. Since many
>> >>>> >> > times implementation strategies have a large effect on the goal, we should
>> >>>> >> > have it discussed when discussing the goals. In addition, while it is often
>> >>>> >> > easy to throw out completely infeasible goals, it is often much harder to
>> >>>> >> > figure out that the goals are unfeasible without fine tuning.
>> >>>> >> >
>> >>>> >> > Assaf.
>> >>>> >> >
>> >>>> >> > From: Cody Koeninger-2 [via Apache Spark Developers List]
>> >>>> >> > [mailto:ml-node+[hidden email]]
>> >>>> >> > Sent: Monday, October 10, 2016 2:25 AM
>> >>>> >> > To: Mendelson, Assaf
>> >>>> >> > Subject: Re: Spark Improvement Proposals
>> >>>> >> >
>> >>>> >> > Only committers should formally submit SIPs because in an Apache
>> >>>> >> > project only committers have explicit political power. If a user can't
>> >>>> >> > find a committer willing to sponsor an SIP idea, they have no way to
>> >>>> >> > get the idea passed in any case. If I can't find a committer to
>> >>>> >> > sponsor this meta-SIP idea, I'm out of luck.
>> >>>> >> >
>> >>>> >> > I do not believe unrealistic goals can be found solely by inspection.
>> >>>> >> > We've managed to ignore unrealistic goals even after implementation!
>> >>>> >> > Focusing on APIs can allow people to think they've solved something,
>> >>>> >> > when there's really no way of implementing that API while meeting the
>> >>>> >> > goals. Rapid iteration is clearly the best way to address this, but
>> >>>> >> > we've already talked about why that hasn't really worked. If adding a
>> >>>> >> > non-binding API section to the template is important to you, I'm not
>> >>>> >> > against it, but I don't think it's sufficient.
>> >>>> >> >
>> >>>> >> > On your PRD vs design doc spectrum, I'm saying this is closer to a
>> >>>> >> > PRD. Clear agreement on goals is the most important thing and that's
>> >>>> >> > why it's the thing I want binding agreement on. But I cannot agree to
>> >>>> >> > goals unless I have enough minimal technical info to judge whether the
>> >>>> >> > goals are likely to actually be accomplished.
>> >>>> >> >
>> >>>> >> > On Sun, Oct 9, 2016 at 5:35 PM, Matei Zaharia <[hidden email]> wrote:
>> >>>> >> >
>> >>>> >> >> Well, I think there are a few things here that don't make sense. First, why
>> >>>> >> >> should only committers submit SIPs? Development in the project should be
>> >>>> >> >> open to all contributors, whether they're committers or not. Second, I think
>> >>>> >> >> unrealistic goals can be found just by inspecting the goals, and I'm not
>> >>>> >> >> super worried that we'll accept a lot of SIPs that are then infeasible -- we
>> >>>> >> >> can then submit new ones. But this depends on whether you want this process
>> >>>> >> >> to be a "design doc lite", where people also agree on implementation
>> >>>> >> >> strategy, or just a way to agree on goals. This is what I asked earlier
>> >>>> >> >> about PRDs vs design docs (and I'm open to either one but I'd just like
>> >>>> >> >> clarity). Finally, both as a user and designer of software, I always want to
>> >>>> >> >> give feedback on APIs, so I'd really like a culture of having those early.
>> >>>> >> >> People don't argue about prettiness when they discuss APIs, they argue about
>> >>>> >> >> the core concepts to expose in order to meet various goals, and then they're
>> >>>> >> >> stuck maintaining those for a long time.
>> >>>> >> >>
>> >>>> >> >> Matei
>> >>>> >> >>
>> >>>> >> >> On Oct 9, 2016, at 3:10 PM, Cody Koeninger <[hidden email]> wrote:
>> >>>> >> >>
>> >>>> >> >> Users instead of people, sure.
>> >>>> >> >> Committers and contributors are (or at least
>> >>>> >> >> should be) a subset of users.
>> >>>> >> >>
>> >>>> >> >> Non goals, sure. I don't care what the name is, but we need to clearly say
>> >>>> >> >> e.g. 'no we are not maintaining compatibility with XYZ right now'.
>> >>>> >> >>
>> >>>> >> >> API, what I care most about is whether it allows me to accomplish the goals.
>> >>>> >> >> Arguing about how ugly or pretty it is can be saved for
>> >>>> >> >> design/implementation imho.
>> >>>> >> >>
>> >>>> >> >> Strategy, this is necessary because otherwise goals can be out of line with
>> >>>> >> >> reality. Don't propose goals you don't have at least some idea of how to
>> >>>> >> >> implement.
>> >>>> >> >>
>> >>>> >> >> Rejected strategies, given that committers are the only ones I'm saying
>> >>>> >> >> should formally submit SPARKLIs or SIPs, if they put junk in a required
>> >>>> >> >> section then slap them down for it and tell them to fix it.
>> >>>> >> >>
>> >>>> >> >> On Oct 9, 2016 4:36 PM, "Matei Zaharia" <[hidden email]> wrote:
>> >>>> >> >>>
>> >>>> >> >>> Yup, this is the stuff that I found unclear. Thanks for clarifying here,
>> >>>> >> >>> but we should also clarify it in the writeup. In particular:
>> >>>> >> >>>
>> >>>> >> >>> - Goals needs to be about user-facing behavior ("people" is broad)
>> >>>> >> >>>
>> >>>> >> >>> - I'd rename Rejected Goals to Non-Goals. Otherwise someone will dig up
>> >>>> >> >>> one of these and say "Spark's developers have officially rejected X, which
>> >>>> >> >>> our awesome system has".
>> >>>> >> >>>
>> >>>> >> >>> - For user-facing stuff, I think you need a section on API.
>> >>>> >> >>> Virtually all
>> >>>> >> >>> other *IPs I've seen have that.
>> >>>> >> >>>
>> >>>> >> >>> - I'm still not sure why the strategy section is needed if the purpose is
>> >>>> >> >>> to define user-facing behavior -- unless this is the strategy for setting
>> >>>> >> >>> the goals or for defining the API. That sounds squarely like a design doc
>> >>>> >> >>> issue. In some sense, who cares whether the proposal is technically feasible
>> >>>> >> >>> right now? If it's infeasible, that will be discovered later during design
>> >>>> >> >>> and implementation. Same thing with rejected strategies -- listing some of
>> >>>> >> >>> those is definitely useful sometimes, but if you make this a *required*
>> >>>> >> >>> section, people are just going to fill it in with bogus stuff (I've seen
>> >>>> >> >>> this happen before).
>> >>>> >> >>>
>> >>>> >> >>> Matei
>> >>>> >> >>>
>> >>>> >> >>> > On Oct 9, 2016, at 2:14 PM, Cody Koeninger <[hidden email]> wrote:
>> >>>> >> >>> >
>> >>>> >> >>> > So to focus the discussion on the specific strategy I'm suggesting,
>> >>>> >> >>> > documented at
>> >>>> >> >>> >
>> >>>> >> >>> > https://github.com/koeninger/spark-1/blob/SIP-0/docs/spark-improvement-proposals.md
>> >>>> >> >>> >
>> >>>> >> >>> > "Goals: What must this allow people to do, that they can't currently?"
>> >>>> >> >>> >
>> >>>> >> >>> > Is it unclear that this is focusing specifically on people-visible
>> >>>> >> >>> > behavior?
>> >>>> >> >>> >
>> >>>> >> >>> > Rejected goals - are important because otherwise people keep trying
>> >>>> >> >>> > to argue about scope. Of course you can change things later with a
>> >>>> >> >>> > different SIP and different vote, the point is to focus.
>> >>>> >> >>> >
>> >>>> >> >>> > Use cases - are something that people are going to bring up in
>> >>>> >> >>> > discussion. If they aren't clearly documented as a goal ("This must
>> >>>> >> >>> > allow me to connect using SSL"), they should be added.
>> >>>> >> >>> >
>> >>>> >> >>> > Internal architecture - if the people who need specific behavior are
>> >>>> >> >>> > implementers of other parts of the system, that's fine.
>> >>>> >> >>> >
>> >>>> >> >>> > Rejected strategies - If you have none of these, you have no evidence
>> >>>> >> >>> > that the proponent didn't just go with the first thing they had in
>> >>>> >> >>> > mind (or have already implemented), which is a big problem currently.
>> >>>> >> >>> > Approval isn't binding as to specifics of implementation, so these
>> >>>> >> >>> > aren't handcuffs. The goals are the contract, the strategy is
>> >>>> >> >>> > evidence that contract can actually be met.
>> >>>> >> >>> >
>> >>>> >> >>> > Design docs - I'm not touching design docs. The markdown file I
>> >>>> >> >>> > linked specifically says of the strategy section "This is not a full
>> >>>> >> >>> > design document." Is this unclear? Design docs can be worked on
>> >>>> >> >>> > obviously, but that's not what I'm concerned with here.
>> >>>> >> >>> >
>> >>>> >> >>> > On Sun, Oct 9, 2016 at 2:34 PM, Matei Zaharia <[hidden email]> wrote:
>> >>>> >> >>> >> Hi Cody,
>> >>>> >> >>> >>
>> >>>> >> >>> >> I think this would be a lot more concrete if we had a more detailed template
>> >>>> >> >>> >> for SIPs. Right now, it's not super clear what's in scope -- e.g. are they
>> >>>> >> >>> >> a way to solicit feedback on the user-facing behavior or on the internals?
>> >>>> >> >>> >> "Goals" can cover both things. I've been thinking of SIPs more as Product
>> >>>> >> >>> >> Requirements Docs (PRDs), which focus on *what* a code change should do as
>> >>>> >> >>> >> opposed to how.
>> >>>> >> >>> >>
>> >>>> >> >>> >> In particular, here are some things that you may or may not consider in
>> >>>> >> >>> >> scope for SIPs:
>> >>>> >> >>> >>
>> >>>> >> >>> >> - Goals and non-goals: This is definitely in scope, and IMO should focus on
>> >>>> >> >>> >> user-visible behavior (e.g. "system supports SQL window functions" or
>> >>>> >> >>> >> "system continues working if one node fails"). BTW I wouldn't say "rejected
>> >>>> >> >>> >> goals" because some of them might become goals later, so we're not
>> >>>> >> >>> >> definitively rejecting them.
>> >>>> >> >>> >>
>> >>>> >> >>> >> - Public API: Probably should be included in most SIPs unless it's too large
>> >>>> >> >>> >> to fully specify then (e.g. "let's add an ML library").
>> >>>> >> >>> >>
>> >>>> >> >>> >> - Use cases: I usually find this very useful in PRDs to better communicate
>> >>>> >> >>> >> the goals.
>> >>>> >> >>> >>
>> >>>> >> >>> >> - Internal architecture: This is usually *not* a thing users can easily
>> >>>> >> >>> >> comment on and it sounds more like a design doc item. Of course it's
>> >>>> >> >>> >> important to show that the SIP is feasible to implement. One exception,
>> >>>> >> >>> >> however, is that I think we'll have some SIPs primarily on internals (e.g.
>> >>>> >> >>> >> if somebody wants to refactor Spark's query optimizer or something).
>> >>>> >> >>> >>
>> >>>> >> >>> >> - Rejected strategies: I personally wouldn't put this, because what's the
>> >>>> >> >>> >> point of voting to reject a strategy before you've really begun designing
>> >>>> >> >>> >> and implementing something? What if you discover that the strategy is
>> >>>> >> >>> >> actually better when you start doing stuff?
>> >>>> >> >>> >>
>> >>>> >> >>> >> At a super high level, it depends on whether you want the SIPs to be PRDs
>> >>>> >> >>> >> for getting some quick feedback on the goals of a feature before it is
>> >>>> >> >>> >> designed, or something more like full-fledged design docs (just a more
>> >>>> >> >>> >> visible design doc for bigger changes). I looked at Kafka's KIPs, and they
>> >>>> >> >>> >> actually seem to be more like design docs.
>> >>>> >> >>> >> This can work too but it does
>> >>>> >> >>> >> require more work from the proposer and it can lead to the same problems you
>> >>>> >> >>> >> mentioned with people already having a design and implementation in mind.
>> >>>> >> >>> >>
>> >>>> >> >>> >> Basically, the question is, are you trying to iterate faster on design by
>> >>>> >> >>> >> adding a step for user feedback earlier? Or are you just trying to make
>> >>>> >> >>> >> design docs for key features more visible (and their approval more formal)?
>> >>>> >> >>> >>
>> >>>> >> >>> >> BTW note that in either case, I'd like to have a template for design docs
>> >>>> >> >>> >> too, which should also include goals. I think that would've avoided some of
>> >>>> >> >>> >> the issues you brought up.
>> >>>> >> >>> >>
>> >>>> >> >>> >> Matei
>> >>>> >> >>> >>
>> >>>> >> >>> >> On Oct 9, 2016, at 10:40 AM, Cody Koeninger <[hidden email]> wrote:
>> >>>> >> >>> >>
>> >>>> >> >>> >> Here's my specific proposal (meta-proposal?)
>> >>>> >> >>> >>
>> >>>> >> >>> >> Spark Improvement Proposals (SIP)
>> >>>> >> >>> >>
>> >>>> >> >>> >> Background:
>> >>>> >> >>> >>
>> >>>> >> >>> >> The current problem is that design and implementation of large features are
>> >>>> >> >>> >> often done in private, before soliciting user feedback.
>> >>>> >> >>> >>
>> >>>> >> >>> >> When feedback is solicited, it is often as to detailed design specifics, not
>> >>>> >> >>> >> focused on goals.
>> >>>> >> >>> >>
>> >>>> >> >>> >> When implementation does take place after design, there is often
>> >>>> >> >>> >> disagreement as to what goals are or are not in scope.
>> >>>> >> >>> >>
>> >>>> >> >>> >> This results in commits that don't fully meet user needs.
>> >>>> >> >>> >>
>> >>>> >> >>> >> Goals:
>> >>>> >> >>> >>
>> >>>> >> >>> >> - Ensure user, contributor, and committer goals are clearly identified and
>> >>>> >> >>> >> agreed upon, before implementation takes place.
>> >>>> >> >>> >>
>> >>>> >> >>> >> - Ensure that a technically feasible strategy is chosen that is likely to
>> >>>> >> >>> >> meet the goals.
>> >>>> >> >>> >>
>> >>>> >> >>> >> Rejected Goals:
>> >>>> >> >>> >>
>> >>>> >> >>> >> - SIPs are not for detailed design. Design by committee doesn't work.
>> >>>> >> >>> >>
>> >>>> >> >>> >> - SIPs are not for every change. We don't need that much process.
>> >>>> >> >>> >>
>> >>>> >> >>> >> Strategy:
>> >>>> >> >>> >>
>> >>>> >> >>> >> My suggestion is outlined as a Spark Improvement Proposal process documented
>> >>>> >> >>> >> at
>> >>>> >> >>> >>
>> >>>> >> >>> >> https://github.com/koeninger/spark-1/blob/SIP-0/docs/spark-improvement-proposals.md
>> >>>> >> >>> >>
>> >>>> >> >>> >> Specifics of Jira manipulation are an implementation detail we can figure
>> >>>> >> >>> >> out.
>> >>>> >> >>> >>
>> >>>> >> >>> >> I'm suggesting voting; the need here is for a _clear_ outcome.
>> >>>> >> >>> >>
>> >>>> >> >>> >> Rejected Strategies:
>> >>>> >> >>> >>
>> >>>> >> >>> >> Having someone who understands the problem implement it first works, but
>> >>>> >> >>> >> only if significant iteration after user feedback is allowed.
>> >>>> >> >>> >>
>> >>>> >> >>> >> Historically this has been problematic due to pressure to limit public API
>> >>>> >> >>> >> changes.
>> >>>> >> >>> >>
>> >>>> >> >>> >> On Fri, Oct 7, 2016 at 5:16 PM, Reynold Xin <[hidden email]> wrote:
>> >>>> >> >>> >>>
>> >>>> >> >>> >>> Alright looks like there is quite a bit of support. We should wait to
>> >>>> >> >>> >>> hear from more people too.
>> >>>> >> >>> >>>
>> >>>> >> >>> >>> To push this forward, Cody and I will be working together in the next
>> >>>> >> >>> >>> couple of weeks to come up with a concrete, detailed proposal on what this
>> >>>> >> >>> >>> entails, and then we can discuss the specific proposal as well.
>> >>>> >> >>> >>>
>> >>>> >> >>> >>> On Fri, Oct 7, 2016 at 2:29 PM, Cody Koeninger <[hidden email]> wrote:
>> >>>> >> >>> >>>>
>> >>>> >> >>> >>>> Yeah, in case it wasn't clear, I was talking about SIPs for major
>> >>>> >> >>> >>>> user-facing or cross-cutting changes, not minor feature adds.
>> >>>> >> >>> >>>>
>> >>>> >> >>> >>>> On Fri, Oct 7, 2016 at 3:58 PM, Stavros Kontopoulos
>> >>>> >> >>> >>>> <[hidden email]> wrote:
>> >>>> >> >>> >>>>>
>> >>>> >> >>> >>>>> +1 to the SIP label as long as it does not slow down things and it
>> >>>> >> >>> >>>>> targets optimizing efforts, coordination etc. For example really small
>> >>>> >> >>> >>>>> features should not need to go through this process (assuming they don't
>> >>>> >> >>> >>>>> touch public interfaces) or re-factorings and hope it will be kept this
>> >>>> >> >>> >>>>> way. So as a guideline doc should be provided, like in the KIP case.
>> >>>> >> >>> >>>>>
>> >>>> >> >>> >>>>> IMHO so far aside from tagging things and linking them elsewhere simply
>> >>>> >> >>> >>>>> having design docs and prototypes implementations in PRs is not something
>> >>>> >> >>> >>>>> that has not worked so far. What is really a pain in many projects out there
>> >>>> >> >>> >>>>> is discontinuity in progress of PRs, missing features, slow reviews which is
>> >>>> >> >>> >>>>> understandable to some extent... it is not only about Spark but things can
>> >>>> >> >>> >>>>> be improved for sure for this project in particular as already stated.
>> >>>> >> >>> >>>>>
>> >>>> >> >>> >>>>> On Fri, Oct 7, 2016 at 11:14 PM, Cody Koeninger <[hidden email]> wrote:
>> >>>> >> >>> >>>>>>
>> >>>> >> >>> >>>>>> +1 to adding an SIP label and linking it from the website.
>> >>>> >> >>> >>>>>> I think it needs
>> >>>> >> >>> >>>>>>
>> >>>> >> >>> >>>>>> - template that focuses it towards soliciting user goals / non goals
>> >>>> >> >>> >>>>>> - clear resolution as to which strategy was chosen to pursue. I'd
>> >>>> >> >>> >>>>>> recommend a vote.
>> >>>> >> >>> >>>>>>
>> >>>> >> >>> >>>>>> Matei asked me to clarify what I meant by changing interfaces, I think
>> >>>> >> >>> >>>>>> it's directly relevant to the SIP idea so I'll clarify here, and split
>> >>>> >> >>> >>>>>> a thread for the other discussion per Nicholas' request.
>> >>>> >> >>> >>>>>>
>> >>>> >> >>> >>>>>> I meant changing public user interfaces. I think the first design is
>> >>>> >> >>> >>>>>> unlikely to be right, because it's done at a time when you have the
>> >>>> >> >>> >>>>>> least information. As a user, I find it considerably more frustrating
>> >>>> >> >>> >>>>>> to be unable to use a tool to get my job done, than I do having to
>> >>>> >> >>> >>>>>> make minor changes to my code in order to take advantage of features.
>> >>>> >> >>> >>>>>> I've seen committers be seriously reluctant to allow changes to
>> >>>> >> >>> >>>>>> @experimental code that are needed in order for it to really work
>> >>>> >> >>> >>>>>> right.
>> >>>> >> >>> >>>>>> You need to be able to iterate, and if people on both sides of
>> >>>> >> >>> >>>>>> the fence aren't going to respect that some newer APIs are subject to
>> >>>> >> >>> >>>>>> change, then why even mark them as such?
>> >>>> >> >>> >>>>>>
>> >>>> >> >>> >>>>>> Ideally a finished SIP should give me a checklist of things that an
>> >>>> >> >>> >>>>>> implementation must do, and things that it doesn't need to do.
>> >>>> >> >>> >>>>>> Contributors/committers should be seriously discouraged from putting
>> >>>> >> >>> >>>>>> out a version 0.1 that doesn't have at least a prototype
>> >>>> >> >>> >>>>>> implementation of all those things, especially if they're then going
>> >>>> >> >>> >>>>>> to argue against interface changes necessary to get the rest of
>> >>>> >> >>> >>>>>> the things done in the 0.2 version.
>> >>>> >> >>> >>>>>>
>> >>>> >> >>> >>>>>> On Fri, Oct 7, 2016 at 2:18 PM, Reynold Xin <[hidden email]> wrote:
>> >>>> >> >>> >>>>>>> I like the lightweight proposal to add a SIP label.
>> >>>> >> >>> >>>>>>>
>> >>>> >> >>> >>>>>>> During Spark 2.0 development, Tom (Graves) and I suggested using wiki to
>> >>>> >> >>> >>>>>>> track the list of major changes, but that never really materialized due to
>> >>>> >> >>> >>>>>>> the overhead.
>> >>>> >> >>> >>>>>>> Adding a SIP label on major JIRAs and then linking to them
>> >>>> >> >>> >>>>>>> prominently on the Spark website makes a lot of sense.
>> >>>> >> >>> >>>>>>>
>> >>>> >> >>> >>>>>>>
>> >>>> >> >>> >>>>>>> On Fri, Oct 7, 2016 at 10:50 AM, Matei Zaharia <[hidden email]>
>> >>>> >> >>> >>>>>>> wrote:
>> >>>> >> >>> >>>>>>>>
>> >>>> >> >>> >>>>>>>> For the improvement proposals, I think one major point was to
>> >>>> >> >>> >>>>>>>> make them really visible to users who are not contributors, so we
>> >>>> >> >>> >>>>>>>> should do more than sending stuff to dev@. One very lightweight
>> >>>> >> >>> >>>>>>>> idea is to have a new type of JIRA called a SIP and have a link
>> >>>> >> >>> >>>>>>>> to a filter that shows all such JIRAs from
>> >>>> >> >>> >>>>>>>> http://spark.apache.org. I also like the idea of SIP and design
>> >>>> >> >>> >>>>>>>> doc templates (in fact many projects have them).
>> >>>> >> >>> >>>>>>>>
>> >>>> >> >>> >>>>>>>> Matei
>> >>>> >> >>> >>>>>>>>
>> >>>> >> >>> >>>>>>>> On Oct 7, 2016, at 10:38 AM, Reynold Xin <[hidden email]> wrote:
>> >>>> >> >>> >>>>>>>>
>> >>>> >> >>> >>>>>>>> I called Cody last night and talked about some of the topics in
>> >>>> >> >>> >>>>>>>> his email. It became clear to me Cody genuinely cares about the
>> >>>> >> >>> >>>>>>>> project.
>> >>>> >> >>> >>>>>>>>
>> >>>> >> >>> >>>>>>>> Some of the frustrations come from the success of the project
>> >>>> >> >>> >>>>>>>> itself becoming very "hot", and it is difficult to get clarity
>> >>>> >> >>> >>>>>>>> from people who don't dedicate all their time to Spark. In fact,
>> >>>> >> >>> >>>>>>>> it is in some ways similar to scaling an engineering team in a
>> >>>> >> >>> >>>>>>>> successful startup: old processes that worked well might not work
>> >>>> >> >>> >>>>>>>> so well when it gets to a certain size, cultures can get diluted,
>> >>>> >> >>> >>>>>>>> building culture vs building process, etc.
>> >>>> >> >>> >>>>>>>>
>> >>>> >> >>> >>>>>>>> I would also really like to have a more visible process for
>> >>>> >> >>> >>>>>>>> larger changes, especially major user-facing API changes.
>> >>>> >> >>> >>>>>>>> Historically we have uploaded design docs for major changes, but
>> >>>> >> >>> >>>>>>>> not always consistently, and it is difficult to ensure the
>> >>>> >> >>> >>>>>>>> quality of the docs, due to the volunteer nature of the
>> >>>> >> >>> >>>>>>>> organization.
>> >>>> >> >>> >>>>>>>>
>> >>>> >> >>> >>>>>>>> Some of the more concrete ideas we discussed focus on building a
>> >>>> >> >>> >>>>>>>> culture to improve clarity:
>> >>>> >> >>> >>>>>>>>
>> >>>> >> >>> >>>>>>>> - Process: Large changes should have design docs posted on JIRA.
>> >>>> >> >>> >>>>>>>> One idea that Cody and I didn't discuss, but that just came to
>> >>>> >> >>> >>>>>>>> me, is that we should create a design doc template for the
>> >>>> >> >>> >>>>>>>> project and ask everybody to follow it. The design doc template
>> >>>> >> >>> >>>>>>>> should also explicitly list goals and non-goals, to make design
>> >>>> >> >>> >>>>>>>> docs more consistent.
>> >>>> >> >>> >>>>>>>>
>> >>>> >> >>> >>>>>>>> - Process: Email dev@ to solicit feedback. We have done this with
>> >>>> >> >>> >>>>>>>> some changes, but again very inconsistently. Just posting
>> >>>> >> >>> >>>>>>>> something on JIRA isn't sufficient, because there are simply too
>> >>>> >> >>> >>>>>>>> many JIRAs and the signal gets lost in the noise. While this is
>> >>>> >> >>> >>>>>>>> generally impossible to enforce because we can't force all
>> >>>> >> >>> >>>>>>>> volunteers to conform to a process (or they might not even be
>> >>>> >> >>> >>>>>>>> aware of this), those who are more familiar with the project can
>> >>>> >> >>> >>>>>>>> help by emailing dev@ when they see something that hasn't been
>> >>>> >> >>> >>>>>>>> posted there.
>> >>>> >> >>> >>>>>>>>
>> >>>> >> >>> >>>>>>>> - Culture: The design doc author(s) should be open to feedback.
>> >>>> >> >>> >>>>>>>> A design doc should serve as the base for discussion and is by no
>> >>>> >> >>> >>>>>>>> means the final design. Of course, this does not mean the author
>> >>>> >> >>> >>>>>>>> has to accept every piece of feedback. They should also be
>> >>>> >> >>> >>>>>>>> comfortable accepting / rejecting ideas on technical grounds.
>> >>>> >> >>> >>>>>>>>
>> >>>> >> >>> >>>>>>>> - Process / Culture: For major ongoing projects, it can be useful
>> >>>> >> >>> >>>>>>>> to have some monthly Google hangouts that are open to the world.
>> >>>> >> >>> >>>>>>>> I am actually not sure how well this will work, because of the
>> >>>> >> >>> >>>>>>>> volunteer nature of the project and the need to adjust for time
>> >>>> >> >>> >>>>>>>> zones for people across the globe, but it seems worth trying.
>> >>>> >> >>> >>>>>>>>
>> >>>> >> >>> >>>>>>>> - Culture: Contributors (including committers) should be more
>> >>>> >> >>> >>>>>>>> direct in setting expectations, including whether they are
>> >>>> >> >>> >>>>>>>> working on a specific issue, whether they will be working on a
>> >>>> >> >>> >>>>>>>> specific issue, and whether an issue or PR or JIRA should be
>> >>>> >> >>> >>>>>>>> rejected.
>> >>>> >> >>> >>>>>>>> Most people I know in this community are nice and don't enjoy
>> >>>> >> >>> >>>>>>>> telling other people no, but it is often more annoying to a
>> >>>> >> >>> >>>>>>>> contributor to not know anything than to get a no.
>> >>>> >> >>> >>>>>>>>
>> >>>> >> >>> >>>>>>>>
>> >>>> >> >>> >>>>>>>> On Fri, Oct 7, 2016 at 10:03 AM, Matei Zaharia <[hidden email]>
>> >>>> >> >>> >>>>>>>> wrote:
>> >>>> >> >>> >>>>>>>>>
>> >>>> >> >>> >>>>>>>>> Love the idea of a more visible "Spark Improvement Proposal"
>> >>>> >> >>> >>>>>>>>> process that solicits user input on new APIs. For what it's
>> >>>> >> >>> >>>>>>>>> worth, I don't think committers are trying to minimize their own
>> >>>> >> >>> >>>>>>>>> work -- every committer cares about making the software useful
>> >>>> >> >>> >>>>>>>>> for users. However, it is always hard to get user input and so
>> >>>> >> >>> >>>>>>>>> it helps to have this kind of process. I've certainly looked at
>> >>>> >> >>> >>>>>>>>> the *IPs a lot in other software I use just to see the biggest
>> >>>> >> >>> >>>>>>>>> things on the roadmap.
>> >>>> >> >>> >>>>>>>>>
>> >>>> >> >>> >>>>>>>>> When you're talking about "changing interfaces", are you talking
>> >>>> >> >>> >>>>>>>>> about public or internal APIs?
>> >>>> >> >>> >>>>>>>>> I do think many people hate changing public APIs and I actually
>> >>>> >> >>> >>>>>>>>> think that's for the best of the project. That's a technical
>> >>>> >> >>> >>>>>>>>> debate, but basically, the worst thing when you're using a piece
>> >>>> >> >>> >>>>>>>>> of software is that the developers constantly ask you to rewrite
>> >>>> >> >>> >>>>>>>>> your app to update to a new version (and thus benefit from bug
>> >>>> >> >>> >>>>>>>>> fixes, etc). Cue anyone who's used Protobuf, or Guava. The
>> >>>> >> >>> >>>>>>>>> "let's get everyone to change their code this release" model
>> >>>> >> >>> >>>>>>>>> works well within a single large company, but doesn't work well
>> >>>> >> >>> >>>>>>>>> for a community, which is why nearly all *very* widely used
>> >>>> >> >>> >>>>>>>>> programming interfaces (I'm talking things like Java standard
>> >>>> >> >>> >>>>>>>>> library, Windows API, etc) almost *never* break backwards
>> >>>> >> >>> >>>>>>>>> compatibility. All this is done within reason though, e.g. we do
>> >>>> >> >>> >>>>>>>>> change things in major releases (2.x, 3.x, etc).
>> >>>> >> >>> >>>>>> ---------------------------------------------------------------------
>> >>>> >> >>> >>>>>> To unsubscribe e-mail: [hidden email]
>> >>>> >> >>> >>>>>
>> >>>> >> >>> >>>>> --
>> >>>> >> >>> >>>>> Stavros Kontopoulos
>> >>>> >> >>> >>>>> Senior Software Engineer
>> >>>> >> >>> >>>>> Lightbend, Inc.
>> >>>> >> >>> >>>>> p: +30 6977967274
>> >>>> >> >>> >>>>> e: [hidden email]
>> >>>> >> >
>> >>>> >> > http://apache-spark-developers-list.1001551.n3.nabble.com/Spark-Improvement-Proposals-tp19268p19359.html
>> >>>> >> >
>> >>>> >> > View this message in context: RE: Spark Improvement Proposals
>> >>>> >> > Sent from the Apache Spark Developers List mailing list archive at
>> >>>> >> > Nabble.com.
>> >>>> >> >> >>>> >> ------------------------------------------------------------ >> --------- >> >>>> >> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org >> >>>> >> >> >>>> > >> >>>> > >> >>>> > >> >>>> > -- >> >>>> > Ryan Blue >> >>>> > Software Engineer >> >>>> > Netflix >> >>>> >> >>>> ------------------------------------------------------------ >> --------- >> >>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org >> >>>> >> >>> >> > >> >> >> --------------------------------------------------------------------- >> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org >> >> > > > -- > Ryan Blue > Software Engineer > Netflix >