Yeah, I've looked at KIPs and Scala SIPs. I'm reluctant to use the Kafka structured streaming as an example because of the pre-existing conflict around it. If Michael or another committer wanted to put it forth as an example, I'd participate in good faith though.
On Sun, Oct 9, 2016 at 5:07 PM, Ofir Manor <ofir.ma...@equalum.io> wrote:
> This is a great discussion!
> Maybe you could have a look at Kafka's process - it also uses Rejected Alternatives and I personally find it very clear actually (the link also leads to all KIPs):
>
> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals
> Cody - maybe you could take one of the open issues and write a sample proposal? A concrete example might make it clearer for those who see this for the first time. Maybe the Kafka offset discussion or some other Kafka/Structured Streaming open issue? Will that be helpful?
>
> Ofir Manor
>
> Co-Founder & CTO | Equalum
>
> Mobile: +972-54-7801286 | Email: ofir.ma...@equalum.io
>
>
> On Mon, Oct 10, 2016 at 12:36 AM, Matei Zaharia <matei.zaha...@gmail.com> wrote:
>>
>> Yup, this is the stuff that I found unclear. Thanks for clarifying here, but we should also clarify it in the writeup. In particular:
>>
>> - Goals needs to be about user-facing behavior ("people" is broad)
>>
>> - I'd rename Rejected Goals to Non-Goals. Otherwise someone will dig up one of these and say "Spark's developers have officially rejected X, which our awesome system has".
>>
>> - For user-facing stuff, I think you need a section on API. Virtually all other *IPs I've seen have that.
>>
>> - I'm still not sure why the strategy section is needed if the purpose is to define user-facing behavior -- unless this is the strategy for setting the goals or for defining the API. That sounds squarely like a design doc issue. In some sense, who cares whether the proposal is technically feasible right now? If it's infeasible, that will be discovered later during design and implementation. Same thing with rejected strategies -- listing some of those is definitely useful sometimes, but if you make this a *required* section, people are just going to fill it in with bogus stuff (I've seen this happen before).
>>
>> Matei
>>
>> > On Oct 9, 2016, at 2:14 PM, Cody Koeninger <c...@koeninger.org> wrote:
>> >
>> > So to focus the discussion on the specific strategy I'm suggesting, documented at
>> >
>> > https://github.com/koeninger/spark-1/blob/SIP-0/docs/spark-improvement-proposals.md
>> >
>> > "Goals: What must this allow people to do, that they can't currently?"
>> >
>> > Is it unclear that this is focusing specifically on people-visible behavior?
>> >
>> > Rejected goals - are important because otherwise people keep trying to argue about scope. Of course you can change things later with a different SIP and different vote, the point is to focus.
>> >
>> > Use cases - are something that people are going to bring up in discussion. If they aren't clearly documented as a goal ("This must allow me to connect using SSL"), they should be added.
>> >
>> > Internal architecture - if the people who need specific behavior are implementers of other parts of the system, that's fine.
>> >
>> > Rejected strategies - If you have none of these, you have no evidence that the proponent didn't just go with the first thing they had in mind (or have already implemented), which is a big problem currently. Approval isn't binding as to specifics of implementation, so these aren't handcuffs. The goals are the contract, the strategy is evidence that contract can actually be met.
>> >
>> > Design docs - I'm not touching design docs. The markdown file I linked specifically says of the strategy section "This is not a full design document." Is this unclear? Design docs can be worked on obviously, but that's not what I'm concerned with here.
>> >
>> >
>> > On Sun, Oct 9, 2016 at 2:34 PM, Matei Zaharia <matei.zaha...@gmail.com> wrote:
>> >> Hi Cody,
>> >>
>> >> I think this would be a lot more concrete if we had a more detailed template for SIPs.
>> >> Right now, it's not super clear what's in scope -- e.g. are they a way to solicit feedback on the user-facing behavior or on the internals? "Goals" can cover both things. I've been thinking of SIPs more as Product Requirements Docs (PRDs), which focus on *what* a code change should do as opposed to how.
>> >>
>> >> In particular, here are some things that you may or may not consider in scope for SIPs:
>> >>
>> >> - Goals and non-goals: This is definitely in scope, and IMO should focus on user-visible behavior (e.g. "system supports SQL window functions" or "system continues working if one node fails"). BTW I wouldn't say "rejected goals" because some of them might become goals later, so we're not definitively rejecting them.
>> >>
>> >> - Public API: Probably should be included in most SIPs unless it's too large to fully specify then (e.g. "let's add an ML library").
>> >>
>> >> - Use cases: I usually find this very useful in PRDs to better communicate the goals.
>> >>
>> >> - Internal architecture: This is usually *not* a thing users can easily comment on and it sounds more like a design doc item. Of course it's important to show that the SIP is feasible to implement. One exception, however, is that I think we'll have some SIPs primarily on internals (e.g. if somebody wants to refactor Spark's query optimizer or something).
>> >>
>> >> - Rejected strategies: I personally wouldn't put this, because what's the point of voting to reject a strategy before you've really begun designing and implementing something? What if you discover that the strategy is actually better when you start doing stuff?
>> >>
>> >> At a super high level, it depends on whether you want the SIPs to be PRDs for getting some quick feedback on the goals of a feature before it is designed, or something more like full-fledged design docs (just a more visible design doc for bigger changes). I looked at Kafka's KIPs, and they actually seem to be more like design docs. This can work too but it does require more work from the proposer and it can lead to the same problems you mentioned with people already having a design and implementation in mind.
>> >>
>> >> Basically, the question is, are you trying to iterate faster on design by adding a step for user feedback earlier? Or are you just trying to make design docs for key features more visible (and their approval more formal)?
>> >>
>> >> BTW note that in either case, I'd like to have a template for design docs too, which should also include goals. I think that would've avoided some of the issues you brought up.
>> >>
>> >> Matei
>> >>
>> >> On Oct 9, 2016, at 10:40 AM, Cody Koeninger <c...@koeninger.org> wrote:
>> >>
>> >> Here's my specific proposal (meta-proposal?)
>> >>
>> >> Spark Improvement Proposals (SIP)
>> >>
>> >>
>> >> Background:
>> >>
>> >> The current problem is that design and implementation of large features are often done in private, before soliciting user feedback.
>> >>
>> >> When feedback is solicited, it is often as to detailed design specifics, not focused on goals.
>> >>
>> >> When implementation does take place after design, there is often disagreement as to what goals are or are not in scope.
>> >>
>> >> This results in commits that don't fully meet user needs.
>> >>
>> >>
>> >> Goals:
>> >>
>> >> - Ensure user, contributor, and committer goals are clearly identified and agreed upon, before implementation takes place.
>> >>
>> >> - Ensure that a technically feasible strategy is chosen that is likely to meet the goals.
>> >>
>> >>
>> >> Rejected Goals:
>> >>
>> >> - SIPs are not for detailed design. Design by committee doesn't work.
>> >>
>> >> - SIPs are not for every change. We don't need that much process.
>> >>
>> >>
>> >> Strategy:
>> >>
>> >> My suggestion is outlined as a Spark Improvement Proposal process documented at
>> >>
>> >> https://github.com/koeninger/spark-1/blob/SIP-0/docs/spark-improvement-proposals.md
>> >>
>> >> Specifics of Jira manipulation are an implementation detail we can figure out.
>> >>
>> >> I'm suggesting voting; the need here is for a _clear_ outcome.
>> >>
>> >>
>> >> Rejected Strategies:
>> >>
>> >> Having someone who understands the problem implement it first works, but only if significant iteration after user feedback is allowed.
>> >>
>> >> Historically this has been problematic due to pressure to limit public API changes.
>> >>
>> >>
>> >> On Fri, Oct 7, 2016 at 5:16 PM, Reynold Xin <r...@databricks.com> wrote:
>> >>>
>> >>> Alright, looks like there is quite a bit of support. We should wait to hear from more people too.
>> >>>
>> >>> To push this forward, Cody and I will be working together in the next couple of weeks to come up with a concrete, detailed proposal on what this entails, and then we can discuss the specific proposal as well.
>> >>>
>> >>>
>> >>> On Fri, Oct 7, 2016 at 2:29 PM, Cody Koeninger <c...@koeninger.org> wrote:
>> >>>>
>> >>>> Yeah, in case it wasn't clear, I was talking about SIPs for major user-facing or cross-cutting changes, not minor feature adds.
>> >>>>
>> >>>> On Fri, Oct 7, 2016 at 3:58 PM, Stavros Kontopoulos <stavros.kontopou...@lightbend.com> wrote:
>> >>>>>
>> >>>>> +1 to the SIP label as long as it does not slow down things and it targets optimizing efforts, coordination etc. For example really small features should not need to go through this process (assuming they don't touch public interfaces) or re-factorings and hope it will be kept this way. So a guideline doc should be provided, like in the KIP case.
>> >>>>>
>> >>>>> IMHO so far aside from tagging things and linking them elsewhere simply having design docs and prototype implementations in PRs is not something that has not worked so far. What is really a pain in many projects out there is discontinuity in progress of PRs, missing features, slow reviews which is understandable to some extent... it is not only about Spark but things can be improved for sure for this project in particular as already stated.
>> >>>>>
>> >>>>> On Fri, Oct 7, 2016 at 11:14 PM, Cody Koeninger <c...@koeninger.org> wrote:
>> >>>>>>
>> >>>>>> +1 to adding an SIP label and linking it from the website. I think it needs
>> >>>>>>
>> >>>>>> - template that focuses it towards soliciting user goals / non goals
>> >>>>>> - clear resolution as to which strategy was chosen to pursue. I'd recommend a vote.
>> >>>>>>
>> >>>>>> Matei asked me to clarify what I meant by changing interfaces, I think it's directly relevant to the SIP idea so I'll clarify here, and split a thread for the other discussion per Nicholas' request.
>> >>>>>>
>> >>>>>> I meant changing public user interfaces.
>> >>>>>> I think the first design is unlikely to be right, because it's done at a time when you have the least information. As a user, I find it considerably more frustrating to be unable to use a tool to get my job done, than I do having to make minor changes to my code in order to take advantage of features. I've seen committers be seriously reluctant to allow changes to @experimental code that are needed in order for it to really work right. You need to be able to iterate, and if people on both sides of the fence aren't going to respect that some newer APIs are subject to change, then why even mark them as such?
>> >>>>>>
>> >>>>>> Ideally a finished SIP should give me a checklist of things that an implementation must do, and things that it doesn't need to do. Contributors/committers should be seriously discouraged from putting out a version 0.1 that doesn't have at least a prototype implementation of all those things, especially if they're then going to argue against interface changes necessary to get the rest of the things done in the 0.2 version.
>> >>>>>>
>> >>>>>>
>> >>>>>> On Fri, Oct 7, 2016 at 2:18 PM, Reynold Xin <r...@databricks.com> wrote:
>> >>>>>>> I like the lightweight proposal to add a SIP label.
>> >>>>>>>
>> >>>>>>> During Spark 2.0 development, Tom (Graves) and I suggested using wiki to track the list of major changes, but that never really materialized due to the overhead. Adding a SIP label on major JIRAs and then linking to them prominently on the Spark website makes a lot of sense.
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> On Fri, Oct 7, 2016 at 10:50 AM, Matei Zaharia <matei.zaha...@gmail.com> wrote:
>> >>>>>>>>
>> >>>>>>>> For the improvement proposals, I think one major point was to make them really visible to users who are not contributors, so we should do more than sending stuff to dev@. One very lightweight idea is to have a new type of JIRA called a SIP and have a link to a filter that shows all such JIRAs from http://spark.apache.org. I also like the idea of SIP and design doc templates (in fact many projects have them).
>> >>>>>>>>
>> >>>>>>>> Matei
>> >>>>>>>>
>> >>>>>>>> On Oct 7, 2016, at 10:38 AM, Reynold Xin <r...@databricks.com> wrote:
>> >>>>>>>>
>> >>>>>>>> I called Cody last night and talked about some of the topics in his email. It became clear to me Cody genuinely cares about the project.
>> >>>>>>>>
>> >>>>>>>> Some of the frustrations come from the success of the project itself becoming very "hot", and it is difficult to get clarity from people who don't dedicate all their time to Spark. In fact, it is in some ways similar to scaling an engineering team in a successful startup: old processes that worked well might not work so well when it gets to a certain size, cultures can get diluted, building culture vs building process, etc.
>> >>>>>>>>
>> >>>>>>>> I also really like to have a more visible process for larger changes, especially major user facing API changes.
>> >>>>>>>> Historically we upload design docs for major changes, but it is not always consistent, and it is difficult to ensure the quality of the docs, due to the volunteering nature of the organization.
>> >>>>>>>>
>> >>>>>>>> Some of the more concrete ideas we discussed focus on building a culture to improve clarity:
>> >>>>>>>>
>> >>>>>>>> - Process: Large changes should have design docs posted on JIRA. One thing Cody and I didn't discuss but an idea that just came to me is we should create a design doc template for the project and ask everybody to follow. The design doc template should also explicitly list goals and non-goals, to make design docs more consistent.
>> >>>>>>>>
>> >>>>>>>> - Process: Email dev@ to solicit feedback. We have done this with some changes, but again very inconsistently. Just posting something on JIRA isn't sufficient, because there are simply too many JIRAs and the signal gets lost in the noise. While this is generally impossible to enforce because we can't force all volunteers to conform to a process (or they might not even be aware of this), those who are more familiar with the project can help by emailing the dev@ when they see something that hasn't been.
>> >>>>>>>>
>> >>>>>>>> - Culture: The design doc author(s) should be open to feedback. A design doc should serve as the base for discussion and is by no means the final design. Of course, this does not mean the author has to accept every feedback. They should also be comfortable accepting / rejecting ideas on technical grounds.
>> >>>>>>>>
>> >>>>>>>> - Process / Culture: For major ongoing projects, it can be useful to have some monthly Google hangouts that are open to the world. I am actually not sure how well this will work, because of the volunteering nature and we need to adjust for timezones for people across the globe, but it seems worth trying.
>> >>>>>>>>
>> >>>>>>>> - Culture: Contributors (including committers) should be more direct in setting expectations, including whether they are working on a specific issue, whether they will be working on a specific issue, and whether an issue or pr or jira should be rejected. Most people I know in this community are nice and don't enjoy telling other people no, but it is often more annoying to a contributor to not know anything than getting a no.
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> On Fri, Oct 7, 2016 at 10:03 AM, Matei Zaharia <matei.zaha...@gmail.com> wrote:
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>> Love the idea of a more visible "Spark Improvement Proposal" process that solicits user input on new APIs. For what it's worth, I don't think committers are trying to minimize their own work -- every committer cares about making the software useful for users. However, it is always hard to get user input and so it helps to have this kind of process. I've certainly looked at the *IPs a lot in other software I use just to see the biggest things on the roadmap.
>> >>>>>>>>>
>> >>>>>>>>> When you're talking about "changing interfaces", are you talking about public or internal APIs? I do think many people hate changing public APIs and I actually think that's for the best of the project. That's a technical debate, but basically, the worst thing when you're using a piece of software is that the developers constantly ask you to rewrite your app to update to a new version (and thus benefit from bug fixes, etc). Cue anyone who's used Protobuf, or Guava. The "let's get everyone to change their code this release" model works well within a single large company, but doesn't work well for a community, which is why nearly all *very* widely used programming interfaces (I'm talking things like Java standard library, Windows API, etc) almost *never* break backwards compatibility. All this is done within reason though, e.g. we do change things in major releases (2.x, 3.x, etc).
>> >>>>>>
>> >>>>>> ---------------------------------------------------------------------
>> >>>>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>> >>>>>>
>> >>>>>
>> >>>>> --
>> >>>>> Stavros Kontopoulos
>> >>>>> Senior Software Engineer
>> >>>>> Lightbend, Inc.
>> >>>>> p: +30 6977967274
>> >>>>> e: stavros.kontopou...@lightbend.com