This is a great discussion! Maybe you could have a look at Kafka's process - it also uses Rejected Alternatives and I personally find it very clear actually (the link also leads to all KIPs):
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals

Cody - maybe you could take one of the open issues and write a sample proposal? A concrete example might make it clearer for those who see this for the first time. Maybe the Kafka offset discussion or some other Kafka/Structured Streaming open issue? Will that be helpful?

Ofir Manor
Co-Founder & CTO | Equalum
Mobile: +972-54-7801286 | Email: ofir.ma...@equalum.io

On Mon, Oct 10, 2016 at 12:36 AM, Matei Zaharia <matei.zaha...@gmail.com> wrote:

> Yup, this is the stuff that I found unclear. Thanks for clarifying here,
> but we should also clarify it in the writeup. In particular:
>
> - Goals needs to be about user-facing behavior ("people" is broad)
>
> - I'd rename Rejected Goals to Non-Goals. Otherwise someone will dig up
> one of these and say "Spark's developers have officially rejected X, which
> our awesome system has".
>
> - For user-facing stuff, I think you need a section on API. Virtually all
> other *IPs I've seen have that.
>
> - I'm still not sure why the strategy section is needed if the purpose is
> to define user-facing behavior -- unless this is the strategy for setting
> the goals or for defining the API. That sounds squarely like a design doc
> issue. In some sense, who cares whether the proposal is technically
> feasible right now? If it's infeasible, that will be discovered later
> during design and implementation. Same thing with rejected strategies --
> listing some of those is definitely useful sometimes, but if you make this
> a *required* section, people are just going to fill it in with bogus stuff
> (I've seen this happen before).
>
> Matei
>
> On Oct 9, 2016, at 2:14 PM, Cody Koeninger <c...@koeninger.org> wrote:
> >
> > So to focus the discussion on the specific strategy I'm suggesting,
> > documented at
> >
> > https://github.com/koeninger/spark-1/blob/SIP-0/docs/spark-improvement-proposals.md
> >
> > "Goals: What must this allow people to do, that they can't currently?"
> >
> > Is it unclear that this is focusing specifically on people-visible behavior?
> >
> > Rejected goals - are important because otherwise people keep trying
> > to argue about scope. Of course you can change things later with a
> > different SIP and different vote, the point is to focus.
> >
> > Use cases - are something that people are going to bring up in
> > discussion. If they aren't clearly documented as a goal ("This must
> > allow me to connect using SSL"), they should be added.
> >
> > Internal architecture - if the people who need specific behavior are
> > implementers of other parts of the system, that's fine.
> >
> > Rejected strategies - If you have none of these, you have no evidence
> > that the proponent didn't just go with the first thing they had in
> > mind (or have already implemented), which is a big problem currently.
> > Approval isn't binding as to specifics of implementation, so these
> > aren't handcuffs. The goals are the contract, the strategy is
> > evidence that contract can actually be met.
> >
> > Design docs - I'm not touching design docs. The markdown file I
> > linked specifically says of the strategy section "This is not a full
> > design document." Is this unclear? Design docs can be worked on
> > obviously, but that's not what I'm concerned with here.
> >
> >
> > On Sun, Oct 9, 2016 at 2:34 PM, Matei Zaharia <matei.zaha...@gmail.com> wrote:
> >> Hi Cody,
> >>
> >> I think this would be a lot more concrete if we had a more detailed template
> >> for SIPs. Right now, it's not super clear what's in scope -- e.g.
> >> are they a way to solicit feedback on the user-facing behavior or on the internals?
> >> "Goals" can cover both things. I've been thinking of SIPs more as Product
> >> Requirements Docs (PRDs), which focus on *what* a code change should do as
> >> opposed to how.
> >>
> >> In particular, here are some things that you may or may not consider in
> >> scope for SIPs:
> >>
> >> - Goals and non-goals: This is definitely in scope, and IMO should focus on
> >> user-visible behavior (e.g. "system supports SQL window functions" or
> >> "system continues working if one node fails"). BTW I wouldn't say "rejected
> >> goals" because some of them might become goals later, so we're not
> >> definitively rejecting them.
> >>
> >> - Public API: Probably should be included in most SIPs unless it's too large
> >> to fully specify then (e.g. "let's add an ML library").
> >>
> >> - Use cases: I usually find this very useful in PRDs to better communicate
> >> the goals.
> >>
> >> - Internal architecture: This is usually *not* a thing users can easily
> >> comment on and it sounds more like a design doc item. Of course it's
> >> important to show that the SIP is feasible to implement. One exception,
> >> however, is that I think we'll have some SIPs primarily on internals (e.g.
> >> if somebody wants to refactor Spark's query optimizer or something).
> >>
> >> - Rejected strategies: I personally wouldn't put this, because what's the
> >> point of voting to reject a strategy before you've really begun designing
> >> and implementing something? What if you discover that the strategy is
> >> actually better when you start doing stuff?
> >>
> >> At a super high level, it depends on whether you want the SIPs to be PRDs
> >> for getting some quick feedback on the goals of a feature before it is
> >> designed, or something more like full-fledged design docs (just a more
> >> visible design doc for bigger changes).
> >> I looked at Kafka's KIPs, and they
> >> actually seem to be more like design docs. This can work too but it does
> >> require more work from the proposer and it can lead to the same problems you
> >> mentioned with people already having a design and implementation in mind.
> >>
> >> Basically, the question is, are you trying to iterate faster on design by
> >> adding a step for user feedback earlier? Or are you just trying to make
> >> design docs for key features more visible (and their approval more formal)?
> >>
> >> BTW note that in either case, I'd like to have a template for design docs
> >> too, which should also include goals. I think that would've avoided some of
> >> the issues you brought up.
> >>
> >> Matei
> >>
> >> On Oct 9, 2016, at 10:40 AM, Cody Koeninger <c...@koeninger.org> wrote:
> >>
> >> Here's my specific proposal (meta-proposal?)
> >>
> >> Spark Improvement Proposals (SIP)
> >>
> >>
> >> Background:
> >>
> >> The current problem is that design and implementation of large features are
> >> often done in private, before soliciting user feedback.
> >>
> >> When feedback is solicited, it is often as to detailed design specifics, not
> >> focused on goals.
> >>
> >> When implementation does take place after design, there is often
> >> disagreement as to what goals are or are not in scope.
> >>
> >> This results in commits that don't fully meet user needs.
> >>
> >>
> >> Goals:
> >>
> >> - Ensure user, contributor, and committer goals are clearly identified and
> >> agreed upon, before implementation takes place.
> >>
> >> - Ensure that a technically feasible strategy is chosen that is likely to
> >> meet the goals.
> >>
> >>
> >> Rejected Goals:
> >>
> >> - SIPs are not for detailed design. Design by committee doesn't work.
> >>
> >> - SIPs are not for every change. We don't need that much process.
> >>
> >>
> >> Strategy:
> >>
> >> My suggestion is outlined as a Spark Improvement Proposal process documented
> >> at
> >>
> >> https://github.com/koeninger/spark-1/blob/SIP-0/docs/spark-improvement-proposals.md
> >>
> >> Specifics of Jira manipulation are an implementation detail we can figure
> >> out.
> >>
> >> I'm suggesting voting; the need here is for a _clear_ outcome.
> >>
> >>
> >> Rejected Strategies:
> >>
> >> Having someone who understands the problem implement it first works, but
> >> only if significant iteration after user feedback is allowed.
> >>
> >> Historically this has been problematic due to pressure to limit public API
> >> changes.
> >>
> >>
> >> On Fri, Oct 7, 2016 at 5:16 PM, Reynold Xin <r...@databricks.com> wrote:
> >>>
> >>> Alright, looks like there is quite a bit of support. We should wait to
> >>> hear from more people too.
> >>>
> >>> To push this forward, Cody and I will be working together in the next
> >>> couple of weeks to come up with a concrete, detailed proposal on what this
> >>> entails, and then we can discuss the specific proposal as well.
> >>>
> >>>
> >>> On Fri, Oct 7, 2016 at 2:29 PM, Cody Koeninger <c...@koeninger.org> wrote:
> >>>>
> >>>> Yeah, in case it wasn't clear, I was talking about SIPs for major
> >>>> user-facing or cross-cutting changes, not minor feature adds.
> >>>>
> >>>> On Fri, Oct 7, 2016 at 3:58 PM, Stavros Kontopoulos
> >>>> <stavros.kontopou...@lightbend.com> wrote:
> >>>>>
> >>>>> +1 to the SIP label as long as it does not slow down things and it
> >>>>> targets optimizing efforts, coordination etc. For example, really small
> >>>>> features (assuming they don't touch public interfaces) or refactorings
> >>>>> should not need to go through this process, and I hope it will be kept this
> >>>>> way. So a guideline doc should be provided, like in the KIP case.
> >>>>>
> >>>>> IMHO, aside from tagging things and linking them elsewhere, simply
> >>>>> having design docs and prototype implementations in PRs is not something
> >>>>> that has not worked so far. What is really a pain in many projects out there
> >>>>> is discontinuity in progress of PRs, missing features, slow reviews, which is
> >>>>> understandable to some extent... it is not only about Spark but things can
> >>>>> be improved for sure for this project in particular as already stated.
> >>>>>
> >>>>> On Fri, Oct 7, 2016 at 11:14 PM, Cody Koeninger <c...@koeninger.org> wrote:
> >>>>>>
> >>>>>> +1 to adding an SIP label and linking it from the website. I think it
> >>>>>> needs
> >>>>>>
> >>>>>> - template that focuses it towards soliciting user goals / non goals
> >>>>>> - clear resolution as to which strategy was chosen to pursue. I'd
> >>>>>> recommend a vote.
> >>>>>>
> >>>>>> Matei asked me to clarify what I meant by changing interfaces, I think
> >>>>>> it's directly relevant to the SIP idea so I'll clarify here, and split
> >>>>>> a thread for the other discussion per Nicholas' request.
> >>>>>>
> >>>>>> I meant changing public user interfaces. I think the first design is
> >>>>>> unlikely to be right, because it's done at a time when you have the
> >>>>>> least information. As a user, I find it considerably more frustrating
> >>>>>> to be unable to use a tool to get my job done, than I do having to
> >>>>>> make minor changes to my code in order to take advantage of features.
> >>>>>> I've seen committers be seriously reluctant to allow changes to
> >>>>>> @experimental code that are needed in order for it to really work
> >>>>>> right. You need to be able to iterate, and if people on both sides of
> >>>>>> the fence aren't going to respect that some newer APIs are subject to
> >>>>>> change, then why even mark them as such?
> >>>>>>
> >>>>>> Ideally a finished SIP should give me a checklist of things that an
> >>>>>> implementation must do, and things that it doesn't need to do.
> >>>>>> Contributors/committers should be seriously discouraged from putting
> >>>>>> out a version 0.1 that doesn't have at least a prototype
> >>>>>> implementation of all those things, especially if they're then going
> >>>>>> to argue against interface changes necessary to get the rest of
> >>>>>> the things done in the 0.2 version.
> >>>>>>
> >>>>>>
> >>>>>> On Fri, Oct 7, 2016 at 2:18 PM, Reynold Xin <r...@databricks.com> wrote:
> >>>>>>> I like the lightweight proposal to add a SIP label.
> >>>>>>>
> >>>>>>> During Spark 2.0 development, Tom (Graves) and I suggested using wiki to
> >>>>>>> track the list of major changes, but that never really materialized due to
> >>>>>>> the overhead. Adding a SIP label on major JIRAs and then linking to them
> >>>>>>> prominently on the Spark website makes a lot of sense.
> >>>>>>>
> >>>>>>>
> >>>>>>> On Fri, Oct 7, 2016 at 10:50 AM, Matei Zaharia <matei.zaha...@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>> For the improvement proposals, I think one major point was to make them
> >>>>>>>> really visible to users who are not contributors, so we should do more than
> >>>>>>>> sending stuff to dev@. One very lightweight idea is to have a new type of
> >>>>>>>> JIRA called a SIP and have a link to a filter that shows all such JIRAs from
> >>>>>>>> http://spark.apache.org. I also like the idea of SIP and design doc
> >>>>>>>> templates (in fact many projects have them).
> >>>>>>>>
> >>>>>>>> Matei
> >>>>>>>>
> >>>>>>>> On Oct 7, 2016, at 10:38 AM, Reynold Xin <r...@databricks.com> wrote:
> >>>>>>>>
> >>>>>>>> I called Cody last night and talked about some of the topics in his email.
> >>>>>>>> It became clear to me Cody genuinely cares about the project.
> >>>>>>>>
> >>>>>>>> Some of the frustrations come from the success of the project itself
> >>>>>>>> becoming very "hot", and it is difficult to get clarity from people who
> >>>>>>>> don't dedicate all their time to Spark. In fact, it is in some ways similar
> >>>>>>>> to scaling an engineering team in a successful startup: old processes that
> >>>>>>>> worked well might not work so well when it gets to a certain size, cultures
> >>>>>>>> can get diluted, building culture vs building process, etc.
> >>>>>>>>
> >>>>>>>> I would also really like to have a more visible process for larger changes,
> >>>>>>>> especially major user-facing API changes. Historically we upload design docs
> >>>>>>>> for major changes, but it is not always consistent, and it is difficult to
> >>>>>>>> ensure the quality of the docs, due to the volunteering nature of the
> >>>>>>>> organization.
> >>>>>>>>
> >>>>>>>> Some of the more concrete ideas we discussed focus on building a culture
> >>>>>>>> to improve clarity:
> >>>>>>>>
> >>>>>>>> - Process: Large changes should have design docs posted on JIRA. One thing
> >>>>>>>> Cody and I didn't discuss but an idea that just came to me is we should
> >>>>>>>> create a design doc template for the project and ask everybody to follow it.
> >>>>>>>> The design doc template should also explicitly list goals and non-goals, to
> >>>>>>>> make design docs more consistent.
> >>>>>>>>
> >>>>>>>> - Process: Email dev@ to solicit feedback. We have done this with some
> >>>>>>>> changes, but again very inconsistently. Just posting something on JIRA isn't
> >>>>>>>> sufficient, because there are simply too many JIRAs and the signal gets lost
> >>>>>>>> in the noise.
> >>>>>>>> While this is generally impossible to enforce because we can't
> >>>>>>>> force all volunteers to conform to a process (or they might not even be
> >>>>>>>> aware of this), those who are more familiar with the project can help by
> >>>>>>>> emailing dev@ when they see something that hasn't been.
> >>>>>>>>
> >>>>>>>> - Culture: The design doc author(s) should be open to feedback. A design
> >>>>>>>> doc should serve as the base for discussion and is by no means the final
> >>>>>>>> design. Of course, this does not mean the author has to accept every
> >>>>>>>> piece of feedback. They should also be comfortable accepting / rejecting
> >>>>>>>> ideas on technical grounds.
> >>>>>>>>
> >>>>>>>> - Process / Culture: For major ongoing projects, it can be useful to have
> >>>>>>>> some monthly Google hangouts that are open to the world. I am actually not
> >>>>>>>> sure how well this will work, because of the volunteering nature and we need
> >>>>>>>> to adjust for timezones for people across the globe, but it seems worth
> >>>>>>>> trying.
> >>>>>>>>
> >>>>>>>> - Culture: Contributors (including committers) should be more direct in
> >>>>>>>> setting expectations, including whether they are working on a specific
> >>>>>>>> issue, whether they will be working on a specific issue, and whether an
> >>>>>>>> issue, PR, or JIRA should be rejected. Most people I know in this community
> >>>>>>>> are nice and don't enjoy telling other people no, but it is often more
> >>>>>>>> annoying to a contributor to not know anything than getting a no.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Fri, Oct 7, 2016 at 10:03 AM, Matei Zaharia <matei.zaha...@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>> Love the idea of a more visible "Spark Improvement Proposal" process that
> >>>>>>>>> solicits user input on new APIs. For what it's worth, I don't think
> >>>>>>>>> committers are trying to minimize their own work -- every committer cares
> >>>>>>>>> about making the software useful for users. However, it is always hard to
> >>>>>>>>> get user input and so it helps to have this kind of process. I've certainly
> >>>>>>>>> looked at the *IPs a lot in other software I use just to see the biggest
> >>>>>>>>> things on the roadmap.
> >>>>>>>>>
> >>>>>>>>> When you're talking about "changing interfaces", are you talking about
> >>>>>>>>> public or internal APIs? I do think many people hate changing public APIs
> >>>>>>>>> and I actually think that's for the best of the project. That's a technical
> >>>>>>>>> debate, but basically, the worst thing when you're using a piece of software
> >>>>>>>>> is that the developers constantly ask you to rewrite your app to update to a
> >>>>>>>>> new version (and thus benefit from bug fixes, etc). Cue anyone who's used
> >>>>>>>>> Protobuf, or Guava. The "let's get everyone to change their code this
> >>>>>>>>> release" model works well within a single large company, but doesn't work
> >>>>>>>>> well for a community, which is why nearly all *very* widely used programming
> >>>>>>>>> interfaces (I'm talking things like Java standard library, Windows API, etc)
> >>>>>>>>> almost *never* break backwards compatibility. All this is done within reason
> >>>>>>>>> though, e.g. we do change things in major releases (2.x, 3.x, etc).
> >>>>>> ---------------------------------------------------------------------
> >>>>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> >>>>>
> >>>>> --
> >>>>> Stavros Kontopoulos
> >>>>> Senior Software Engineer
> >>>>> Lightbend, Inc.
> >>>>> p: +30 6977967274
> >>>>> e: stavros.kontopou...@lightbend.com