If there's confusion there, the document is specifically what I'm proposing. The email is just by way of introduction.

On Sun, Oct 9, 2016 at 3:47 PM, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:

> Oh, hmm… I guess I’m a little confused on the relation between Cody’s email and the document he linked to, which says:
>
> https://github.com/koeninger/spark-1/blob/SIP-0/docs/spark-improvement-proposals.md#when
>
> SIPs should be used for significant user-facing or cross-cutting changes, not day-to-day improvements. When in doubt, if a committer thinks a change needs an SIP, it does.
>
> Nick
>
> On Sun, Oct 9, 2016 at 4:40 PM Matei Zaharia <matei.zaha...@gmail.com> wrote:
>
>> Yup, but the example you gave is for alternatives about *user-facing behavior*, not implementation. The current SIP doc describes "strategy" more as implementation strategy. I'm just saying there are different possible goals for these types of docs.
>>
>> BTW, PEPs and Scala SIPs focus primarily on user-facing behavior, but also require a reference implementation. This is a bit different from what Cody had in mind, I think.
>>
>> Matei
>>
>> On Oct 9, 2016, at 1:25 PM, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:
>>
>> - Rejected strategies: I personally wouldn’t put this, because what’s the point of voting to reject a strategy before you’ve really begun designing and implementing something? What if you discover that the strategy is actually better when you start doing stuff?
>>
>> I would guess the point is to document alternatives that were discussed and rejected, so that later on people can be pointed to that discussion and the devs don’t have to repeat themselves unnecessarily every time someone comes along and asks “Why didn’t you do this other thing?” That doesn’t mean a rejected proposal can’t later be revisited and the SIP can’t be updated.
>>
>> For reference from the Python community, PEP 492 <https://www.python.org/dev/peps/pep-0492/>, a Python Enhancement Proposal for adding async and await syntax and “first-class” coroutines to Python, has a section on rejected ideas <https://www.python.org/dev/peps/pep-0492/#why-async-def> for the new syntax. It captures a summary of what the devs discussed, but it doesn’t mean the PEP can’t be updated and a previously rejected proposal can’t be revived.
>>
>> At least in the Python community, a PEP serves not just as formal starting point for a proposal (the “real” starting point is usually a discussion on python-ideas or python-dev), but also as documentation of what was agreed on and a living “spec” of sorts. So PEPs sometimes get updated years after they are approved when revisions are agreed upon. PEPs are also intended for wide consumption, vs. bug tracker issues which the broader Python dev community are not expected to follow closely.
>>
>> Dunno if we want to follow a similar pattern for Spark, since the project’s needs are different. But the Python community has used PEPs to help organize and steer development since 2000; there are plenty of examples there we can probably take inspiration from.
>>
>> By the way, can we call these things something other than Spark Improvement Proposals? The acronym, SIP, conflicts with Scala SIPs <http://docs.scala-lang.org/sips/index.html>. Since the Scala and Spark communities have a lot of overlap, we don’t want, for example, names like “SIP-10” to have an ambiguous meaning.
>>
>> Nick
>>
>> On Sun, Oct 9, 2016 at 3:34 PM Matei Zaharia <matei.zaha...@gmail.com> wrote:
>>
>>> Hi Cody,
>>>
>>> I think this would be a lot more concrete if we had a more detailed template for SIPs. Right now, it's not super clear what's in scope -- e.g. are they a way to solicit feedback on the user-facing behavior or on the internals? "Goals" can cover both things. I've been thinking of SIPs more as Product Requirements Docs (PRDs), which focus on *what* a code change should do as opposed to how.
>>>
>>> In particular, here are some things that you may or may not consider in scope for SIPs:
>>>
>>> - Goals and non-goals: This is definitely in scope, and IMO should focus on user-visible behavior (e.g. "system supports SQL window functions" or "system continues working if one node fails"). BTW I wouldn't say "rejected goals" because some of them might become goals later, so we're not definitively rejecting them.
>>>
>>> - Public API: Probably should be included in most SIPs unless it's too large to fully specify then (e.g. "let's add an ML library").
>>>
>>> - Use cases: I usually find this very useful in PRDs to better communicate the goals.
>>>
>>> - Internal architecture: This is usually *not* a thing users can easily comment on and it sounds more like a design doc item. Of course it's important to show that the SIP is feasible to implement. One exception, however, is that I think we'll have some SIPs primarily on internals (e.g. if somebody wants to refactor Spark's query optimizer or something).
>>>
>>> - Rejected strategies: I personally wouldn't put this, because what's the point of voting to reject a strategy before you've really begun designing and implementing something? What if you discover that the strategy is actually better when you start doing stuff?
>>>
>>> At a super high level, it depends on whether you want the SIPs to be PRDs for getting some quick feedback on the goals of a feature before it is designed, or something more like full-fledged design docs (just a more visible design doc for bigger changes). I looked at Kafka's KIPs, and they actually seem to be more like design docs.
>>> This can work too but it does require more work from the proposer and it can lead to the same problems you mentioned with people already having a design and implementation in mind.
>>>
>>> Basically, the question is, are you trying to iterate faster on design by adding a step for user feedback earlier? Or are you just trying to make design docs for key features more visible (and their approval more formal)?
>>>
>>> BTW note that in either case, I'd like to have a template for design docs too, which should also include goals. I think that would've avoided some of the issues you brought up.
>>>
>>> Matei
>>>
>>> On Oct 9, 2016, at 10:40 AM, Cody Koeninger <c...@koeninger.org> wrote:
>>>
>>> Here's my specific proposal (meta-proposal?)
>>>
>>> Spark Improvement Proposals (SIP)
>>>
>>> Background:
>>>
>>> The current problem is that design and implementation of large features are often done in private, before soliciting user feedback.
>>>
>>> When feedback is solicited, it is often as to detailed design specifics, not focused on goals.
>>>
>>> When implementation does take place after design, there is often disagreement as to what goals are or are not in scope.
>>>
>>> This results in commits that don't fully meet user needs.
>>>
>>> Goals:
>>>
>>> - Ensure user, contributor, and committer goals are clearly identified and agreed upon, before implementation takes place.
>>>
>>> - Ensure that a technically feasible strategy is chosen that is likely to meet the goals.
>>>
>>> Rejected Goals:
>>>
>>> - SIPs are not for detailed design. Design by committee doesn't work.
>>>
>>> - SIPs are not for every change. We don't need that much process.
>>>
>>> Strategy:
>>>
>>> My suggestion is outlined as a Spark Improvement Proposal process documented at
>>>
>>> https://github.com/koeninger/spark-1/blob/SIP-0/docs/spark-improvement-proposals.md
>>>
>>> Specifics of Jira manipulation are an implementation detail we can figure out.
>>>
>>> I'm suggesting voting; the need here is for a _clear_ outcome.
>>>
>>> Rejected Strategies:
>>>
>>> Having someone who understands the problem implement it first works, but only if significant iteration after user feedback is allowed.
>>>
>>> Historically this has been problematic due to pressure to limit public api changes.
>>>
>>> On Fri, Oct 7, 2016 at 5:16 PM, Reynold Xin <r...@databricks.com> wrote:
>>>
>>>> Alright, looks like there is quite a bit of support. We should wait to hear from more people too.
>>>>
>>>> To push this forward, Cody and I will be working together in the next couple of weeks to come up with a concrete, detailed proposal on what this entails, and then we can discuss the specific proposal as well.
>>>>
>>>> On Fri, Oct 7, 2016 at 2:29 PM, Cody Koeninger <c...@koeninger.org> wrote:
>>>>
>>>>> Yeah, in case it wasn't clear, I was talking about SIPs for major user-facing or cross-cutting changes, not minor feature adds.
>>>>>
>>>>> On Fri, Oct 7, 2016 at 3:58 PM, Stavros Kontopoulos <stavros.kontopou...@lightbend.com> wrote:
>>>>>
>>>>>> +1 to the SIP label as long as it does not slow down things and it targets optimizing efforts, coordination etc. For example really small features should not need to go through this process (assuming they don't touch public interfaces) or re-factorings and hope it will be kept this way. So a guideline doc should be provided, like in the KIP case.
>>>>>>
>>>>>> IMHO so far aside from tagging things and linking them elsewhere simply having design docs and prototype implementations in PRs is not something that has not worked so far. What is really a pain in many projects out there is discontinuity in progress of PRs, missing features, slow reviews which is understandable to some extent... it is not only about Spark but things can be improved for sure for this project in particular as already stated.
>>>>>>
>>>>>> On Fri, Oct 7, 2016 at 11:14 PM, Cody Koeninger <c...@koeninger.org> wrote:
>>>>>>
>>>>>>> +1 to adding an SIP label and linking it from the website. I think it needs
>>>>>>>
>>>>>>> - template that focuses it towards soliciting user goals / non goals
>>>>>>> - clear resolution as to which strategy was chosen to pursue. I'd recommend a vote.
>>>>>>>
>>>>>>> Matei asked me to clarify what I meant by changing interfaces, I think it's directly relevant to the SIP idea so I'll clarify here, and split a thread for the other discussion per Nicholas' request.
>>>>>>>
>>>>>>> I meant changing public user interfaces. I think the first design is unlikely to be right, because it's done at a time when you have the least information. As a user, I find it considerably more frustrating to be unable to use a tool to get my job done, than I do having to make minor changes to my code in order to take advantage of features. I've seen committers be seriously reluctant to allow changes to @experimental code that are needed in order for it to really work right. You need to be able to iterate, and if people on both sides of the fence aren't going to respect that some newer apis are subject to change, then why even mark them as such?
>>>>>>>
>>>>>>> Ideally a finished SIP should give me a checklist of things that an implementation must do, and things that it doesn't need to do. Contributors/committers should be seriously discouraged from putting out a version 0.1 that doesn't have at least a prototype implementation of all those things, especially if they're then going to argue against interface changes necessary to get the rest of the things done in the 0.2 version.
>>>>>>>
>>>>>>> On Fri, Oct 7, 2016 at 2:18 PM, Reynold Xin <r...@databricks.com> wrote:
>>>>>>> > I like the lightweight proposal to add a SIP label.
>>>>>>> >
>>>>>>> > During Spark 2.0 development, Tom (Graves) and I suggested using wiki to track the list of major changes, but that never really materialized due to the overhead. Adding a SIP label on major JIRAs and then link to them prominently on the Spark website makes a lot of sense.
>>>>>>> >
>>>>>>> > On Fri, Oct 7, 2016 at 10:50 AM, Matei Zaharia <matei.zaha...@gmail.com> wrote:
>>>>>>> >>
>>>>>>> >> For the improvement proposals, I think one major point was to make them really visible to users who are not contributors, so we should do more than sending stuff to dev@. One very lightweight idea is to have a new type of JIRA called a SIP and have a link to a filter that shows all such JIRAs from http://spark.apache.org. I also like the idea of SIP and design doc templates (in fact many projects have them).
>>>>>>> >>
>>>>>>> >> Matei
>>>>>>> >>
>>>>>>> >> On Oct 7, 2016, at 10:38 AM, Reynold Xin <r...@databricks.com> wrote:
>>>>>>> >>
>>>>>>> >> I called Cody last night and talked about some of the topics in his email. It became clear to me Cody genuinely cares about the project.
>>>>>>> >>
>>>>>>> >> Some of the frustrations come from the success of the project itself becoming very "hot", and it is difficult to get clarity from people who don't dedicate all their time to Spark. In fact, it is in some ways similar to scaling an engineering team in a successful startup: old processes that worked well might not work so well when it gets to a certain size, cultures can get diluted, building culture vs building process, etc.
>>>>>>> >>
>>>>>>> >> I also really like to have a more visible process for larger changes, especially major user facing API changes. Historically we upload design docs for major changes, but it is not always consistent, and it is difficult to ensure the quality of the docs, due to the volunteering nature of the organization.
>>>>>>> >>
>>>>>>> >> Some of the more concrete ideas we discussed focus on building a culture to improve clarity:
>>>>>>> >>
>>>>>>> >> - Process: Large changes should have design docs posted on JIRA. One thing Cody and I didn't discuss but an idea that just came to me is we should create a design doc template for the project and ask everybody to follow. The design doc template should also explicitly list goals and non-goals, to make design docs more consistent.
>>>>>>> >>
>>>>>>> >> - Process: Email dev@ to solicit feedback. We have done this with some changes, but again very inconsistent. Just posting something on JIRA isn't sufficient, because there are simply too many JIRAs and the signal gets lost in the noise.
>>>>>>> >> While this is generally impossible to enforce because we can't force all volunteers to conform to a process (or they might not even be aware of this), those who are more familiar with the project can help by emailing the dev@ when they see something that hasn't been.
>>>>>>> >>
>>>>>>> >> - Culture: The design doc author(s) should be open to feedback. A design doc should serve as the base for discussion and is by no means the final design. Of course, this does not mean the author has to accept every feedback. They should also be comfortable accepting / rejecting ideas on technical grounds.
>>>>>>> >>
>>>>>>> >> - Process / Culture: For major ongoing projects, it can be useful to have some monthly Google hangouts that are open to the world. I am actually not sure how well this will work, because of the volunteering nature and we need to adjust for timezones for people across the globe, but it seems worth trying.
>>>>>>> >>
>>>>>>> >> - Culture: Contributors (including committers) should be more direct in setting expectations, including whether they are working on a specific issue, whether they will be working on a specific issue, and whether an issue or pr or jira should be rejected. Most people I know in this community are nice and don't enjoy telling other people no, but it is often more annoying to a contributor to not know anything than getting a no.
>>>>>>> >>
>>>>>>> >> On Fri, Oct 7, 2016 at 10:03 AM, Matei Zaharia <matei.zaha...@gmail.com> wrote:
>>>>>>> >>>
>>>>>>> >>> Love the idea of a more visible "Spark Improvement Proposal" process that solicits user input on new APIs.
>>>>>>> >>> For what it's worth, I don't think committers are trying to minimize their own work -- every committer cares about making the software useful for users. However, it is always hard to get user input and so it helps to have this kind of process. I've certainly looked at the *IPs a lot in other software I use just to see the biggest things on the roadmap.
>>>>>>> >>>
>>>>>>> >>> When you're talking about "changing interfaces", are you talking about public or internal APIs? I do think many people hate changing public APIs and I actually think that's for the best of the project. That's a technical debate, but basically, the worst thing when you're using a piece of software is that the developers constantly ask you to rewrite your app to update to a new version (and thus benefit from bug fixes, etc). Cue anyone who's used Protobuf, or Guava. The "let's get everyone to change their code this release" model works well within a single large company, but doesn't work well for a community, which is why nearly all *very* widely used programming interfaces (I'm talking things like Java standard library, Windows API, etc) almost *never* break backwards compatibility. All this is done within reason though, e.g. we do change things in major releases (2.x, 3.x, etc).