Regarding the name: if the SIP overlap is a concern, we can pick a different name. My tongue-in-cheek suggestion would be Spark Lightweight Improvement Process (SPARKLI).

On Sun, Oct 9, 2016 at 4:14 PM, Cody Koeninger <c...@koeninger.org> wrote:
> So to focus the discussion on the specific strategy I'm suggesting, documented at
>
> https://github.com/koeninger/spark-1/blob/SIP-0/docs/spark-improvement-proposals.md
>
> "Goals: What must this allow people to do, that they can't currently?"
>
> Is it unclear that this is focusing specifically on people-visible behavior?
>
> Rejected goals - are important because otherwise people keep trying to argue about scope. Of course you can change things later with a different SIP and different vote, the point is to focus.
>
> Use cases - are something that people are going to bring up in discussion. If they aren't clearly documented as a goal ("This must allow me to connect using SSL"), they should be added.
>
> Internal architecture - if the people who need specific behavior are implementers of other parts of the system, that's fine.
>
> Rejected strategies - If you have none of these, you have no evidence that the proponent didn't just go with the first thing they had in mind (or have already implemented), which is a big problem currently. Approval isn't binding as to specifics of implementation, so these aren't handcuffs. The goals are the contract, the strategy is evidence that contract can actually be met.
>
> Design docs - I'm not touching design docs. The markdown file I linked specifically says of the strategy section "This is not a full design document." Is this unclear? Design docs can be worked on obviously, but that's not what I'm concerned with here.
>
> On Sun, Oct 9, 2016 at 2:34 PM, Matei Zaharia <matei.zaha...@gmail.com> wrote:
>> Hi Cody,
>>
>> I think this would be a lot more concrete if we had a more detailed template for SIPs. Right now, it's not super clear what's in scope -- e.g. are they a way to solicit feedback on the user-facing behavior or on the internals? "Goals" can cover both things.
>> I've been thinking of SIPs more as Product Requirements Docs (PRDs), which focus on *what* a code change should do as opposed to how.
>>
>> In particular, here are some things that you may or may not consider in scope for SIPs:
>>
>> - Goals and non-goals: This is definitely in scope, and IMO should focus on user-visible behavior (e.g. "system supports SQL window functions" or "system continues working if one node fails"). BTW I wouldn't say "rejected goals" because some of them might become goals later, so we're not definitively rejecting them.
>>
>> - Public API: Probably should be included in most SIPs unless it's too large to fully specify then (e.g. "let's add an ML library").
>>
>> - Use cases: I usually find this very useful in PRDs to better communicate the goals.
>>
>> - Internal architecture: This is usually *not* a thing users can easily comment on and it sounds more like a design doc item. Of course it's important to show that the SIP is feasible to implement. One exception, however, is that I think we'll have some SIPs primarily on internals (e.g. if somebody wants to refactor Spark's query optimizer or something).
>>
>> - Rejected strategies: I personally wouldn't put this, because what's the point of voting to reject a strategy before you've really begun designing and implementing something? What if you discover that the strategy is actually better when you start doing stuff?
>>
>> At a super high level, it depends on whether you want the SIPs to be PRDs for getting some quick feedback on the goals of a feature before it is designed, or something more like full-fledged design docs (just a more visible design doc for bigger changes). I looked at Kafka's KIPs, and they actually seem to be more like design docs.
>> This can work too but it does require more work from the proposer and it can lead to the same problems you mentioned with people already having a design and implementation in mind.
>>
>> Basically, the question is, are you trying to iterate faster on design by adding a step for user feedback earlier? Or are you just trying to make design docs for key features more visible (and their approval more formal)?
>>
>> BTW note that in either case, I'd like to have a template for design docs too, which should also include goals. I think that would've avoided some of the issues you brought up.
>>
>> Matei
>>
>> On Oct 9, 2016, at 10:40 AM, Cody Koeninger <c...@koeninger.org> wrote:
>>
>> Here's my specific proposal (meta-proposal?)
>>
>> Spark Improvement Proposals (SIP)
>>
>> Background:
>>
>> The current problem is that design and implementation of large features are often done in private, before soliciting user feedback.
>>
>> When feedback is solicited, it is often as to detailed design specifics, not focused on goals.
>>
>> When implementation does take place after design, there is often disagreement as to what goals are or are not in scope.
>>
>> This results in commits that don't fully meet user needs.
>>
>> Goals:
>>
>> - Ensure user, contributor, and committer goals are clearly identified and agreed upon, before implementation takes place.
>>
>> - Ensure that a technically feasible strategy is chosen that is likely to meet the goals.
>>
>> Rejected Goals:
>>
>> - SIPs are not for detailed design. Design by committee doesn't work.
>>
>> - SIPs are not for every change. We don't need that much process.
>>
>> Strategy:
>>
>> My suggestion is outlined as a Spark Improvement Proposal process documented at
>>
>> https://github.com/koeninger/spark-1/blob/SIP-0/docs/spark-improvement-proposals.md
>>
>> Specifics of Jira manipulation are an implementation detail we can figure out.
>>
>> I'm suggesting voting; the need here is for a _clear_ outcome.
>>
>> Rejected Strategies:
>>
>> Having someone who understands the problem implement it first works, but only if significant iteration after user feedback is allowed.
>>
>> Historically this has been problematic due to pressure to limit public api changes.
>>
>> On Fri, Oct 7, 2016 at 5:16 PM, Reynold Xin <r...@databricks.com> wrote:
>>>
>>> Alright, looks like there is quite a bit of support. We should wait to hear from more people too.
>>>
>>> To push this forward, Cody and I will be working together in the next couple of weeks to come up with a concrete, detailed proposal on what this entails, and then we can discuss the specific proposal as well.
>>>
>>> On Fri, Oct 7, 2016 at 2:29 PM, Cody Koeninger <c...@koeninger.org> wrote:
>>>>
>>>> Yeah, in case it wasn't clear, I was talking about SIPs for major user-facing or cross-cutting changes, not minor feature adds.
>>>>
>>>> On Fri, Oct 7, 2016 at 3:58 PM, Stavros Kontopoulos <stavros.kontopou...@lightbend.com> wrote:
>>>>>
>>>>> +1 to the SIP label as long as it does not slow down things and it targets optimizing efforts, coordination etc. For example really small features should not need to go through this process (assuming they don't touch public interfaces) or re-factorings and hope it will be kept this way. So a guideline doc should be provided, like in the KIP case.
>>>>>
>>>>> IMHO so far aside from tagging things and linking them elsewhere simply having design docs and prototype implementations in PRs is not something that has not worked so far. What is really a pain in many projects out there is discontinuity in progress of PRs, missing features, slow reviews which is understandable to some extent... it is not only about Spark but things can be improved for sure for this project in particular as already stated.
>>>>>
>>>>> On Fri, Oct 7, 2016 at 11:14 PM, Cody Koeninger <c...@koeninger.org> wrote:
>>>>>>
>>>>>> +1 to adding an SIP label and linking it from the website. I think it needs
>>>>>>
>>>>>> - template that focuses it towards soliciting user goals / non goals
>>>>>> - clear resolution as to which strategy was chosen to pursue. I'd recommend a vote.
>>>>>>
>>>>>> Matei asked me to clarify what I meant by changing interfaces, I think it's directly relevant to the SIP idea so I'll clarify here, and split a thread for the other discussion per Nicholas' request.
>>>>>>
>>>>>> I meant changing public user interfaces. I think the first design is unlikely to be right, because it's done at a time when you have the least information. As a user, I find it considerably more frustrating to be unable to use a tool to get my job done, than I do having to make minor changes to my code in order to take advantage of features. I've seen committers be seriously reluctant to allow changes to @experimental code that are needed in order for it to really work right. You need to be able to iterate, and if people on both sides of the fence aren't going to respect that some newer apis are subject to change, then why even mark them as such?
>>>>>>
>>>>>> Ideally a finished SIP should give me a checklist of things that an implementation must do, and things that it doesn't need to do. Contributors/committers should be seriously discouraged from putting out a version 0.1 that doesn't have at least a prototype implementation of all those things, especially if they're then going to argue against interface changes necessary to get the rest of the things done in the 0.2 version.
>>>>>>
>>>>>> On Fri, Oct 7, 2016 at 2:18 PM, Reynold Xin <r...@databricks.com> wrote:
>>>>>> > I like the lightweight proposal to add a SIP label.
>>>>>> >
>>>>>> > During Spark 2.0 development, Tom (Graves) and I suggested using wiki to track the list of major changes, but that never really materialized due to the overhead. Adding a SIP label on major JIRAs and then linking to them prominently on the Spark website makes a lot of sense.
>>>>>> >
>>>>>> > On Fri, Oct 7, 2016 at 10:50 AM, Matei Zaharia <matei.zaha...@gmail.com> wrote:
>>>>>> >>
>>>>>> >> For the improvement proposals, I think one major point was to make them really visible to users who are not contributors, so we should do more than sending stuff to dev@. One very lightweight idea is to have a new type of JIRA called a SIP and have a link to a filter that shows all such JIRAs from http://spark.apache.org. I also like the idea of SIP and design doc templates (in fact many projects have them).
>>>>>> >>
>>>>>> >> Matei
>>>>>> >>
>>>>>> >> On Oct 7, 2016, at 10:38 AM, Reynold Xin <r...@databricks.com> wrote:
>>>>>> >>
>>>>>> >> I called Cody last night and talked about some of the topics in his email. It became clear to me Cody genuinely cares about the project.
>>>>>> >>
>>>>>> >> Some of the frustrations come from the success of the project itself becoming very "hot", and it is difficult to get clarity from people who don't dedicate all their time to Spark. In fact, it is in some ways similar to scaling an engineering team in a successful startup: old processes that worked well might not work so well when it gets to a certain size, cultures can get diluted, building culture vs building process, etc.
>>>>>> >>
>>>>>> >> I also really like to have a more visible process for larger changes, especially major user facing API changes.
>>>>>> >> Historically we upload design docs for major changes, but it is not always consistent and it is difficult to ensure the quality of the docs, due to the volunteering nature of the organization.
>>>>>> >>
>>>>>> >> Some of the more concrete ideas we discussed focus on building a culture to improve clarity:
>>>>>> >>
>>>>>> >> - Process: Large changes should have design docs posted on JIRA. One thing Cody and I didn't discuss but an idea that just came to me is we should create a design doc template for the project and ask everybody to follow it. The design doc template should also explicitly list goals and non-goals, to make design docs more consistent.
>>>>>> >>
>>>>>> >> - Process: Email dev@ to solicit feedback. We have done this with some changes, but again very inconsistently. Just posting something on JIRA isn't sufficient, because there are simply too many JIRAs and the signal gets lost in the noise. While this is generally impossible to enforce because we can't force all volunteers to conform to a process (or they might not even be aware of this), those who are more familiar with the project can help by emailing the dev@ when they see something that hasn't been.
>>>>>> >>
>>>>>> >> - Culture: The design doc author(s) should be open to feedback. A design doc should serve as the base for discussion and is by no means the final design. Of course, this does not mean the author has to accept every piece of feedback. They should also be comfortable accepting / rejecting ideas on technical grounds.
>>>>>> >>
>>>>>> >> - Process / Culture: For major ongoing projects, it can be useful to have some monthly Google hangouts that are open to the world.
>>>>>> >> I am actually not sure how well this will work, because of the volunteering nature and we need to adjust for timezones for people across the globe, but it seems worth trying.
>>>>>> >>
>>>>>> >> - Culture: Contributors (including committers) should be more direct in setting expectations, including whether they are working on a specific issue, whether they will be working on a specific issue, and whether an issue or pr or jira should be rejected. Most people I know in this community are nice and don't enjoy telling other people no, but it is often more annoying to a contributor to not know anything than getting a no.
>>>>>> >>
>>>>>> >> On Fri, Oct 7, 2016 at 10:03 AM, Matei Zaharia <matei.zaha...@gmail.com> wrote:
>>>>>> >>>
>>>>>> >>> Love the idea of a more visible "Spark Improvement Proposal" process that solicits user input on new APIs. For what it's worth, I don't think committers are trying to minimize their own work -- every committer cares about making the software useful for users. However, it is always hard to get user input and so it helps to have this kind of process. I've certainly looked at the *IPs a lot in other software I use just to see the biggest things on the roadmap.
>>>>>> >>>
>>>>>> >>> When you're talking about "changing interfaces", are you talking about public or internal APIs? I do think many people hate changing public APIs and I actually think that's for the best of the project.
>>>>>> >>> That's a technical debate, but basically, the worst thing when you're using a piece of software is that the developers constantly ask you to rewrite your app to update to a new version (and thus benefit from bug fixes, etc). Cue anyone who's used Protobuf, or Guava. The "let's get everyone to change their code this release" model works well within a single large company, but doesn't work well for a community, which is why nearly all *very* widely used programming interfaces (I'm talking things like Java standard library, Windows API, etc) almost *never* break backwards compatibility. All this is done within reason though, e.g. we do change things in major releases (2.x, 3.x, etc).
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>>>
>>>>> --
>>>>> Stavros Kontopoulos
>>>>> Senior Software Engineer
>>>>> Lightbend, Inc.
>>>>> p: +30 6977967274
>>>>> e: stavros.kontopou...@lightbend.com