Re: Spark Improvement Proposals

Cody Koeninger Sun, 09 Oct 2016 14:19:26 -0700

Regarding name, if the SIP overlap is a concern, we can pick a different name.
My tongue in cheek suggestion would be
Spark Lightweight Improvement process (SPARKLI)


On Sun, Oct 9, 2016 at 4:14 PM, Cody Koeninger <[email protected]> wrote:
> So to focus the discussion on the specific strategy I'm suggesting,
> documented at
>
> https://github.com/koeninger/spark-1/blob/SIP-0/docs/spark-improvement-proposals.md
>
> "Goals: What must this allow people to do, that they can't currently?"
>
> Is it unclear that this is focusing specifically on people-visible behavior?
>
> Rejected goals -  are important because otherwise people keep trying
> to argue about scope.  Of course you can change things later with a
> different SIP and different vote, the point is to focus.
>
> Use cases - are something that people are going to bring up in
> discussion.  If they aren't clearly documented as a goal ("This must
> allow me to connect using SSL"), they should be added.
>
> Internal architecture - if the people who need specific behavior are
> implementers of other parts of the system, that's fine.
>
> Rejected strategies - If you have none of these, you have no evidence
> that the proponent didn't just go with the first thing they had in
> mind (or have already implemented), which is a big problem currently.
> Approval isn't binding as to specifics of implementation, so these
> aren't handcuffs.  The goals are the contract, the strategy is
> evidence that contract can actually be met.
>
> Design docs - I'm not touching design docs.  The markdown file I
> linked specifically says of the strategy section "This is not a full
> design document."  Is this unclear?  Design docs can be worked on
> obviously, but that's not what I'm concerned with here.
>
>
>
>
> On Sun, Oct 9, 2016 at 2:34 PM, Matei Zaharia <[email protected]> wrote:
>> Hi Cody,
>>
>> I think this would be a lot more concrete if we had a more detailed template
>> for SIPs. Right now, it's not super clear what's in scope -- e.g. are they
>> a way to solicit feedback on the user-facing behavior or on the internals?
>> "Goals" can cover both things. I've been thinking of SIPs more as Product
>> Requirements Docs (PRDs), which focus on *what* a code change should do as
>> opposed to how.
>>
>> In particular, here are some things that you may or may not consider in
>> scope for SIPs:
>>
>> - Goals and non-goals: This is definitely in scope, and IMO should focus on
>> user-visible behavior (e.g. "system supports SQL window functions" or
>> "system continues working if one node fails"). BTW I wouldn't say "rejected
>> goals" because some of them might become goals later, so we're not
>> definitively rejecting them.
>>
>> - Public API: Probably should be included in most SIPs unless it's too large
>> to fully specify then (e.g. "let's add an ML library").
>>
>> - Use cases: I usually find this very useful in PRDs to better communicate
>> the goals.
>>
>> - Internal architecture: This is usually *not* a thing users can easily
>> comment on and it sounds more like a design doc item. Of course it's
>> important to show that the SIP is feasible to implement. One exception,
>> however, is that I think we'll have some SIPs primarily on internals (e.g.
>> if somebody wants to refactor Spark's query optimizer or something).
>>
>> - Rejected strategies: I personally wouldn't put this, because what's the
>> point of voting to reject a strategy before you've really begun designing
>> and implementing something? What if you discover that the strategy is
>> actually better when you start doing stuff?
>>
>> At a super high level, it depends on whether you want the SIPs to be PRDs
>> for getting some quick feedback on the goals of a feature before it is
>> designed, or something more like full-fledged design docs (just a more
>> visible design doc for bigger changes). I looked at Kafka's KIPs, and they
>> actually seem to be more like design docs. This can work too but it does
>> require more work from the proposer and it can lead to the same problems you
>> mentioned with people already having a design and implementation in mind.
>>
>> Basically, the question is, are you trying to iterate faster on design by
>> adding a step for user feedback earlier? Or are you just trying to make
>> design docs for key features more visible (and their approval more formal)?
>>
>> BTW note that in either case, I'd like to have a template for design docs
>> too, which should also include goals. I think that would've avoided some of
>> the issues you brought up.
>>
>> Matei
>>
>> On Oct 9, 2016, at 10:40 AM, Cody Koeninger <[email protected]> wrote:
>>
>> Here's my specific proposal (meta-proposal?)
>>
>> Spark Improvement Proposals (SIP)
>>
>>
>> Background:
>>
>> The current problem is that design and implementation of large features are
>> often done in private, before soliciting user feedback.
>>
>> When feedback is solicited, it is often as to detailed design specifics, not
>> focused on goals.
>>
>> When implementation does take place after design, there is often
>> disagreement as to what goals are or are not in scope.
>>
>> This results in commits that don't fully meet user needs.
>>
>>
>> Goals:
>>
>> - Ensure user, contributor, and committer goals are clearly identified and
>> agreed upon, before implementation takes place.
>>
>> - Ensure that a technically feasible strategy is chosen that is likely to
>> meet the goals.
>>
>>
>> Rejected Goals:
>>
>> - SIPs are not for detailed design.  Design by committee doesn't work.
>>
>> - SIPs are not for every change.  We don't need that much process.
>>
>>
>> Strategy:
>>
>> My suggestion is outlined as a Spark Improvement Proposal process documented
>> at
>>
>> https://github.com/koeninger/spark-1/blob/SIP-0/docs/spark-improvement-proposals.md
>>
>> Specifics of Jira manipulation are an implementation detail we can figure
>> out.
>>
>> I'm suggesting voting; the need here is for a _clear_ outcome.
>>
>>
>> Rejected Strategies:
>>
>> Having someone who understands the problem implement it first works, but
>> only if significant iteration after user feedback is allowed.
>>
>> Historically this has been problematic due to pressure to limit public api
>> changes.
>>
>>
>> On Fri, Oct 7, 2016 at 5:16 PM, Reynold Xin <[email protected]> wrote:
>>>
>>> Alright, looks like there is quite a bit of support. We should wait to
>>> hear from more people too.
>>>
>>> To push this forward, Cody and I will be working together in the next
>>> couple of weeks to come up with a concrete, detailed proposal on what this
>>> entails, and then we can discuss the specific proposal as well.
>>>
>>>
>>> On Fri, Oct 7, 2016 at 2:29 PM, Cody Koeninger <[email protected]> wrote:
>>>>
>>>> Yeah, in case it wasn't clear, I was talking about SIPs for major
>>>> user-facing or cross-cutting changes, not minor feature adds.
>>>>
>>>> On Fri, Oct 7, 2016 at 3:58 PM, Stavros Kontopoulos
>>>> <[email protected]> wrote:
>>>>>
>>>>> +1 to the SIP label as long as it does not slow things down and it
>>>>> targets optimizing efforts, coordination etc. For example, really small
>>>>> features (assuming they don't touch public interfaces) or refactorings
>>>>> should not need to go through this process, and I hope it will be kept
>>>>> this way. So a guideline doc should be provided, like in the KIP case.
>>>>>
>>>>> IMHO, aside from tagging things and linking them elsewhere, simply
>>>>> having design docs and prototype implementations in PRs is not something
>>>>> that has failed so far. What is really a pain in many projects out there
>>>>> is discontinuity in progress of PRs, missing features, and slow reviews,
>>>>> which is understandable to some extent... it is not only about Spark, but
>>>>> things can be improved for sure for this project in particular, as
>>>>> already stated.
>>>>>
>>>>> On Fri, Oct 7, 2016 at 11:14 PM, Cody Koeninger <[email protected]>
>>>>> wrote:
>>>>>>
>>>>>> +1 to adding an SIP label and linking it from the website.  I think it
>>>>>> needs
>>>>>>
>>>>>> - template that focuses it towards soliciting user goals / non goals
>>>>>> - clear resolution as to which strategy was chosen to pursue.  I'd
>>>>>> recommend a vote.
>>>>>>
>>>>>> Matei asked me to clarify what I meant by changing interfaces, I think
>>>>>> it's directly relevant to the SIP idea so I'll clarify here, and split
>>>>>> a thread for the other discussion per Nicholas' request.
>>>>>>
>>>>>> I meant changing public user interfaces.  I think the first design is
>>>>>> unlikely to be right, because it's done at a time when you have the
>>>>>> least information.  As a user, I find it considerably more frustrating
>>>>>> to be unable to use a tool to get my job done, than I do having to
>>>>>> make minor changes to my code in order to take advantage of features.
>>>>>> I've seen committers be seriously reluctant to allow changes to
>>>>>> @experimental code that are needed in order for it to really work
>>>>>> right.  You need to be able to iterate, and if people on both sides of
>>>>>> the fence aren't going to respect that some newer apis are subject to
>>>>>> change, then why even mark them as such?
>>>>>>
>>>>>> Ideally a finished SIP should give me a checklist of things that an
>>>>>> implementation must do, and things that it doesn't need to do.
>>>>>> Contributors/committers should be seriously discouraged from putting
>>>>>> out a version 0.1 that doesn't have at least a prototype
>>>>>> implementation of all those things, especially if they're then going
>>>>>> to argue against interface changes necessary to get the rest of
>>>>>> the things done in the 0.2 version.
>>>>>>
>>>>>>
>>>>>> On Fri, Oct 7, 2016 at 2:18 PM, Reynold Xin <[email protected]>
>>>>>> wrote:
>>>>>> > I like the lightweight proposal to add a SIP label.
>>>>>> >
>>>>>> > During Spark 2.0 development, Tom (Graves) and I suggested using wiki
>>>>>> > to
>>>>>> > track the list of major changes, but that never really materialized
>>>>>> > due to
>>>>>> > the overhead. Adding a SIP label on major JIRAs and then linking to them
>>>>>> > prominently on the Spark website makes a lot of sense.
>>>>>> >
>>>>>> >
>>>>>> > On Fri, Oct 7, 2016 at 10:50 AM, Matei Zaharia
>>>>>> > <[email protected]>
>>>>>> > wrote:
>>>>>> >>
>>>>>> >> For the improvement proposals, I think one major point was to make
>>>>>> >> them
>>>>>> >> really visible to users who are not contributors, so we should do
>>>>>> >> more than
>>>>>> >> sending stuff to dev@. One very lightweight idea is to have a new
>>>>>> >> type of
>>>>>> >> JIRA called a SIP and have a link to a filter that shows all such
>>>>>> >> JIRAs from
>>>>>> >> http://spark.apache.org. I also like the idea of SIP and design doc
>>>>>> >> templates (in fact many projects have them).
>>>>>> >>
>>>>>> >> Matei
>>>>>> >>
>>>>>> >> On Oct 7, 2016, at 10:38 AM, Reynold Xin <[email protected]>
>>>>>> >> wrote:
>>>>>> >>
>>>>>> >> I called Cody last night and talked about some of the topics in his
>>>>>> >> email.
>>>>>> >> It became clear to me Cody genuinely cares about the project.
>>>>>> >>
>>>>>> >> Some of the frustrations come from the success of the project itself
>>>>>> >> becoming very "hot", and it is difficult to get clarity from people
>>>>>> >> who
>>>>>> >> don't dedicate all their time to Spark. In fact, it is in some ways
>>>>>> >> similar
>>>>>> >> to scaling an engineering team in a successful startup: old
>>>>>> >> processes that
>>>>>> >> worked well might not work so well when it gets to a certain size,
>>>>>> >> cultures
>>>>>> >> can get diluted, building culture vs building process, etc.
>>>>>> >>
>>>>>> >> I also really like to have a more visible process for larger
>>>>>> >> changes,
>>>>>> >> especially major user facing API changes. Historically we upload
>>>>>> >> design docs
>>>>>> >> for major changes, but this is not always done consistently, and it is
>>>>>> >> difficult to ensure the quality
>>>>>> >> of the docs, due to the volunteer nature of the organization.
>>>>>> >>
>>>>>> >> Some of the more concrete ideas we discussed focus on building a
>>>>>> >> culture
>>>>>> >> to improve clarity:
>>>>>> >>
>>>>>> >> - Process: Large changes should have design docs posted on JIRA. One
>>>>>> >> thing
>>>>>> >> Cody and I didn't discuss but an idea that just came to me is we
>>>>>> >> should
>>>>>> >> create a design doc template for the project and ask everybody to
>>>>>> >> follow.
>>>>>> >> The design doc template should also explicitly list goals and
>>>>>> >> non-goals, to
>>>>>> >> make design doc more consistent.
>>>>>> >>
>>>>>> >> - Process: Email dev@ to solicit feedback. We have done this with
>>>>>> >> some
>>>>>> >> changes, but again very inconsistently. Just posting something on JIRA
>>>>>> >> isn't
>>>>>> >> sufficient, because there are simply too many JIRAs and the signal
>>>>>> >> gets lost
>>>>>> >> in the noise. While this is generally impossible to enforce because
>>>>>> >> we can't
>>>>>> >> force all volunteers to conform to a process (or they might not even
>>>>>> >> be
>>>>>> >> aware of this),  those who are more familiar with the project can
>>>>>> >> help by
>>>>>> >> emailing dev@ when they see something that hasn't been posted there.
>>>>>> >>
>>>>>> >> - Culture: The design doc author(s) should be open to feedback. A
>>>>>> >> design
>>>>>> >> doc should serve as the base for discussion and is by no means the
>>>>>> >> final
>>>>>> >> design. Of course, this does not mean the author has to accept every
>>>>>> >> piece of feedback. They should also be comfortable accepting / rejecting
>>>>>> >> ideas on
>>>>>> >> technical grounds.
>>>>>> >>
>>>>>> >> - Process / Culture: For major ongoing projects, it can be useful to
>>>>>> >> have
>>>>>> >> some monthly Google hangouts that are open to the world. I am
>>>>>> >> actually not
>>>>>> >> sure how well this will work, because of the volunteering nature and
>>>>>> >> we need
>>>>>> >> to adjust for timezones for people across the globe, but it seems
>>>>>> >> worth
>>>>>> >> trying.
>>>>>> >>
>>>>>> >> - Culture: Contributors (including committers) should be more direct
>>>>>> >> in
>>>>>> >> setting expectations, including whether they are working on a
>>>>>> >> specific
>>>>>> >> issue, whether they will be working on a specific issue, and whether
>>>>>> >> an
>>>>>> >> issue or pr or jira should be rejected. Most people I know in this
>>>>>> >> community
>>>>>> >> are nice and don't enjoy telling other people no, but it is often
>>>>>> >> more
>>>>>> >> annoying to a contributor to not know anything than getting a no.
>>>>>> >>
>>>>>> >>
>>>>>> >> On Fri, Oct 7, 2016 at 10:03 AM, Matei Zaharia
>>>>>> >> <[email protected]>
>>>>>> >> wrote:
>>>>>> >>>
>>>>>> >>>
>>>>>> >>> Love the idea of a more visible "Spark Improvement Proposal"
>>>>>> >>> process that
>>>>>> >>> solicits user input on new APIs. For what it's worth, I don't think
>>>>>> >>> committers are trying to minimize their own work -- every committer
>>>>>> >>> cares
>>>>>> >>> about making the software useful for users. However, it is always
>>>>>> >>> hard to
>>>>>> >>> get user input and so it helps to have this kind of process. I've
>>>>>> >>> certainly
>>>>>> >>> looked at the *IPs a lot in other software I use just to see the
>>>>>> >>> biggest
>>>>>> >>> things on the roadmap.
>>>>>> >>>
>>>>>> >>> When you're talking about "changing interfaces", are you talking
>>>>>> >>> about
>>>>>> >>> public or internal APIs? I do think many people hate changing
>>>>>> >>> public APIs
>>>>>> >>> and I actually think that's for the best of the project. That's a
>>>>>> >>> technical
>>>>>> >>> debate, but basically, the worst thing when you're using a piece of
>>>>>> >>> software
>>>>>> >>> is that the developers constantly ask you to rewrite your app to
>>>>>> >>> update to a
>>>>>> >>> new version (and thus benefit from bug fixes, etc). Cue anyone
>>>>>> >>> who's used
>>>>>> >>> Protobuf, or Guava. The "let's get everyone to change their code
>>>>>> >>> this
>>>>>> >>> release" model works well within a single large company, but
>>>>>> >>> doesn't work
>>>>>> >>> well for a community, which is why nearly all *very* widely used
>>>>>> >>> programming
>>>>>> >>> interfaces (I'm talking things like Java standard library, Windows
>>>>>> >>> API, etc)
>>>>>> >>> almost *never* break backwards compatibility. All this is done
>>>>>> >>> within reason
>>>>>> >>> though, e.g. we do change things in major releases (2.x, 3.x, etc).
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe e-mail: [email protected]
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Stavros Kontopoulos
>>>>> Senior Software Engineer
>>>>> Lightbend, Inc.
>>>>> p:  +30 6977967274
>>>>> e: [email protected]
>>>>>
>>>>>
>>>>
>>>
>>
>>
