Re: Spark Improvement Proposals

Ofir Manor Sun, 09 Oct 2016 15:08:11 -0700

This is a great discussion!
Maybe you could have a look at Kafka's process - it also uses Rejected
Alternatives and I personally find it very clear actually (the link also
leads to all KIPs):


https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals
Cody - maybe you could take one of the open issues and write a sample
proposal? A concrete example might make it clearer for those who see this
for the first time. Maybe the Kafka offset discussion or some other
Kafka/Structured Streaming open issue? Would that be helpful?

Ofir Manor

Co-Founder & CTO | Equalum

Mobile: +972-54-7801286 | Email: ofir.ma...@equalum.io

On Mon, Oct 10, 2016 at 12:36 AM, Matei Zaharia <matei.zaha...@gmail.com>
wrote:

> Yup, this is the stuff that I found unclear. Thanks for clarifying here,
> but we should also clarify it in the writeup. In particular:
>
> - Goals needs to be about user-facing behavior ("people" is broad)
>
> - I'd rename Rejected Goals to Non-Goals. Otherwise someone will dig up
> one of these and say "Spark's developers have officially rejected X, which
> our awesome system has".
>
> - For user-facing stuff, I think you need a section on API. Virtually all
> other *IPs I've seen have that.
>
> - I'm still not sure why the strategy section is needed if the purpose is
> to define user-facing behavior -- unless this is the strategy for setting
> the goals or for defining the API. That sounds squarely like a design doc
> issue. In some sense, who cares whether the proposal is technically
> feasible right now? If it's infeasible, that will be discovered later
> during design and implementation. Same thing with rejected strategies --
> listing some of those is definitely useful sometimes, but if you make this
> a *required* section, people are just going to fill it in with bogus stuff
> (I've seen this happen before).
>
> Matei
>
> > On Oct 9, 2016, at 2:14 PM, Cody Koeninger <c...@koeninger.org> wrote:
> >
> > So to focus the discussion on the specific strategy I'm suggesting,
> > documented at
> >
> > https://github.com/koeninger/spark-1/blob/SIP-0/docs/spark-improvement-proposals.md
> >
> > "Goals: What must this allow people to do, that they can't currently?"
> >
> > Is it unclear that this is focusing specifically on people-visible
> > behavior?
> >
> > Rejected goals -  are important because otherwise people keep trying
> > to argue about scope.  Of course you can change things later with a
> > different SIP and different vote, the point is to focus.
> >
> > Use cases - are something that people are going to bring up in
> > discussion.  If they aren't clearly documented as a goal ("This must
> > allow me to connect using SSL"), they should be added.
> >
> > Internal architecture - if the people who need specific behavior are
> > implementers of other parts of the system, that's fine.
> >
> > Rejected strategies - If you have none of these, you have no evidence
> > that the proponent didn't just go with the first thing they had in
> > mind (or have already implemented), which is a big problem currently.
> > Approval isn't binding as to specifics of implementation, so these
> > aren't handcuffs.  The goals are the contract, the strategy is
> > evidence that contract can actually be met.
> >
> > Design docs - I'm not touching design docs.  The markdown file I
> > linked specifically says of the strategy section "This is not a full
> > design document."  Is this unclear?  Design docs can be worked on
> > obviously, but that's not what I'm concerned with here.
> >
> >
> >
> >
> > On Sun, Oct 9, 2016 at 2:34 PM, Matei Zaharia <matei.zaha...@gmail.com>
> > wrote:
> >> Hi Cody,
> >>
> >> I think this would be a lot more concrete if we had a more detailed
> >> template for SIPs. Right now, it's not super clear what's in scope -- e.g.
> >> are they a way to solicit feedback on the user-facing behavior or on the
> >> internals? "Goals" can cover both things. I've been thinking of SIPs more
> >> as Product Requirements Docs (PRDs), which focus on *what* a code change
> >> should do as opposed to how.
> >>
> >> In particular, here are some things that you may or may not consider in
> >> scope for SIPs:
> >>
> >> - Goals and non-goals: This is definitely in scope, and IMO should focus on
> >> user-visible behavior (e.g. "system supports SQL window functions" or
> >> "system continues working if one node fails"). BTW I wouldn't say "rejected
> >> goals" because some of them might become goals later, so we're not
> >> definitively rejecting them.
> >>
> >> - Public API: Probably should be included in most SIPs unless it's too large
> >> to fully specify then (e.g. "let's add an ML library").
> >>
> >> - Use cases: I usually find this very useful in PRDs to better communicate
> >> the goals.
> >>
> >> - Internal architecture: This is usually *not* a thing users can easily
> >> comment on and it sounds more like a design doc item. Of course it's
> >> important to show that the SIP is feasible to implement. One exception,
> >> however, is that I think we'll have some SIPs primarily on internals (e.g.
> >> if somebody wants to refactor Spark's query optimizer or something).
> >>
> >> - Rejected strategies: I personally wouldn't put this, because what's the
> >> point of voting to reject a strategy before you've really begun designing
> >> and implementing something? What if you discover that the strategy is
> >> actually better when you start doing stuff?
> >>
> >> At a super high level, it depends on whether you want the SIPs to be PRDs
> >> for getting some quick feedback on the goals of a feature before it is
> >> designed, or something more like full-fledged design docs (just a more
> >> visible design doc for bigger changes). I looked at Kafka's KIPs, and they
> >> actually seem to be more like design docs. This can work too but it does
> >> require more work from the proposer and it can lead to the same problems you
> >> mentioned with people already having a design and implementation in mind.
> >>
> >> Basically, the question is, are you trying to iterate faster on design by
> >> adding a step for user feedback earlier? Or are you just trying to make
> >> design docs for key features more visible (and their approval more formal)?
> >>
> >> BTW note that in either case, I'd like to have a template for design docs
> >> too, which should also include goals. I think that would've avoided some of
> >> the issues you brought up.
> >>
> >> Matei
> >>
> >> On Oct 9, 2016, at 10:40 AM, Cody Koeninger <c...@koeninger.org> wrote:
> >>
> >> Here's my specific proposal (meta-proposal?)
> >>
> >> Spark Improvement Proposals (SIP)
> >>
> >>
> >> Background:
> >>
> >> The current problem is that design and implementation of large features are
> >> often done in private, before soliciting user feedback.
> >>
> >> When feedback is solicited, it is often as to detailed design specifics, not
> >> focused on goals.
> >>
> >> When implementation does take place after design, there is often
> >> disagreement as to what goals are or are not in scope.
> >>
> >> This results in commits that don't fully meet user needs.
> >>
> >>
> >> Goals:
> >>
> >> - Ensure user, contributor, and committer goals are clearly identified and
> >> agreed upon, before implementation takes place.
> >>
> >> - Ensure that a technically feasible strategy is chosen that is likely to
> >> meet the goals.
> >>
> >>
> >> Rejected Goals:
> >>
> >> - SIPs are not for detailed design.  Design by committee doesn't work.
> >>
> >> - SIPs are not for every change.  We don't need that much process.
> >>
> >>
> >> Strategy:
> >>
> >> My suggestion is outlined as a Spark Improvement Proposal process documented
> >> at
> >>
> >> https://github.com/koeninger/spark-1/blob/SIP-0/docs/spark-improvement-proposals.md
> >>
> >> Specifics of Jira manipulation are an implementation detail we can figure
> >> out.
> >>
> >> I'm suggesting voting; the need here is for a _clear_ outcome.
> >>
> >>
> >> Rejected Strategies:
> >>
> >> Having someone who understands the problem implement it first works, but
> >> only if significant iteration after user feedback is allowed.
> >>
> >> Historically this has been problematic due to pressure to limit public API
> >> changes.
> >>
> >>
> >> On Fri, Oct 7, 2016 at 5:16 PM, Reynold Xin <r...@databricks.com>
> >> wrote:
> >>>
> >>> Alright, looks like there is quite a bit of support. We should wait to
> >>> hear from more people too.
> >>>
> >>> To push this forward, Cody and I will be working together in the next
> >>> couple of weeks to come up with a concrete, detailed proposal on what this
> >>> entails, and then we can discuss the specific proposal as well.
> >>>
> >>>
> >>> On Fri, Oct 7, 2016 at 2:29 PM, Cody Koeninger <c...@koeninger.org>
> >>> wrote:
> >>>>
> >>>> Yeah, in case it wasn't clear, I was talking about SIPs for major
> >>>> user-facing or cross-cutting changes, not minor feature adds.
> >>>>
> >>>> On Fri, Oct 7, 2016 at 3:58 PM, Stavros Kontopoulos
> >>>> <stavros.kontopou...@lightbend.com> wrote:
> >>>>>
> >>>>> +1 to the SIP label as long as it does not slow down things and it
> >>>>> targets optimizing efforts, coordination etc. For example, really small
> >>>>> features (assuming they don't touch public interfaces) or refactorings
> >>>>> should not need to go through this process, and I hope it will be kept
> >>>>> this way. So a guideline doc should be provided, like in the KIP case.
> >>>>>
> >>>>> IMHO, aside from tagging things and linking them elsewhere, simply
> >>>>> having design docs and prototype implementations in PRs is not something
> >>>>> that has not worked so far. What is really a pain in many projects out
> >>>>> there is discontinuity in progress of PRs, missing features, slow
> >>>>> reviews, which is understandable to some extent... it is not only about
> >>>>> Spark, but things can be improved for sure for this project in
> >>>>> particular, as already stated.
> >>>>>
> >>>>> On Fri, Oct 7, 2016 at 11:14 PM, Cody Koeninger <c...@koeninger.org>
> >>>>> wrote:
> >>>>>>
> >>>>>> +1 to adding an SIP label and linking it from the website.  I think it
> >>>>>> needs
> >>>>>>
> >>>>>> - template that focuses it towards soliciting user goals / non goals
> >>>>>> - clear resolution as to which strategy was chosen to pursue.  I'd
> >>>>>> recommend a vote.
> >>>>>>
> >>>>>> Matei asked me to clarify what I meant by changing interfaces, I think
> >>>>>> it's directly relevant to the SIP idea so I'll clarify here, and split
> >>>>>> a thread for the other discussion per Nicholas' request.
> >>>>>>
> >>>>>> I meant changing public user interfaces.  I think the first design is
> >>>>>> unlikely to be right, because it's done at a time when you have the
> >>>>>> least information.  As a user, I find it considerably more frustrating
> >>>>>> to be unable to use a tool to get my job done, than I do having to
> >>>>>> make minor changes to my code in order to take advantage of features.
> >>>>>> I've seen committers be seriously reluctant to allow changes to
> >>>>>> @experimental code that are needed in order for it to really work
> >>>>>> right.  You need to be able to iterate, and if people on both sides of
> >>>>>> the fence aren't going to respect that some newer apis are subject to
> >>>>>> change, then why even mark them as such?
> >>>>>>
> >>>>>> Ideally a finished SIP should give me a checklist of things that an
> >>>>>> implementation must do, and things that it doesn't need to do.
> >>>>>> Contributors/committers should be seriously discouraged from putting
> >>>>>> out a version 0.1 that doesn't have at least a prototype
> >>>>>> implementation of all those things, especially if they're then going
> >>>>>> to argue against interface changes necessary to get the rest of
> >>>>>> the things done in the 0.2 version.
> >>>>>>
> >>>>>>
> >>>>>> On Fri, Oct 7, 2016 at 2:18 PM, Reynold Xin <r...@databricks.com>
> >>>>>> wrote:
> >>>>>>> I like the lightweight proposal to add a SIP label.
> >>>>>>>
> >>>>>>> During Spark 2.0 development, Tom (Graves) and I suggested using a wiki
> >>>>>>> to track the list of major changes, but that never really materialized
> >>>>>>> due to the overhead. Adding a SIP label on major JIRAs and then linking
> >>>>>>> to them prominently on the Spark website makes a lot of sense.
> >>>>>>>
> >>>>>>>
> >>>>>>> On Fri, Oct 7, 2016 at 10:50 AM, Matei Zaharia
> >>>>>>> <matei.zaha...@gmail.com>
> >>>>>>> wrote:
> >>>>>>>>
> >>>>>>>> For the improvement proposals, I think one major point was to make
> >>>>>>>> them really visible to users who are not contributors, so we should do
> >>>>>>>> more than sending stuff to dev@. One very lightweight idea is to have a
> >>>>>>>> new type of JIRA called a SIP and have a link to a filter that shows all
> >>>>>>>> such JIRAs from http://spark.apache.org. I also like the idea of SIP and
> >>>>>>>> design doc templates (in fact many projects have them).
> >>>>>>>>
> >>>>>>>> Matei
> >>>>>>>>
> >>>>>>>> On Oct 7, 2016, at 10:38 AM, Reynold Xin <r...@databricks.com>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>> I called Cody last night and talked about some of the topics in his
> >>>>>>>> email. It became clear to me Cody genuinely cares about the project.
> >>>>>>>>
> >>>>>>>> Some of the frustrations come from the success of the project itself
> >>>>>>>> becoming very "hot", and it is difficult to get clarity from people who
> >>>>>>>> don't dedicate all their time to Spark. In fact, it is in some ways
> >>>>>>>> similar to scaling an engineering team in a successful startup: old
> >>>>>>>> processes that worked well might not work so well when it gets to a
> >>>>>>>> certain size, cultures can get diluted, building culture vs building
> >>>>>>>> process, etc.
> >>>>>>>>
> >>>>>>>> I would also really like to have a more visible process for larger
> >>>>>>>> changes, especially major user-facing API changes. Historically we upload
> >>>>>>>> design docs for major changes, but it is not always consistent and it is
> >>>>>>>> difficult to ensure the quality of the docs, due to the volunteering
> >>>>>>>> nature of the organization.
> >>>>>>>>
> >>>>>>>> Some of the more concrete ideas we discussed focus on building a
> >>>>>>>> culture to improve clarity:
> >>>>>>>>
> >>>>>>>> - Process: Large changes should have design docs posted on JIRA. One
> >>>>>>>> thing Cody and I didn't discuss, but an idea that just came to me, is
> >>>>>>>> that we should create a design doc template for the project and ask
> >>>>>>>> everybody to follow it. The design doc template should also explicitly
> >>>>>>>> list goals and non-goals, to make design docs more consistent.
> >>>>>>>>
> >>>>>>>> - Process: Email dev@ to solicit feedback. We have done this with some
> >>>>>>>> changes, but again very inconsistently. Just posting something on JIRA
> >>>>>>>> isn't sufficient, because there are simply too many JIRAs and the signal
> >>>>>>>> gets lost in the noise. While this is generally impossible to enforce
> >>>>>>>> because we can't force all volunteers to conform to a process (or they
> >>>>>>>> might not even be aware of this), those who are more familiar with the
> >>>>>>>> project can help by emailing dev@ when they see something that hasn't
> >>>>>>>> been.
> >>>>>>>>
> >>>>>>>> - Culture: The design doc author(s) should be open to feedback. A
> >>>>>>>> design doc should serve as the base for discussion and is by no means
> >>>>>>>> the final design. Of course, this does not mean the author has to accept
> >>>>>>>> every piece of feedback. They should also be comfortable accepting /
> >>>>>>>> rejecting ideas on technical grounds.
> >>>>>>>>
> >>>>>>>> - Process / Culture: For major ongoing projects, it can be useful to
> >>>>>>>> have some monthly Google hangouts that are open to the world. I am
> >>>>>>>> actually not sure how well this will work, because of the volunteering
> >>>>>>>> nature and because we need to adjust for timezones for people across the
> >>>>>>>> globe, but it seems worth trying.
> >>>>>>>>
> >>>>>>>> - Culture: Contributors (including committers) should be more direct
> >>>>>>>> in setting expectations, including whether they are working on a
> >>>>>>>> specific issue, whether they will be working on a specific issue, and
> >>>>>>>> whether an issue or pr or jira should be rejected. Most people I know in
> >>>>>>>> this community are nice and don't enjoy telling other people no, but it
> >>>>>>>> is often more annoying to a contributor to not know anything than to
> >>>>>>>> get a no.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Fri, Oct 7, 2016 at 10:03 AM, Matei Zaharia
> >>>>>>>> <matei.zaha...@gmail.com>
> >>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Love the idea of a more visible "Spark Improvement Proposal"
> >>>>>>>>> process that solicits user input on new APIs. For what it's worth, I
> >>>>>>>>> don't think committers are trying to minimize their own work -- every
> >>>>>>>>> committer cares about making the software useful for users. However, it
> >>>>>>>>> is always hard to get user input and so it helps to have this kind of
> >>>>>>>>> process. I've certainly looked at the *IPs a lot in other software I
> >>>>>>>>> use just to see the biggest things on the roadmap.
> >>>>>>>>>
> >>>>>>>>> When you're talking about "changing interfaces", are you talking
> >>>>>>>>> about public or internal APIs? I do think many people hate changing
> >>>>>>>>> public APIs and I actually think that's for the best of the project.
> >>>>>>>>> That's a technical debate, but basically, the worst thing when you're
> >>>>>>>>> using a piece of software is that the developers constantly ask you to
> >>>>>>>>> rewrite your app to update to a new version (and thus benefit from bug
> >>>>>>>>> fixes, etc). Cue anyone who's used Protobuf, or Guava. The "let's get
> >>>>>>>>> everyone to change their code this release" model works well within a
> >>>>>>>>> single large company, but doesn't work well for a community, which is
> >>>>>>>>> why nearly all *very* widely used programming interfaces (I'm talking
> >>>>>>>>> things like Java standard library, Windows API, etc) almost *never*
> >>>>>>>>> break backwards compatibility. All this is done within reason though,
> >>>>>>>>> e.g. we do change things in major releases (2.x, 3.x, etc).
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>> ---------------------------------------------------------------------
> >>>>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> >>>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> Stavros Kontopoulos
> >>>>> Senior Software Engineer
> >>>>> Lightbend, Inc.
> >>>>> p:  +30 6977967274
> >>>>> e: stavros.kontopou...@lightbend.com
> >>>>>
> >>>>>
> >>>>
> >>>
> >>
> >>
>
>
>
>
