Re: Spark Improvement Proposals

Mark Hamstra Mon, 10 Oct 2016 15:37:02 -0700

If I'm correctly understanding the kind of voting that you are talking
about, then to be accurate, it is only the PMC members that have a vote,
not all committers:
https://www.apache.org/foundation/how-it-works.html#pmc-members


On Mon, Oct 10, 2016 at 12:02 PM, Cody Koeninger <c...@koeninger.org> wrote:

> I think the main value is in being honest about what's going on.  No
> one other than committers can cast a meaningful vote, that's the
> reality.  Beyond that, if people think it's more open to allow formal
> proposals from anyone, I'm not necessarily against it, but my main
> question would be this:
>
> If anyone can submit a proposal, are committers actually going to
> clearly reject and close proposals that don't meet the requirements?
>
> Right now we have a serious problem with lack of clarity regarding
> contributions, and that cannot spill over into goal-setting.
>
> On Mon, Oct 10, 2016 at 1:54 PM, Ryan Blue <rb...@netflix.com> wrote:
> > +1 to votes to approve proposals. I agree that proposals should have an
> > official mechanism to be accepted, and a vote is an established means of
> > doing that well. I like that it includes a period to review the proposal,
> > and I think proposals should have been discussed enough ahead of a vote to
> > survive the possibility of a veto.
> >
> > I also like names that are short and (mostly) unique, like SEP.
> >
> > Where I disagree is with the requirement that a committer must formally
> > propose an enhancement. I don't see the value of restricting this: if
> > someone has the will to write up a proposal, then they should be
> > encouraged to do so and start a discussion about it. Even if there is a
> > political reality, as Cody says, what is the value of codifying that in
> > our process? I think restricting who can submit proposals would only
> > undermine them by pushing contributors out. Maybe I'm missing something
> > here?
> >
> > rb
> >
> >
> >
> > On Mon, Oct 10, 2016 at 7:41 AM, Cody Koeninger <c...@koeninger.org> wrote:
> >>
> >> Yes, users suggesting SIPs is a good thing and is explicitly called
> >> out in the linked document under the Who? section.  Formally proposing
> >> them, not so much, because of the political realities.
> >>
> >> Yes, implementation strategy definitely affects goals.  There are all
> >> kinds of examples of this, I'll pick one that's my fault so as to
> >> avoid sounding like I'm blaming:
> >>
> >> When I implemented the Kafka DStream, one of my goals (not explicitly
> >> agreed upon by the community) was to make sure people could use the
> >> DStream however they were already using Kafka at work.  The lack
> >> of explicit agreement on that goal led to all kinds of fighting with
> >> committers that could have been avoided.  The lack of explicit
> >> up-front strategy discussion led to the DStream not really working
> >> with compacted topics.  I knew about compacted topics, but don't have
> >> a use for them, so I had a blind spot there.  If there had been
> >> explicit up-front discussion that my strategy was "assume that batches
> >> can be defined on the driver solely by beginning and ending offsets",
> >> there's a greater chance that a user would have seen that and said,
> >> "hey, what about non-contiguous offsets in a compacted topic?"
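The offset-range assumption described above can be illustrated with a toy model. This is a minimal Python sketch, not the actual Kafka DStream code: `OffsetRange`, `compact`, `read_range`, and `read_assuming_contiguous` are hypothetical names, and compaction is simplified to "keep the latest record per key". The one property it relies on is that Kafka's compaction keeps the original offsets of surviving records, so gaps appear.

```python
from dataclasses import dataclass

# Toy model of one Kafka partition: each record is (offset, key, value).
# All names are hypothetical; this is not the spark-streaming-kafka code.


@dataclass(frozen=True)
class OffsetRange:
    """A batch described only by [from_offset, until_offset) on one partition."""
    from_offset: int
    until_offset: int

    def expected_count(self) -> int:
        # Only correct if every offset in the range still holds a record.
        return self.until_offset - self.from_offset


def compact(log):
    """Simplified compaction: keep the latest record per key; offsets are not renumbered."""
    latest = {}
    for offset, key, value in log:
        latest[key] = (offset, key, value)
    return sorted(latest.values())


def read_range(log, rng):
    """Return whatever records actually exist inside the range."""
    return [r for r in log if rng.from_offset <= r[0] < rng.until_offset]


def read_assuming_contiguous(log, rng):
    """Fetch offset by offset, as if every offset in the range must exist."""
    by_offset = {r[0]: r for r in log}
    out = []
    for off in range(rng.from_offset, rng.until_offset):
        if off not in by_offset:
            raise LookupError(f"expected a record at offset {off}")
        out.append(by_offset[off])
    return out


raw_log = [(0, "a", 1), (1, "b", 1), (2, "a", 2), (3, "c", 1), (4, "b", 2), (5, "a", 3)]
compacted = compact(raw_log)  # only offsets 3, 4, 5 survive
batch = OffsetRange(0, 6)

print(batch.expected_count())             # 6
print(len(read_range(compacted, batch)))  # 3
try:
    read_assuming_contiguous(compacted, batch)
except LookupError as err:
    print(err)                            # expected a record at offset 0
```

After compaction the batch still spans six offsets but only three records remain, so any code that treats `until - from` as the record count, or fetches offset by offset, breaks on a compacted topic.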
> >>
> >> This kind of thing is only going to happen smoothly if we have a
> >> lightweight user-visible process with clear outcomes.
> >>
> >> On Mon, Oct 10, 2016 at 1:34 AM, assaf.mendelson
> >> <assaf.mendel...@rsa.com> wrote:
> >> > I agree with most of what Cody said.
> >> >
> >> > Two things:
> >> >
> >> > First, we can always have other people suggest SIPs but mark them as
> >> > “unreviewed” and have committers basically move them forward. The
> >> > problem is that writing a good document takes time. This way we can
> >> > leverage non-committers to do some of this work (it is just another
> >> > way to contribute).
> >> >
> >> >
> >> >
> >> > As for strategy, in many cases implementation strategy can affect the
> >> > goals. I will give a small example: in the current structured
> >> > streaming strategy, we group by time to achieve a sliding window. This
> >> > is definitely an implementation decision and not a goal. However, I
> >> > can think of several aggregation functions which have the time inside
> >> > their calculation buffer. For example, let's say we want to return the
> >> > set of all distinct values. One way to implement this would be to turn
> >> > the set into a map whose value contains the last time seen.
> >> > Multiplying that buffer across every group of the groupby would cost a
> >> > lot in performance. So adding such a strategy would have a great
> >> > effect on the type of aggregations and their performance, which does
> >> > affect the goal. Without adding the strategy, it is easy for whoever
> >> > goes to the design document not to think about these cases.
> >> > Furthermore, it might be decided that these cases are rare enough that
> >> > the strategy is still good enough, but how would we know that without
> >> > user feedback?
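To make the trade-off in the paragraph above concrete, here is a minimal Python sketch, assuming integer event times and sliding windows aligned to multiples of the slide interval. `distinct_by_window` mimics grouping by window (each event is copied into every overlapping window), while `LastSeenDistinct` keeps a single map from value to last time seen, as described above. Both names are hypothetical, and neither reflects how Structured Streaming is actually implemented.

```python
from collections import defaultdict

# Events are (event_time, value); window size and slide use the same units.
# Hypothetical names; a sketch of the trade-off, not Structured Streaming internals.


def distinct_by_window(events, size, slide):
    """Strategy A: copy each event into every sliding window that contains it, then group."""
    windows = defaultdict(set)
    for t, value in events:
        start = (t // slide) * slide   # latest aligned window start <= t
        while start > t - size:        # window [start, start + size) contains t
            windows[start].add(value)  # copied size/slide times when size is a multiple of slide
            start -= slide
    return dict(windows)


class LastSeenDistinct:
    """Strategy B: one buffer mapping value -> last time seen (time kept inside the buffer)."""

    def __init__(self):
        self.last_seen = {}

    def update(self, t, value):
        self.last_seen[value] = max(t, self.last_seen.get(value, t))

    def distinct_since(self, start):
        # Exact only for the window that ends after the latest event seen so far.
        return {v for v, ts in self.last_seen.items() if ts >= start}


events = [(1, "x"), (3, "y"), (7, "x"), (12, "z")]

by_window = distinct_by_window(events, size=10, slide=5)
print(sorted(by_window[5]))              # ['x', 'z']

buffer = LastSeenDistinct()
for t, v in events:
    buffer.update(t, v)
print(sorted(buffer.distinct_since(5)))  # ['x', 'z']
```

The map-based buffer stores each value once instead of once per window, but it only answers exactly for the most recent window: `distinct_since(0)` would also include `"z"`, which falls outside the window `[0, 10)`. That is the kind of strategy-dependent limitation that could shape which aggregations a proposal can promise.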
> >> >
> >> > I believe this example is exactly what Cody was talking about. Since
> >> > implementation strategies often have a large effect on the goal, we
> >> > should discuss them when discussing the goals. In addition, while it is
> >> > often easy to throw out completely infeasible goals, it is often much
> >> > harder to figure out that the goals are infeasible without fine tuning.
> >> >
> >> >
> >> >
> >> >
> >> >
> >> > Assaf.
> >> >
> >> >
> >> >
> >> > From: Cody Koeninger-2 [via Apache Spark Developers List]
> >> > [mailto:ml-node+[hidden email]]
> >> > Sent: Monday, October 10, 2016 2:25 AM
> >> > To: Mendelson, Assaf
> >> > Subject: Re: Spark Improvement Proposals
> >> >
> >> >
> >> >
> >> > Only committers should formally submit SIPs, because in an Apache
> >> > project only committers have explicit political power.  If a user can't
> >> > find a committer willing to sponsor an SIP idea, they have no way to
> >> > get the idea passed in any case.  If I can't find a committer to
> >> > sponsor this meta-SIP idea, I'm out of luck.
> >> >
> >> > I do not believe unrealistic goals can be found solely by inspection.
> >> > We've managed to ignore unrealistic goals even after implementation!
> >> > Focusing on APIs can allow people to think they've solved something,
> >> > when there's really no way of implementing that API while meeting the
> >> > goals.  Rapid iteration is clearly the best way to address this, but
> >> > we've already talked about why that hasn't really worked.  If adding a
> >> > non-binding API section to the template is important to you, I'm not
> >> > against it, but I don't think it's sufficient.
> >> >
> >> > On your PRD vs design doc spectrum, I'm saying this is closer to a
> >> > PRD.  Clear agreement on goals is the most important thing and that's
> >> > why it's the thing I want binding agreement on.  But I cannot agree to
> >> > goals unless I have enough minimal technical info to judge whether the
> >> > goals are likely to actually be accomplished.
> >> >
> >> >
> >> >
> >> > On Sun, Oct 9, 2016 at 5:35 PM, Matei Zaharia <[hidden email]> wrote:
> >> >
> >> >
> >> >> Well, I think there are a few things here that don't make sense.
> >> >> First, why should only committers submit SIPs? Development in the
> >> >> project should be open to all contributors, whether they're
> >> >> committers or not. Second, I think unrealistic goals can be found
> >> >> just by inspecting the goals, and I'm not super worried that we'll
> >> >> accept a lot of SIPs that are then infeasible -- we can then submit
> >> >> new ones. But this depends on whether you want this process to be a
> >> >> "design doc lite", where people also agree on implementation
> >> >> strategy, or just a way to agree on goals. This is what I asked
> >> >> earlier about PRDs vs design docs (and I'm open to either one, but
> >> >> I'd just like clarity). Finally, both as a user and designer of
> >> >> software, I always want to give feedback on APIs, so I'd really like
> >> >> a culture of having those early. People don't argue about prettiness
> >> >> when they discuss APIs; they argue about the core concepts to expose
> >> >> in order to meet various goals, and then they're stuck maintaining
> >> >> those for a long time.
> >> >>
> >> >> Matei
> >> >>
> >> >> On Oct 9, 2016, at 3:10 PM, Cody Koeninger <[hidden email]> wrote:
> >> >>
> >> >> Users instead of people, sure.  Committers and contributors are (or
> >> >> at least should be) a subset of users.
> >> >>
> >> >> Non-goals, sure. I don't care what the name is, but we need to
> >> >> clearly say e.g. 'no, we are not maintaining compatibility with XYZ
> >> >> right now'.
> >> >>
> >> >> API: what I care most about is whether it allows me to accomplish
> >> >> the goals. Arguing about how ugly or pretty it is can be saved for
> >> >> design/implementation, imho.
> >> >>
> >> >> Strategy: this is necessary because otherwise goals can be out of
> >> >> line with reality.  Don't propose goals you don't have at least some
> >> >> idea of how to implement.
> >> >>
> >> >> Rejected strategies: given that committers are the only ones I'm
> >> >> saying should formally submit SPARKLIs or SIPs, if they put junk in
> >> >> a required section, then slap them down for it and tell them to fix
> >> >> it.
> >> >>
> >> >>
> >> >> On Oct 9, 2016 4:36 PM, "Matei Zaharia" <[hidden email]> wrote:
> >> >>>
> >> >>> Yup, this is the stuff that I found unclear. Thanks for clarifying
> >> >>> here, but we should also clarify it in the writeup. In particular:
> >> >>>
> >> >>> - Goals need to be about user-facing behavior ("people" is broad)
> >> >>>
> >> >>> - I'd rename Rejected Goals to Non-Goals. Otherwise someone will dig
> >> >>> up one of these and say "Spark's developers have officially rejected
> >> >>> X, which our awesome system has".
> >> >>>
> >> >>> - For user-facing stuff, I think you need a section on API.
> >> >>> Virtually all other *IPs I've seen have that.
> >> >>>
> >> >>> - I'm still not sure why the strategy section is needed if the
> >> >>> purpose is to define user-facing behavior -- unless this is the
> >> >>> strategy for setting the goals or for defining the API. That sounds
> >> >>> squarely like a design doc issue. In some sense, who cares whether
> >> >>> the proposal is technically feasible right now? If it's infeasible,
> >> >>> that will be discovered later during design and implementation. Same
> >> >>> thing with rejected strategies -- listing some of those is
> >> >>> definitely useful sometimes, but if you make this a *required*
> >> >>> section, people are just going to fill it in with bogus stuff (I've
> >> >>> seen this happen before).
> >> >>>
> >> >>> Matei
> >> >>>
> >> >
> >> >>> > On Oct 9, 2016, at 2:14 PM, Cody Koeninger <[hidden email]> wrote:
> >> >>> >
> >> >>> > So to focus the discussion on the specific strategy I'm
> >> >>> > suggesting, documented at
> >> >>> >
> >> >>> > https://github.com/koeninger/spark-1/blob/SIP-0/docs/spark-improvement-proposals.md
> >> >>> >
> >> >>> > "Goals: What must this allow people to do, that they can't
> >> >>> > currently?"
> >> >>> >
> >> >>> > Is it unclear that this is focusing specifically on people-visible
> >> >>> > behavior?
> >> >>> >
> >> >>> > Rejected goals - are important because otherwise people keep
> >> >>> > trying to argue about scope.  Of course you can change things later
> >> >>> > with a different SIP and a different vote; the point is to focus.
> >> >>> >
> >> >>> > Use cases - are something that people are going to bring up in
> >> >>> > discussion.  If they aren't clearly documented as a goal ("This
> >> >>> > must allow me to connect using SSL"), they should be added.
> >> >>> >
> >> >>> > Internal architecture - if the people who need specific behavior
> >> >>> > are implementers of other parts of the system, that's fine.
> >> >>> >
> >> >>> > Rejected strategies - If you have none of these, you have no
> >> >>> > evidence that the proponent didn't just go with the first thing
> >> >>> > they had in mind (or have already implemented), which is a big
> >> >>> > problem currently. Approval isn't binding as to specifics of
> >> >>> > implementation, so these aren't handcuffs.  The goals are the
> >> >>> > contract; the strategy is evidence that the contract can actually
> >> >>> > be met.
> >> >>> >
> >> >>> > Design docs - I'm not touching design docs.  The markdown file I
> >> >>> > linked specifically says of the strategy section "This is not a
> >> >>> > full design document."  Is this unclear?  Design docs can be worked
> >> >>> > on, obviously, but that's not what I'm concerned with here.
> >> >>> >
> >> >>> >
> >> >>> >
> >> >>> >
> >> >>> > On Sun, Oct 9, 2016 at 2:34 PM, Matei Zaharia <[hidden email]>
> >> >>> > wrote:
> >> >>> >> Hi Cody,
> >> >>> >>
> >> >>> >> I think this would be a lot more concrete if we had a more
> >> >>> >> detailed template for SIPs. Right now, it's not super clear what's
> >> >>> >> in scope -- e.g. are they a way to solicit feedback on the
> >> >>> >> user-facing behavior or on the internals? "Goals" can cover both
> >> >>> >> things. I've been thinking of SIPs more as Product Requirements
> >> >>> >> Docs (PRDs), which focus on *what* a code change should do as
> >> >>> >> opposed to how.
> >> >>> >>
> >> >>> >> In particular, here are some things that you may or may not
> >> >>> >> consider in scope for SIPs:
> >> >>> >>
> >> >>> >> - Goals and non-goals: This is definitely in scope, and IMO should
> >> >>> >> focus on user-visible behavior (e.g. "system supports SQL window
> >> >>> >> functions" or "system continues working if one node fails"). BTW I
> >> >>> >> wouldn't say "rejected goals" because some of them might become
> >> >>> >> goals later, so we're not definitively rejecting them.
> >> >>> >>
> >> >>> >> - Public API: Probably should be included in most SIPs unless it's
> >> >>> >> too large to fully specify then (e.g. "let's add an ML library").
> >> >>> >>
> >> >>> >> - Use cases: I usually find this very useful in PRDs to better
> >> >>> >> communicate the goals.
> >> >>> >>
> >> >>> >> - Internal architecture: This is usually *not* a thing users can
> >> >>> >> easily comment on, and it sounds more like a design doc item. Of
> >> >>> >> course it's important to show that the SIP is feasible to
> >> >>> >> implement. One exception, however, is that I think we'll have some
> >> >>> >> SIPs primarily on internals (e.g. if somebody wants to refactor
> >> >>> >> Spark's query optimizer or something).
> >> >>> >>
> >> >>> >> - Rejected strategies: I personally wouldn't put this, because
> >> >>> >> what's the point of voting to reject a strategy before you've
> >> >>> >> really begun designing and implementing something? What if you
> >> >>> >> discover that the strategy is actually better when you start doing
> >> >>> >> stuff?
> >> >>> >>
> >> >>> >> At a super high level, it depends on whether you want the SIPs to
> >> >>> >> be PRDs for getting some quick feedback on the goals of a feature
> >> >>> >> before it is designed, or something more like full-fledged design
> >> >>> >> docs (just a more visible design doc for bigger changes). I looked
> >> >>> >> at Kafka's KIPs, and they actually seem to be more like design
> >> >>> >> docs. This can work too, but it does require more work from the
> >> >>> >> proposer and it can lead to the same problems you mentioned with
> >> >>> >> people already having a design and implementation in mind.
> >> >>> >>
> >> >>> >> Basically, the question is: are you trying to iterate faster on
> >> >>> >> design by adding a step for user feedback earlier? Or are you just
> >> >>> >> trying to make design docs for key features more visible (and
> >> >>> >> their approval more formal)?
> >> >>> >>
> >> >>> >> BTW, note that in either case, I'd like to have a template for
> >> >>> >> design docs too, which should also include goals. I think that
> >> >>> >> would've avoided some of the issues you brought up.
> >> >>> >>
> >> >>> >> Matei
> >> >>> >>
> >> >>> >> On Oct 9, 2016, at 10:40 AM, Cody Koeninger <[hidden email]> wrote:
> >> >>> >>
> >> >>> >> Here's my specific proposal (meta-proposal?)
> >> >>> >>
> >> >>> >> Spark Improvement Proposals (SIP)
> >> >>> >>
> >> >>> >>
> >> >>> >> Background:
> >> >>> >>
> >> >>> >> The current problem is that design and implementation of large
> >> >>> >> features are often done in private, before soliciting user
> >> >>> >> feedback.
> >> >>> >>
> >> >>> >> When feedback is solicited, it is often about detailed design
> >> >>> >> specifics, not focused on goals.
> >> >>> >>
> >> >>> >> When implementation does take place after design, there is often
> >> >>> >> disagreement as to what goals are or are not in scope.
> >> >>> >>
> >> >>> >> This results in commits that don't fully meet user needs.
> >> >>> >>
> >> >>> >>
> >> >>> >> Goals:
> >> >>> >>
> >> >>> >> - Ensure user, contributor, and committer goals are clearly
> >> >>> >> identified and agreed upon, before implementation takes place.
> >> >>> >>
> >> >>> >> - Ensure that a technically feasible strategy is chosen that is
> >> >>> >> likely to meet the goals.
> >> >>> >>
> >> >>> >>
> >> >>> >> Rejected Goals:
> >> >>> >>
> >> >>> >> - SIPs are not for detailed design.  Design by committee doesn't
> >> >>> >> work.
> >> >>> >>
> >> >>> >> - SIPs are not for every change.  We don't need that much process.
> >> >>> >>
> >> >>> >>
> >> >>> >> Strategy:
> >> >>> >>
> >> >>> >> My suggestion is outlined as a Spark Improvement Proposal process
> >> >>> >> documented at
> >> >>> >>
> >> >>> >> https://github.com/koeninger/spark-1/blob/SIP-0/docs/spark-improvement-proposals.md
> >> >>> >>
> >> >>> >> Specifics of Jira manipulation are an implementation detail we can
> >> >>> >> figure out.
> >> >>> >>
> >> >>> >> I'm suggesting voting; the need here is for a _clear_ outcome.
> >> >>> >>
> >> >>> >>
> >> >>> >> Rejected Strategies:
> >> >>> >>
> >> >>> >> Having someone who understands the problem implement it first
> >> >>> >> works, but only if significant iteration after user feedback is
> >> >>> >> allowed.
> >> >>> >>
> >> >>> >> Historically this has been problematic due to pressure to limit
> >> >>> >> public API changes.
> >> >>> >>
> >> >>> >>
> >> >>> >> On Fri, Oct 7, 2016 at 5:16 PM, Reynold Xin <[hidden email]> wrote:
> >> >>> >>>
> >> >>> >>> Alright, looks like there is quite a bit of support. We should
> >> >>> >>> wait to hear from more people too.
> >> >>> >>>
> >> >>> >>> To push this forward, Cody and I will be working together in the
> >> >>> >>> next couple of weeks to come up with a concrete, detailed proposal
> >> >>> >>> on what this entails, and then we can discuss the specific
> >> >>> >>> proposal as well.
> >> >>> >>>
> >> >>> >>>
> >> >>> >>> On Fri, Oct 7, 2016 at 2:29 PM, Cody Koeninger <[hidden email]> wrote:
> >> >>> >>>>
> >> >>> >>>> Yeah, in case it wasn't clear, I was talking about SIPs for
> >> >>> >>>> major user-facing or cross-cutting changes, not minor feature
> >> >>> >>>> adds.
> >> >>> >>>>
> >> >>> >>>> On Fri, Oct 7, 2016 at 3:58 PM, Stavros Kontopoulos
> >> >>> >>>> <[hidden email]> wrote:
> >> >>> >>>>>
> >> >>> >>>>> +1 to the SIP label as long as it does not slow things down and
> >> >>> >>>>> it targets optimizing efforts, coordination, etc. For example,
> >> >>> >>>>> really small features (assuming they don't touch public
> >> >>> >>>>> interfaces) or refactorings should not need to go through this
> >> >>> >>>>> process, and I hope it will be kept this way. So a guideline doc
> >> >>> >>>>> should be provided, like in the KIP case.
> >> >>> >>>>>
> >> >>> >>>>> IMHO, aside from tagging things and linking them elsewhere,
> >> >>> >>>>> simply having design docs and prototype implementations in PRs
> >> >>> >>>>> is not what has failed so far. What is really a pain in many
> >> >>> >>>>> projects out there is discontinuity in the progress of PRs,
> >> >>> >>>>> missing features, and slow reviews, which is understandable to
> >> >>> >>>>> some extent... it is not only about Spark, but things can be
> >> >>> >>>>> improved for sure for this project in particular, as already
> >> >>> >>>>> stated.
> >> >>> >>>>>
> >> >>> >>>>> On Fri, Oct 7, 2016 at 11:14 PM, Cody Koeninger <[hidden email]> wrote:
> >> >>> >>>>>>
> >> >>> >>>>>> +1 to adding an SIP label and linking it from the website.  I
> >> >>> >>>>>> think it needs
> >> >>> >>>>>>
> >> >>> >>>>>> - a template that focuses it towards soliciting user goals /
> >> >>> >>>>>> non-goals
> >> >>> >>>>>> - a clear resolution as to which strategy was chosen to pursue.
> >> >>> >>>>>> I'd recommend a vote.
> >> >>> >>>>>>
> >> >>> >>>>>> Matei asked me to clarify what I meant by changing interfaces.
> >> >>> >>>>>> I think it's directly relevant to the SIP idea, so I'll clarify
> >> >>> >>>>>> here, and split a thread for the other discussion per Nicholas'
> >> >>> >>>>>> request.
> >> >>> >>>>>>
> >> >>> >>>>>> I meant changing public user interfaces.  I think the first
> >> >>> >>>>>> design is unlikely to be right, because it's done at a time when
> >> >>> >>>>>> you have the least information.  As a user, I find it
> >> >>> >>>>>> considerably more frustrating to be unable to use a tool to get
> >> >>> >>>>>> my job done than to have to make minor changes to my code in
> >> >>> >>>>>> order to take advantage of features. I've seen committers be
> >> >>> >>>>>> seriously reluctant to allow changes to @experimental code that
> >> >>> >>>>>> are needed in order for it to really work right.  You need to be
> >> >>> >>>>>> able to iterate, and if people on both sides of the fence aren't
> >> >>> >>>>>> going to respect that some newer APIs are subject to change,
> >> >>> >>>>>> then why even mark them as such?
> >> >>> >>>>>>
> >> >>> >>>>>> Ideally a finished SIP should give me a checklist of things
> >> >>> >>>>>> that an implementation must do, and things that it doesn't need
> >> >>> >>>>>> to do. Contributors/committers should be seriously discouraged
> >> >>> >>>>>> from putting out a version 0.1 that doesn't have at least a
> >> >>> >>>>>> prototype implementation of all those things, especially if
> >> >>> >>>>>> they're then going to argue against interface changes necessary
> >> >>> >>>>>> to get the rest of the things done in the 0.2 version.
> >> >>> >>>>>>
> >> >>> >>>>>>
> >> >>> >>>>>> On Fri, Oct 7, 2016 at 2:18 PM, Reynold Xin <[hidden email]>
> >> >>> >>>>>> wrote:
> >> >>> >>>>>>> I like the lightweight proposal to add a SIP label.
> >> >>> >>>>>>>
> >> >>> >>>>>>> During Spark 2.0 development, Tom (Graves) and I suggested
> >> >>> >>>>>>> using the wiki to track the list of major changes, but that
> >> >>> >>>>>>> never really materialized due to the overhead. Adding a SIP
> >> >>> >>>>>>> label on major JIRAs and then linking to them prominently on the
> >> >>> >>>>>>> Spark website makes a lot of sense.
> >> >>> >>>>>>>
> >> >>> >>>>>>>
> >> >>> >>>>>>> On Fri, Oct 7, 2016 at 10:50 AM, Matei Zaharia
> >> >>> >>>>>>> <[hidden email]>
> >> >>> >>>>>>> wrote:
> >> >>> >>>>>>>>
> >> >>> >>>>>>>> For the improvement proposals, I think one major point was to
> >> >>> >>>>>>>> make them really visible to users who are not contributors, so
> >> >>> >>>>>>>> we should do more than sending stuff to dev@. One very
> >> >>> >>>>>>>> lightweight idea is to have a new type of JIRA called a SIP and
> >> >>> >>>>>>>> have a link to a filter that shows all such JIRAs from
> >> >>> >>>>>>>> http://spark.apache.org. I also like the idea of SIP and design
> >> >>> >>>>>>>> doc templates (in fact many projects have them).
> >> >>> >>>>>>>>
> >> >>> >>>>>>>> Matei
> >> >>> >>>>>>>>
> >> >>> >>>>>>>> On Oct 7, 2016, at 10:38 AM, Reynold Xin <[hidden email]>
> >> >>> >>>>>>>> wrote:
> >> >>> >>>>>>>>
> >> >>> >>>>>>>> I called Cody last night and talked about some of the topics
> >> >>> >>>>>>>> in his email. It became clear to me that Cody genuinely cares
> >> >>> >>>>>>>> about the project.
> >> >>> >>>>>>>>
> >> >>> >>>>>>>> Some of the frustrations come from the success of the project
> >> >>> >>>>>>>> itself becoming very "hot", and it is difficult to get clarity
> >> >>> >>>>>>>> from people who don't dedicate all their time to Spark. In
> >> >>> >>>>>>>> fact, it is in some ways similar to scaling an engineering team
> >> >>> >>>>>>>> in a successful startup: old processes that worked well might
> >> >>> >>>>>>>> not work so well when it gets to a certain size, cultures can
> >> >>> >>>>>>>> get diluted, building culture vs building process, etc.
> >> >>> >>>>>>>>
> >> >>> >>>>>>>> I would also really like to have a more visible process for
> >> >>> >>>>>>>> larger changes, especially major user-facing API changes.
> >> >>> >>>>>>>> Historically we upload design docs for major changes, but this
> >> >>> >>>>>>>> is not always consistent, and it is difficult to ensure the
> >> >>> >>>>>>>> quality of the docs, due to the volunteering nature of the
> >> >>> >>>>>>>> organization.
> >> >>> >>>>>>>>
> >> >>> >>>>>>>> Some of the more concrete ideas we discussed focus on building
> >> >>> >>>>>>>> a culture to improve clarity:
> >> >>> >>>>>>>>
> >> >>> >>>>>>>> - Process: Large changes should have design docs posted on
> >> >>> >>>>>>>> JIRA. One thing Cody and I didn't discuss, but an idea that
> >> >>> >>>>>>>> just came to me, is that we should create a design doc template
> >> >>> >>>>>>>> for the project and ask everybody to follow it. The design doc
> >> >>> >>>>>>>> template should also explicitly list goals and non-goals, to
> >> >>> >>>>>>>> make design docs more consistent.
> >> >>> >>>>>>>>
> >> >>> >>>>>>>> - Process: Email dev@ to solicit feedback. We have done this
> >> >>> >>>>>>>> with some changes, but again very inconsistently. Just posting
> >> >>> >>>>>>>> something on JIRA isn't sufficient, because there are simply
> >> >>> >>>>>>>> too many JIRAs and the signal gets lost in the noise. While
> >> >>> >>>>>>>> this is generally impossible to enforce, because we can't
> >> >>> >>>>>>>> force all volunteers to conform to a process (or they might
> >> >>> >>>>>>>> not even be aware of this), those who are more familiar with
> >> >>> >>>>>>>> the project can help by emailing dev@ when they see something
> >> >>> >>>>>>>> that hasn't been emailed there.
> >> >>> >>>>>>>>
> >> >>> >>>>>>>> - Culture: The design doc author(s) should be open to
> >> >>> >>>>>>>> feedback. A design doc should serve as the base for discussion
> >> >>> >>>>>>>> and is by no means the final design. Of course, this does not
> >> >>> >>>>>>>> mean the author has to accept every piece of feedback. They
> >> >>> >>>>>>>> should also be comfortable accepting / rejecting ideas on
> >> >>> >>>>>>>> technical grounds.
> >> >>> >>>>>>>>
> >> >>> >>>>>>>> - Process / Culture: For major ongoing projects, it can be
> >> >>> >>>>>>>> useful to have some monthly Google hangouts that are open to
> >> >>> >>>>>>>> the world. I am actually not sure how well this will work,
> >> >>> >>>>>>>> because of the volunteering nature and because we need to
> >> >>> >>>>>>>> adjust for time zones for people across the globe, but it
> >> >>> >>>>>>>> seems worth trying.
> >> >>> >>>>>>>>
> >> >>> >>>>>>>> - Culture: Contributors (including committers) should be more
> >> >>> >>>>>>>> direct in setting expectations, including whether they are
> >> >>> >>>>>>>> working on a specific issue, whether they will be working on a
> >> >>> >>>>>>>> specific issue, and whether an issue, PR, or JIRA should be
> >> >>> >>>>>>>> rejected. Most people I know in this community are nice and
> >> >>> >>>>>>>> don't enjoy telling other people no, but it is often more
> >> >>> >>>>>>>> annoying to a contributor to not know anything than to get a
> >> >>> >>>>>>>> no.
> >> >>> >>>>>>>>
> >> >>> >>>>>>>>
> >> >>> >>>>>>>> On Fri, Oct 7, 2016 at 10:03 AM, Matei Zaharia
> >> >>> >>>>>>>> <[hidden email]>
> >> >>> >>>>>>>> wrote:
> >> >>> >>>>>>>>>
> >> >>> >>>>>>>>>
> >> >>> >>>>>>>>> Love the idea of a more visible "Spark Improvement Proposal"
> >> >>> >>>>>>>>> process that solicits user input on new APIs. For what it's
> >> >>> >>>>>>>>> worth, I don't think committers are trying to minimize their
> >> >>> >>>>>>>>> own work -- every committer cares about making the software
> >> >>> >>>>>>>>> useful for users. However, it is always hard to get user
> >> >>> >>>>>>>>> input, and so it helps to have this kind of process. I've
> >> >>> >>>>>>>>> certainly looked at the *IPs a lot in other software I use
> >> >>> >>>>>>>>> just to see the biggest things on the roadmap.
> >> >>> >>>>>>>>>
> >> >>> >>>>>>>>> When you're talking about "changing interfaces", are you
> >> >>> >>>>>>>>> talking about public or internal APIs? I do think many people
> >> >>> >>>>>>>>> hate changing public APIs, and I actually think that's for the
> >> >>> >>>>>>>>> best of the project. That's a technical debate, but basically,
> >> >>> >>>>>>>>> the worst thing when you're using a piece of software is that
> >> >>> >>>>>>>>> the developers constantly ask you to rewrite your app to
> >> >>> >>>>>>>>> update to a new version (and thus benefit from bug fixes,
> >> >>> >>>>>>>>> etc). Cue anyone who's used Protobuf, or Guava. The "let's get
> >> >>> >>>>>>>>> everyone to change their code this release" model works well
> >> >>> >>>>>>>>> within a single large company, but doesn't work well for a
> >> >>> >>>>>>>>> community, which is why nearly all *very* widely used
> >> >>> >>>>>>>>> programming interfaces (I'm talking things like the Java
> >> >>> >>>>>>>>> standard library, Windows API, etc) almost *never* break
> >> >>> >>>>>>>>> backwards compatibility. All this is done within reason,
> >> >>> >>>>>>>>> though, e.g. we do change things in major releases (2.x, 3.x,
> >> >>> >>>>>>>>> etc).
> >> >>> >>>>>>>>
> >> >>> >>>>>>>>
> >> >>> >>>>>>>>
> >> >>> >>>>>>>>
> >> >>> >>>>>>>
> >> >>> >>>>>>
> >> >>> >>>>>>
> >> >>> >>>>>>
> >> >>> >>>>>>
> >> >>> >>>>>> ---------------------------------------------------------------------
> >> >>> >>>>>> To unsubscribe e-mail: [hidden email]
> >> >>> >>>>>>
> >> >>> >>>>>
> >> >>> >>>>>
> >> >>> >>>>>
> >> >>> >>>>> --
> >> >>> >>>>> Stavros Kontopoulos
> >> >>> >>>>> Senior Software Engineer
> >> >>> >>>>> Lightbend, Inc.
> >> >>> >>>>> p:  +30 6977967274
> >> >>> >>>>> e: [hidden email]
> >> >>> >>>>>
> >> >>> >>>>>
> >> >>> >>>>
> >> >>> >>>
> >> >>> >>
> >> >>> >>
> >> >>>
> >> >>
> >> >
> >> >
> >> >
> >> >
> >> > ________________________________
> >> > View this message in context: RE: Spark Improvement Proposals
> >> > Sent from the Apache Spark Developers List mailing list archive at
> >> > Nabble.com.
> >>
> >>
> >
> >
> >
> > --
> > Ryan Blue
> > Software Engineer
> > Netflix
>
>
>
