Re: Spark Improvement Proposals

Ryan Blue Tue, 11 Oct 2016 13:58:02 -0700

I don't think we will have trouble with whatever rule is adopted for
accepting proposals. Considering committers' votes binding (if that is what
we choose) is an established practice as long as it isn't for specific
votes, like a release vote. From the Apache docs: "Who is permitted to vote
is, to some extent, a community-specific thing." [1] And, I also don't see
why it would be a problem to choose consensus, as long as we have an open
discussion and vote about these rules.


rb

On Mon, Oct 10, 2016 at 4:15 PM, Cody Koeninger <c...@koeninger.org> wrote:

> If someone wants to tell me that it's OK and "The Apache Way" for
> Kafka and Flink to have a proposal process that ends in a lazy
> majority, but it's not OK for Spark to have a proposal process that
> ends in a non-lazy consensus...
>
> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals#KafkaImprovementProposals-Process
>
> In practice any PMC member can stop a proposal they don't like, so I'm
> not sure how much it matters.
>
>
>
> On Mon, Oct 10, 2016 at 5:59 PM, Mark Hamstra <m...@clearstorydata.com>
> wrote:
> > There is a larger issue to keep in mind, and that is that what you are
> > proposing is a procedure that, as far as I am aware, hasn't previously
> > been adopted in an Apache project, and thus is not an easy or exact fit
> > with established practices that have been blessed as "The Apache Way".
> > As such, we need to be careful, because we have run into some trouble in
> > the past with some inside the ASF but essentially outside the Spark
> > community who didn't like the way we were doing things.
> >
> > On Mon, Oct 10, 2016 at 3:53 PM, Cody Koeninger <c...@koeninger.org>
> > wrote:
> >>
> >> Apache documents say lots of confusing stuff, including that committers
> >> are in practice given a vote.
> >>
> >> https://www.apache.org/foundation/voting.html
> >>
> >> I don't care either way, if someone wants me to sub committer for PMC in
> >> the voting section, fine, we just need a clear outcome.
> >>
> >>
> >> On Oct 10, 2016 17:36, "Mark Hamstra" <m...@clearstorydata.com> wrote:
> >>>
> >>> If I'm correctly understanding the kind of voting that you are talking
> >>> about, then to be accurate, it is only the PMC members that have a
> >>> vote, not all committers:
> >>> https://www.apache.org/foundation/how-it-works.html#pmc-members
> >>>
> >>> On Mon, Oct 10, 2016 at 12:02 PM, Cody Koeninger <c...@koeninger.org>
> >>> wrote:
> >>>>
> >>>> I think the main value is in being honest about what's going on.  No
> >>>> one other than committers can cast a meaningful vote, that's the
> >>>> reality.  Beyond that, if people think it's more open to allow formal
> >>>> proposals from anyone, I'm not necessarily against it, but my main
> >>>> question would be this:
> >>>>
> >>>> If anyone can submit a proposal, are committers actually going to
> >>>> clearly reject and close proposals that don't meet the requirements?
> >>>>
> >>>> Right now we have a serious problem with lack of clarity regarding
> >>>> contributions, and that cannot spill over into goal-setting.
> >>>>
> >>>> On Mon, Oct 10, 2016 at 1:54 PM, Ryan Blue <rb...@netflix.com> wrote:
> >>>> > +1 to votes to approve proposals. I agree that proposals should have
> >>>> > an official mechanism to be accepted, and a vote is an established
> >>>> > means of doing that well. I like that it includes a period to review
> >>>> > the proposal and I think proposals should have been discussed enough
> >>>> > ahead of a vote to survive the possibility of a veto.
> >>>> >
> >>>> > I also like the names that are short and (mostly) unique, like SEP.
> >>>> >
> >>>> > Where I disagree is with the requirement that a committer must
> >>>> > formally propose an enhancement. I don't see the value of
> >>>> > restricting this: if someone has the will to write up a proposal
> >>>> > then they should be encouraged to do so and start a discussion about
> >>>> > it. Even if there is a political reality as Cody says, what is the
> >>>> > value of codifying that in our process? I think restricting who can
> >>>> > submit proposals would only undermine them by pushing contributors
> >>>> > out. Maybe I'm missing something here?
> >>>> >
> >>>> > rb
> >>>> >
> >>>> >
> >>>> >
> >>>> > On Mon, Oct 10, 2016 at 7:41 AM, Cody Koeninger <c...@koeninger.org>
> >>>> > wrote:
> >>>> >>
> >>>> >> Yes, users suggesting SIPs is a good thing and is explicitly called
> >>>> >> out in the linked document under the Who? section.  Formally
> >>>> >> proposing them, not so much, because of the political realities.
> >>>> >>
> >>>> >> Yes, implementation strategy definitely affects goals.  There are
> >>>> >> all kinds of examples of this, I'll pick one that's my fault so as
> >>>> >> to avoid sounding like I'm blaming:
> >>>> >>
> >>>> >> When I implemented the Kafka DStream, one of my (not explicitly
> >>>> >> agreed upon by the community) goals was to make sure people could
> >>>> >> use the DStream with however they were already using Kafka at work.
> >>>> >> The lack of explicit agreement on that goal led to all kinds of
> >>>> >> fighting with committers, that could have been avoided.  The lack of
> >>>> >> explicit up-front strategy discussion led to the DStream not really
> >>>> >> working with compacted topics.  I knew about compacted topics, but
> >>>> >> don't have a use for them, so had a blind spot there.  If there was
> >>>> >> explicit up-front discussion that my strategy was "assume that
> >>>> >> batches can be defined on the driver solely by beginning and ending
> >>>> >> offsets", there's a greater chance that a user would have seen that
> >>>> >> and said, "hey, what about non-contiguous offsets in a compacted
> >>>> >> topic".
> >>>> >>
> >>>> >> This kind of thing is only going to happen smoothly if we have a
> >>>> >> lightweight user-visible process with clear outcomes.
> >>>> >>
> >>>> >> On Mon, Oct 10, 2016 at 1:34 AM, assaf.mendelson
> >>>> >> <assaf.mendel...@rsa.com> wrote:
> >>>> >> > I agree with most of what Cody said.
> >>>> >> >
> >>>> >> > Two things:
> >>>> >> >
> >>>> >> > First we can always have other people suggest SIPs but mark them
> >>>> >> > as “unreviewed” and have committers basically move them forward.
> >>>> >> > The problem is that writing a good document takes time. This way
> >>>> >> > we can leverage non-committers to do some of this work (it is just
> >>>> >> > another way to contribute).
> >>>> >> >
> >>>> >> >
> >>>> >> >
> >>>> >> > As for strategy, in many cases implementation strategy can affect
> >>>> >> > the goals. I will give a small example: In the current structured
> >>>> >> > streaming strategy, we group by the time to achieve a sliding
> >>>> >> > window. This is definitely an implementation decision and not a
> >>>> >> > goal. However, I can think of several aggregation functions which
> >>>> >> > have the time inside their calculation buffer. For example, let’s
> >>>> >> > say we want to return a set of all distinct values. One way to
> >>>> >> > implement this would be to make the set into a map and have the
> >>>> >> > value contain the last time seen. Multiplying it across the
> >>>> >> > groupby would cost a lot in performance. So adding such a strategy
> >>>> >> > would have a great effect on the type of aggregations and their
> >>>> >> > performance which does affect the goal. Without adding the
> >>>> >> > strategy, it is easy for whoever goes to the design document to
> >>>> >> > not think about these cases. Furthermore, it might be decided that
> >>>> >> > these cases are rare enough so that the strategy is still good
> >>>> >> > enough but how would we know it without user feedback?
> >>>> >> >
> >>>> >> > I believe this example is exactly what Cody was talking about.
> >>>> >> > Since many times implementation strategies have a large effect on
> >>>> >> > the goal, we should have it discussed when discussing the goals.
> >>>> >> > In addition, while it is often easy to throw out completely
> >>>> >> > infeasible goals, it is often much harder to figure out that the
> >>>> >> > goals are infeasible without fine tuning.
> >>>> >> >
> >>>> >> >
> >>>> >> >
> >>>> >> >
> >>>> >> >
> >>>> >> > Assaf.
> >>>> >> >
> >>>> >> >
> >>>> >> >
> >>>> >> > From: Cody Koeninger-2 [via Apache Spark Developers List]
> >>>> >> > [mailto:ml-node+[hidden email]]
> >>>> >> > Sent: Monday, October 10, 2016 2:25 AM
> >>>> >> > To: Mendelson, Assaf
> >>>> >> > Subject: Re: Spark Improvement Proposals
> >>>> >> >
> >>>> >> >
> >>>> >> >
> >>>> >> > Only committers should formally submit SIPs because in an Apache
> >>>> >> > project only committers have explicit political power.  If a user
> >>>> >> > can't find a committer willing to sponsor an SIP idea, they have
> >>>> >> > no way to get the idea passed in any case.  If I can't find a
> >>>> >> > committer to sponsor this meta-SIP idea, I'm out of luck.
> >>>> >> >
> >>>> >> > I do not believe unrealistic goals can be found solely by
> >>>> >> > inspection. We've managed to ignore unrealistic goals even after
> >>>> >> > implementation! Focusing on APIs can allow people to think they've
> >>>> >> > solved something, when there's really no way of implementing that
> >>>> >> > API while meeting the goals.  Rapid iteration is clearly the best
> >>>> >> > way to address this, but we've already talked about why that
> >>>> >> > hasn't really worked.  If adding a non-binding API section to the
> >>>> >> > template is important to you, I'm not against it, but I don't
> >>>> >> > think it's sufficient.
> >>>> >> >
> >>>> >> > On your PRD vs design doc spectrum, I'm saying this is closer to a
> >>>> >> > PRD.  Clear agreement on goals is the most important thing and
> >>>> >> > that's why it's the thing I want binding agreement on.  But I
> >>>> >> > cannot agree to goals unless I have enough minimal technical info
> >>>> >> > to judge whether the goals are likely to actually be accomplished.
> >>>> >> >
> >>>> >> >
> >>>> >> >
> >>>> >> > On Sun, Oct 9, 2016 at 5:35 PM, Matei Zaharia <[hidden email]>
> >>>> >> > wrote:
> >>>> >> >
> >>>> >> >
> >>>> >> >> Well, I think there are a few things here that don't make sense.
> >>>> >> >> First, why should only committers submit SIPs? Development in the
> >>>> >> >> project should be open to all contributors, whether they're
> >>>> >> >> committers or not. Second, I think unrealistic goals can be found
> >>>> >> >> just by inspecting the goals, and I'm not super worried that
> >>>> >> >> we'll accept a lot of SIPs that are then infeasible -- we can
> >>>> >> >> then submit new ones. But this depends on whether you want this
> >>>> >> >> process to be a "design doc lite", where people also agree on
> >>>> >> >> implementation strategy, or just a way to agree on goals. This is
> >>>> >> >> what I asked earlier about PRDs vs design docs (and I'm open to
> >>>> >> >> either one but I'd just like clarity). Finally, both as a user
> >>>> >> >> and designer of software, I always want to give feedback on APIs,
> >>>> >> >> so I'd really like a culture of having those early. People don't
> >>>> >> >> argue about prettiness when they discuss APIs, they argue about
> >>>> >> >> the core concepts to expose in order to meet various goals, and
> >>>> >> >> then they're stuck maintaining those for a long time.
> >>>> >> >>
> >>>> >> >> Matei
> >>>> >> >>
> >>>> >> >> On Oct 9, 2016, at 3:10 PM, Cody Koeninger <[hidden email]> wrote:
> >>>> >> >>
> >>>> >> >> Users instead of people, sure.  Committers and contributors are
> >>>> >> >> (or at least should be) a subset of users.
> >>>> >> >>
> >>>> >> >> Non goals, sure. I don't care what the name is, but we need to
> >>>> >> >> clearly say e.g. 'no we are not maintaining compatibility with
> >>>> >> >> XYZ right now'.
> >>>> >> >>
> >>>> >> >> API, what I care most about is whether it allows me to accomplish
> >>>> >> >> the goals. Arguing about how ugly or pretty it is can be saved
> >>>> >> >> for design/implementation imho.
> >>>> >> >>
> >>>> >> >> Strategy, this is necessary because otherwise goals can be out of
> >>>> >> >> line with reality.  Don't propose goals you don't have at least
> >>>> >> >> some idea of how to implement.
> >>>> >> >>
> >>>> >> >> Rejected strategies, given that committers are the only ones I'm
> >>>> >> >> saying should formally submit SPARKLIs or SIPs, if they put junk
> >>>> >> >> in a required section then slap them down for it and tell them to
> >>>> >> >> fix it.
> >>>> >> >>
> >>>> >> >>
> >>>> >> >> On Oct 9, 2016 4:36 PM, "Matei Zaharia" <[hidden email]> wrote:
> >>>> >> >>>
> >>>> >> >>> Yup, this is the stuff that I found unclear. Thanks for
> >>>> >> >>> clarifying here, but we should also clarify it in the writeup.
> >>>> >> >>> In particular:
> >>>> >> >>>
> >>>> >> >>> - Goals needs to be about user-facing behavior ("people" is
> >>>> >> >>> broad)
> >>>> >> >>>
> >>>> >> >>> - I'd rename Rejected Goals to Non-Goals. Otherwise someone will
> >>>> >> >>> dig up one of these and say "Spark's developers have officially
> >>>> >> >>> rejected X, which our awesome system has".
> >>>> >> >>>
> >>>> >> >>> - For user-facing stuff, I think you need a section on API.
> >>>> >> >>> Virtually all other *IPs I've seen have that.
> >>>> >> >>>
> >>>> >> >>> - I'm still not sure why the strategy section is needed if the
> >>>> >> >>> purpose is to define user-facing behavior -- unless this is the
> >>>> >> >>> strategy for setting the goals or for defining the API. That
> >>>> >> >>> sounds squarely like a design doc issue. In some sense, who
> >>>> >> >>> cares whether the proposal is technically feasible right now? If
> >>>> >> >>> it's infeasible, that will be discovered later during design and
> >>>> >> >>> implementation. Same thing with rejected strategies -- listing
> >>>> >> >>> some of those is definitely useful sometimes, but if you make
> >>>> >> >>> this a *required* section, people are just going to fill it in
> >>>> >> >>> with bogus stuff (I've seen this happen before).
> >>>> >> >>>
> >>>> >> >>> Matei
> >>>> >> >>>
> >>>> >> >
> >>>> >> >>> > On Oct 9, 2016, at 2:14 PM, Cody Koeninger <[hidden email]>
> >>>> >> >>> > wrote:
> >>>> >> >>> >
> >>>> >> >>> > So to focus the discussion on the specific strategy I'm
> >>>> >> >>> > suggesting, documented at
> >>>> >> >>> >
> >>>> >> >>> > https://github.com/koeninger/spark-1/blob/SIP-0/docs/spark-improvement-proposals.md
> >>>> >> >>> >
> >>>> >> >>> > "Goals: What must this allow people to do, that they can't
> >>>> >> >>> > currently?"
> >>>> >> >>> >
> >>>> >> >>> > Is it unclear that this is focusing specifically on
> >>>> >> >>> > people-visible behavior?
> >>>> >> >>> >
> >>>> >> >>> > Rejected goals - are important because otherwise people keep
> >>>> >> >>> > trying to argue about scope.  Of course you can change things
> >>>> >> >>> > later with a different SIP and different vote, the point is to
> >>>> >> >>> > focus.
> >>>> >> >>> >
> >>>> >> >>> > Use cases - are something that people are going to bring up in
> >>>> >> >>> > discussion.  If they aren't clearly documented as a goal ("This
> >>>> >> >>> > must allow me to connect using SSL"), they should be added.
> >>>> >> >>> >
> >>>> >> >>> > Internal architecture - if the people who need specific
> >>>> >> >>> > behavior are implementers of other parts of the system, that's
> >>>> >> >>> > fine.
> >>>> >> >>> >
> >>>> >> >>> > Rejected strategies - If you have none of these, you have no
> >>>> >> >>> > evidence that the proponent didn't just go with the first thing
> >>>> >> >>> > they had in mind (or have already implemented), which is a big
> >>>> >> >>> > problem currently. Approval isn't binding as to specifics of
> >>>> >> >>> > implementation, so these aren't handcuffs.  The goals are the
> >>>> >> >>> > contract, the strategy is evidence that contract can actually
> >>>> >> >>> > be met.
> >>>> >> >>> >
> >>>> >> >>> > Design docs - I'm not touching design docs.  The markdown file
> >>>> >> >>> > I linked specifically says of the strategy section "This is not
> >>>> >> >>> > a full design document."  Is this unclear?  Design docs can be
> >>>> >> >>> > worked on obviously, but that's not what I'm concerned with
> >>>> >> >>> > here.
> >>>> >> >>> >
> >>>> >> >>> >
> >>>> >> >>> >
> >>>> >> >>> >
> >>>> >> >>> > On Sun, Oct 9, 2016 at 2:34 PM, Matei Zaharia <[hidden email]>
> >>>> >> >>> > wrote:
> >>>> >> >>> >> Hi Cody,
> >>>> >> >>> >>
> >>>> >> >>> >> I think this would be a lot more concrete if we had a more
> >>>> >> >>> >> detailed template for SIPs. Right now, it's not super clear
> >>>> >> >>> >> what's in scope -- e.g. are they a way to solicit feedback on
> >>>> >> >>> >> the user-facing behavior or on the internals? "Goals" can
> >>>> >> >>> >> cover both things. I've been thinking of SIPs more as Product
> >>>> >> >>> >> Requirements Docs (PRDs), which focus on *what* a code change
> >>>> >> >>> >> should do as opposed to how.
> >>>> >> >>> >>
> >>>> >> >>> >> In particular, here are some things that you may or may not
> >>>> >> >>> >> consider in scope for SIPs:
> >>>> >> >>> >>
> >>>> >> >>> >> - Goals and non-goals: This is definitely in scope, and IMO
> >>>> >> >>> >> should focus on user-visible behavior (e.g. "system supports
> >>>> >> >>> >> SQL window functions" or "system continues working if one node
> >>>> >> >>> >> fails"). BTW I wouldn't say "rejected goals" because some of
> >>>> >> >>> >> them might become goals later, so we're not definitively
> >>>> >> >>> >> rejecting them.
> >>>> >> >>> >>
> >>>> >> >>> >> - Public API: Probably should be included in most SIPs unless
> >>>> >> >>> >> it's too large to fully specify then (e.g. "let's add an ML
> >>>> >> >>> >> library").
> >>>> >> >>> >>
> >>>> >> >>> >> - Use cases: I usually find this very useful in PRDs to better
> >>>> >> >>> >> communicate the goals.
> >>>> >> >>> >>
> >>>> >> >>> >> - Internal architecture: This is usually *not* a thing users
> >>>> >> >>> >> can easily comment on and it sounds more like a design doc
> >>>> >> >>> >> item. Of course it's important to show that the SIP is
> >>>> >> >>> >> feasible to implement. One exception, however, is that I think
> >>>> >> >>> >> we'll have some SIPs primarily on internals (e.g. if somebody
> >>>> >> >>> >> wants to refactor Spark's query optimizer or something).
> >>>> >> >>> >>
> >>>> >> >>> >> - Rejected strategies: I personally wouldn't put this, because
> >>>> >> >>> >> what's the point of voting to reject a strategy before you've
> >>>> >> >>> >> really begun designing and implementing something? What if you
> >>>> >> >>> >> discover that the strategy is actually better when you start
> >>>> >> >>> >> doing stuff?
> >>>> >> >>> >>
> >>>> >> >>> >> At a super high level, it depends on whether you want the SIPs
> >>>> >> >>> >> to be PRDs for getting some quick feedback on the goals of a
> >>>> >> >>> >> feature before it is designed, or something more like
> >>>> >> >>> >> full-fledged design docs (just a more visible design doc for
> >>>> >> >>> >> bigger changes). I looked at Kafka's KIPs, and they actually
> >>>> >> >>> >> seem to be more like design docs. This can work too but it
> >>>> >> >>> >> does require more work from the proposer and it can lead to
> >>>> >> >>> >> the same problems you mentioned with people already having a
> >>>> >> >>> >> design and implementation in mind.
> >>>> >> >>> >>
> >>>> >> >>> >> Basically, the question is, are you trying to iterate faster
> >>>> >> >>> >> on design by adding a step for user feedback earlier? Or are
> >>>> >> >>> >> you just trying to make design docs for key features more
> >>>> >> >>> >> visible (and their approval more formal)?
> >>>> >> >>> >>
> >>>> >> >>> >> BTW note that in either case, I'd like to have a template for
> >>>> >> >>> >> design docs too, which should also include goals. I think that
> >>>> >> >>> >> would've avoided some of the issues you brought up.
> >>>> >> >>> >>
> >>>> >> >>> >> Matei
> >>>> >> >>> >>
> >>>> >> >>> >> On Oct 9, 2016, at 10:40 AM, Cody Koeninger <[hidden email]>
> >>>> >> >>> >> wrote:
> >>>> >> >>> >>
> >>>> >> >>> >> Here's my specific proposal (meta-proposal?)
> >>>> >> >>> >>
> >>>> >> >>> >> Spark Improvement Proposals (SIP)
> >>>> >> >>> >>
> >>>> >> >>> >>
> >>>> >> >>> >> Background:
> >>>> >> >>> >>
> >>>> >> >>> >> The current problem is that design and implementation of large
> >>>> >> >>> >> features are often done in private, before soliciting user
> >>>> >> >>> >> feedback.
> >>>> >> >>> >>
> >>>> >> >>> >> When feedback is solicited, it is often as to detailed design
> >>>> >> >>> >> specifics, not focused on goals.
> >>>> >> >>> >>
> >>>> >> >>> >> When implementation does take place after design, there is
> >>>> >> >>> >> often disagreement as to what goals are or are not in scope.
> >>>> >> >>> >>
> >>>> >> >>> >> This results in commits that don't fully meet user needs.
> >>>> >> >>> >>
> >>>> >> >>> >>
> >>>> >> >>> >> Goals:
> >>>> >> >>> >>
> >>>> >> >>> >> - Ensure user, contributor, and committer goals are clearly
> >>>> >> >>> >> identified and agreed upon, before implementation takes place.
> >>>> >> >>> >>
> >>>> >> >>> >> - Ensure that a technically feasible strategy is chosen that
> >>>> >> >>> >> is likely to meet the goals.
> >>>> >> >>> >>
> >>>> >> >>> >>
> >>>> >> >>> >> Rejected Goals:
> >>>> >> >>> >>
> >>>> >> >>> >> - SIPs are not for detailed design.  Design by committee
> >>>> >> >>> >> doesn't work.
> >>>> >> >>> >>
> >>>> >> >>> >> - SIPs are not for every change.  We don't need that much
> >>>> >> >>> >> process.
> >>>> >> >>> >>
> >>>> >> >>> >>
> >>>> >> >>> >> Strategy:
> >>>> >> >>> >>
> >>>> >> >>> >> My suggestion is outlined as a Spark Improvement Proposal
> >>>> >> >>> >> process documented at
> >>>> >> >>> >>
> >>>> >> >>> >> https://github.com/koeninger/spark-1/blob/SIP-0/docs/spark-improvement-proposals.md
> >>>> >> >>> >>
> >>>> >> >>> >> Specifics of Jira manipulation are an implementation detail we
> >>>> >> >>> >> can figure out.
> >>>> >> >>> >>
> >>>> >> >>> >> I'm suggesting voting; the need here is for a _clear_ outcome.
> >>>> >> >>> >>
> >>>> >> >>> >>
> >>>> >> >>> >> Rejected Strategies:
> >>>> >> >>> >>
> >>>> >> >>> >> Having someone who understands the problem implement it first
> >>>> >> >>> >> works, but only if significant iteration after user feedback is
> >>>> >> >>> >> allowed.
> >>>> >> >>> >>
> >>>> >> >>> >> Historically this has been problematic due to pressure to
> >>>> >> >>> >> limit public api changes.
> >>>> >> >>> >>
> >>>> >> >>> >>
> >>>> >> >>> >> On Fri, Oct 7, 2016 at 5:16 PM, Reynold Xin <[hidden email]>
> >>>> >> >>> >> wrote:
> >>>> >> >>> >>>
> >>>> >> >>> >>> Alright looks like there is quite a bit of support. We should
> >>>> >> >>> >>> wait to hear from more people too.
> >>>> >> >>> >>>
> >>>> >> >>> >>> To push this forward, Cody and I will be working together in
> >>>> >> >>> >>> the next couple of weeks to come up with a concrete, detailed
> >>>> >> >>> >>> proposal on what this entails, and then we can discuss the
> >>>> >> >>> >>> specific proposal as well.
> >>>> >> >>> >>>
> >>>> >> >>> >>>
> >>>> >> >>> >>> On Fri, Oct 7, 2016 at 2:29 PM, Cody Koeninger <[hidden email]>
> >>>> >> >>> >>> wrote:
> >>>> >> >>> >>>>
> >>>> >> >>> >>>> Yeah, in case it wasn't clear, I was talking about SIPs for
> >>>> >> >>> >>>> major user-facing or cross-cutting changes, not minor feature
> >>>> >> >>> >>>> adds.
> >>>> >> >>> >>>>
> >>>> >> >>> >>>> On Fri, Oct 7, 2016 at 3:58 PM, Stavros Kontopoulos
> >>>> >> >>> >>>> <[hidden email]> wrote:
> >>>> >> >>> >>>>>
> >>>> >> >>> >>>>> +1 to the SIP label as long as it does not slow down things
> >>>> >> >>> >>>>> and it targets optimizing efforts, coordination etc. For
> >>>> >> >>> >>>>> example really small features should not need to go through
> >>>> >> >>> >>>>> this process (assuming they don't touch public interfaces) or
> >>>> >> >>> >>>>> re-factorings and hope it will be kept this way. So a
> >>>> >> >>> >>>>> guideline doc should be provided, like in the KIP case.
> >>>> >> >>> >>>>>
> >>>> >> >>> >>>>> IMHO so far aside from tagging things and linking them
> >>>> >> >>> >>>>> elsewhere simply having design docs and prototype
> >>>> >> >>> >>>>> implementations in PRs is not something that has not worked
> >>>> >> >>> >>>>> so far. What is really a pain in many projects out there is
> >>>> >> >>> >>>>> discontinuity in progress of PRs, missing features, slow
> >>>> >> >>> >>>>> reviews which is understandable to some extent... it is not
> >>>> >> >>> >>>>> only about Spark but things can be improved for sure for this
> >>>> >> >>> >>>>> project in particular as already stated.
> >>>> >> >>> >>>>>
> >>>> >> >>> >>>>> On Fri, Oct 7, 2016 at 11:14 PM, Cody Koeninger <[hidden email]>
> >>>> >> >>> >>>>> wrote:
> >>>> >> >>> >>>>>>
> >>>> >> >>> >>>>>> +1 to adding an SIP label and linking it from the website.
> >>>> >> >>> >>>>>> I think it needs
> >>>> >> >>> >>>>>>
> >>>> >> >>> >>>>>> - template that focuses it towards soliciting user goals /
> >>>> >> >>> >>>>>> non goals
> >>>> >> >>> >>>>>> - clear resolution as to which strategy was chosen to
> >>>> >> >>> >>>>>> pursue.  I'd recommend a vote.
> >>>> >> >>> >>>>>>
> >>>> >> >>> >>>>>> Matei asked me to clarify what I meant by changing
> >>>> >> >>> >>>>>> interfaces, I think it's directly relevant to the SIP idea
> >>>> >> >>> >>>>>> so I'll clarify here, and split a thread for the other
> >>>> >> >>> >>>>>> discussion per Nicholas' request.
> >>>> >> >>> >>>>>>
> >>>> >> >>> >>>>>> I meant changing public user interfaces.  I think the first
> >>>> >> >>> >>>>>> design is unlikely to be right, because it's done at a time
> >>>> >> >>> >>>>>> when you have the least information.  As a user, I find it
> >>>> >> >>> >>>>>> considerably more frustrating to be unable to use a tool to
> >>>> >> >>> >>>>>> get my job done, than I do having to make minor changes to
> >>>> >> >>> >>>>>> my code in order to take advantage of features.  I've seen
> >>>> >> >>> >>>>>> committers be seriously reluctant to allow changes to
> >>>> >> >>> >>>>>> @experimental code that are needed in order for it to
> >>>> >> >>> >>>>>> really work right.  You need to be able to iterate, and if
> >>>> >> >>> >>>>>> people on both sides of the fence aren't going to respect
> >>>> >> >>> >>>>>> that some newer apis are subject to change, then why even
> >>>> >> >>> >>>>>> mark them as such?
> >>>> >> >>> >>>>>>
> >>>> >> >>> >>>>>> Ideally a finished SIP should give me a checklist of things
> >>>> >> >>> >>>>>> that an implementation must do, and things that it doesn't
> >>>> >> >>> >>>>>> need to do.  Contributors/committers should be seriously
> >>>> >> >>> >>>>>> discouraged from putting out a version 0.1 that doesn't
> >>>> >> >>> >>>>>> have at least a prototype implementation of all those
> >>>> >> >>> >>>>>> things, especially if they're then going to argue against
> >>>> >> >>> >>>>>> interface changes necessary to get the rest of the things
> >>>> >> >>> >>>>>> done in the 0.2 version.
> >>>> >> >>> >>>>>>
> >>>> >> >>> >>>>>>
> >>>> >> >>> >>>>>> On Fri, Oct 7, 2016 at 2:18 PM, Reynold Xin <[hidden email]>
> >>>> >> >>> >>>>>> wrote:
> >>>> >> >>> >>>>>>> I like the lightweight proposal to add a SIP label.
> >>>> >> >>> >>>>>>>
> >>>> >> >>> >>>>>>> During Spark 2.0 development, Tom (Graves) and I suggested
> >>>> >> >>> >>>>>>> using wiki to track the list of major changes, but that
> >>>> >> >>> >>>>>>> never really materialized due to the overhead. Adding a
> >>>> >> >>> >>>>>>> SIP label on major JIRAs and then linking to them
> >>>> >> >>> >>>>>>> prominently on the Spark website makes a lot of sense.
> >>>> >> >>> >>>>>>>
> >>>> >> >>> >>>>>>>
> >>>> >> >>> >>>>>>> On Fri, Oct 7, 2016 at 10:50 AM, Matei Zaharia
> >>>> >> >>> >>>>>>> <[hidden email]>
> >>>> >> >>> >>>>>>> wrote:
> >>>> >> >>> >>>>>>>>
> >>>> >> >>> >>>>>>>> For the improvement proposals, I think one major point
> >>>> >> >>> >>>>>>>> was to
> >>>> >> >>> >>>>>>>> make
> >>>> >> >>> >>>>>>>> them
> >>>> >> >>> >>>>>>>> really visible to users who are not contributors, so
> we
> >>>> >> >>> >>>>>>>> should
> >>>> >> >>> >>>>>>>> do
> >>>> >> >>> >>>>>>>> more than
> >>>> >> >>> >>>>>>>> sending stuff to dev@. One very lightweight idea is
> to
> >>>> >> >>> >>>>>>>> have a
> >>>> >> >>> >>>>>>>> new
> >>>> >> >>> >>>>>>>> type of
> >>>> >> >>> >>>>>>>> JIRA called a SIP and have a link to a filter that
> shows
> >>>> >> >>> >>>>>>>> all
> >>>> >> >>> >>>>>>>> such
> >>>> >> >>> >>>>>>>> JIRAs from
> >>>> >> >>> >>>>>>>> http://spark.apache.org. I also like the idea of SIP
> and
> >>>> >> >>> >>>>>>>> design
> >>>> >> >>> >>>>>>>> doc
> >>>> >> >>> >>>>>>>> templates (in fact many projects have them).
> >>>> >> >>> >>>>>>>>
> >>>> >> >>> >>>>>>>> Matei
> >>>> >> >>> >>>>>>>>
> >>>> >> >>> >>>>>>>> On Oct 7, 2016, at 10:38 AM, Reynold Xin <[hidden
> >>>> >> >>> >>>>>>>> email]>
> >>>> >> >>> >>>>>>>> wrote:
> >>>> >> >>> >>>>>>>>
> >>>> >> >>> >>>>>>>> I called Cody last night and talked about some of the
> >>>> >> >>> >>>>>>>> topics
> >>>> >> >>> >>>>>>>> in
> >>>> >> >>> >>>>>>>> his
> >>>> >> >>> >>>>>>>> email.
> >>>> >> >>> >>>>>>>> It became clear to me Cody genuinely cares about the
> >>>> >> >>> >>>>>>>> project.
> >>>> >> >>> >>>>>>>>
> >>>> >> >>> >>>>>>>> Some of the frustrations come from the success of the
> >>>> >> >>> >>>>>>>> project
> >>>> >> >>> >>>>>>>> itself
> >>>> >> >>> >>>>>>>> becoming very "hot", and it is difficult to get
> clarity
> >>>> >> >>> >>>>>>>> from
> >>>> >> >>> >>>>>>>> people
> >>>> >> >>> >>>>>>>> who
> >>>> >> >>> >>>>>>>> don't dedicate all their time to Spark. In fact, it is
> >>>> >> >>> >>>>>>>> in
> >>>> >> >>> >>>>>>>> some
> >>>> >> >>> >>>>>>>> ways
> >>>> >> >>> >>>>>>>> similar
> >>>> >> >>> >>>>>>>> to scaling an engineering team in a successful
> startup:
> >>>> >> >>> >>>>>>>> old
> >>>> >> >>> >>>>>>>> processes that
> >>>> >> >>> >>>>>>>> worked well might not work so well when it gets to a
> >>>> >> >>> >>>>>>>> certain
> >>>> >> >>> >>>>>>>> size,
> >>>> >> >>> >>>>>>>> cultures
> >>>> >> >>> >>>>>>>> can get diluted, building culture vs building process,
> >>>> >> >>> >>>>>>>> etc.
> >>>> >> >>> >>>>>>>>
> >>>> >> >>> >>>>>>>> I also really like to have a more visible process for
> >>>> >> >>> >>>>>>>> larger
> >>>> >> >>> >>>>>>>> changes,
> >>>> >> >>> >>>>>>>> especially major user facing API changes. Historically
> >>>> >> >>> >>>>>>>> we
> >>>> >> >>> >>>>>>>> upload
> >>>> >> >>> >>>>>>>> design docs
> >>>> >> >>> >>>>>>>> for major changes, but it is not always consistent and
> >>>> >> >>> >>>>>>>> it is difficult to ensure the quality of the docs, due
> >>>> >> >>> >>>>>>>> to the volunteer nature of the organization.
> >>>> >> >>> >>>>>>>>
> >>>> >> >>> >>>>>>>> Some of the more concrete ideas we discussed focus on
> >>>> >> >>> >>>>>>>> building a
> >>>> >> >>> >>>>>>>> culture
> >>>> >> >>> >>>>>>>> to improve clarity:
> >>>> >> >>> >>>>>>>>
> >>>> >> >>> >>>>>>>> - Process: Large changes should have design docs
> posted
> >>>> >> >>> >>>>>>>> on
> >>>> >> >>> >>>>>>>> JIRA.
> >>>> >> >>> >>>>>>>> One
> >>>> >> >>> >>>>>>>> thing
> >>>> >> >>> >>>>>>>> Cody and I didn't discuss but an idea that just came
> to
> >>>> >> >>> >>>>>>>> me is
> >>>> >> >>> >>>>>>>> we
> >>>> >> >>> >>>>>>>> should
> >>>> >> >>> >>>>>>>> create a design doc template for the project and ask
> >>>> >> >>> >>>>>>>> everybody
> >>>> >> >>> >>>>>>>> to
> >>>> >> >>> >>>>>>>> follow.
> >>>> >> >>> >>>>>>>> The design doc template should also explicitly list
> >>>> >> >>> >>>>>>>> goals and
> >>>> >> >>> >>>>>>>> non-goals, to
> >>>> >> >>> >>>>>>>> make design doc more consistent.
> >>>> >> >>> >>>>>>>>
> >>>> >> >>> >>>>>>>> - Process: Email dev@ to solicit feedback. We have done
> >>>> >> >>> >>>>>>>> this with some changes, but again very inconsistently.
> >>>> >> >>> >>>>>>>> Just posting
> >>>> >> >>> >>>>>>>> something
> >>>> >> >>> >>>>>>>> on
> >>>> >> >>> >>>>>>>> JIRA
> >>>> >> >>> >>>>>>>> isn't
> >>>> >> >>> >>>>>>>> sufficient, because there are simply too many JIRAs and
> >>>> >> >>> >>>>>>>> the signal gets lost
> >>>> >> >>> >>>>>>>> in the noise. While this is generally impossible to
> >>>> >> >>> >>>>>>>> enforce
> >>>> >> >>> >>>>>>>> because
> >>>> >> >>> >>>>>>>> we can't
> >>>> >> >>> >>>>>>>> force all volunteers to conform to a process (or they
> >>>> >> >>> >>>>>>>> might
> >>>> >> >>> >>>>>>>> not
> >>>> >> >>> >>>>>>>> even
> >>>> >> >>> >>>>>>>> be
> >>>> >> >>> >>>>>>>> aware of this),  those who are more familiar with the
> >>>> >> >>> >>>>>>>> project
> >>>> >> >>> >>>>>>>> can
> >>>> >> >>> >>>>>>>> help by
> >>>> >> >>> >>>>>>>> emailing the dev@ when they see something that hasn't
> >>>> >> >>> >>>>>>>> been.
> >>>> >> >>> >>>>>>>>
> >>>> >> >>> >>>>>>>> - Culture: The design doc author(s) should be open to
> >>>> >> >>> >>>>>>>> feedback.
> >>>> >> >>> >>>>>>>> A
> >>>> >> >>> >>>>>>>> design
> >>>> >> >>> >>>>>>>> doc should serve as the base for discussion and is by
> no
> >>>> >> >>> >>>>>>>> means
> >>>> >> >>> >>>>>>>> the
> >>>> >> >>> >>>>>>>> final
> >>>> >> >>> >>>>>>>> design. Of course, this does not mean the author has
> to
> >>>> >> >>> >>>>>>>> accept
> >>>> >> >>> >>>>>>>> every
> >>>> >> >>> >>>>>>>> feedback. They should also be comfortable accepting /
> >>>> >> >>> >>>>>>>> rejecting
> >>>> >> >>> >>>>>>>> ideas on
> >>>> >> >>> >>>>>>>> technical grounds.
> >>>> >> >>> >>>>>>>>
> >>>> >> >>> >>>>>>>> - Process / Culture: For major ongoing projects, it
> can
> >>>> >> >>> >>>>>>>> be
> >>>> >> >>> >>>>>>>> useful
> >>>> >> >>> >>>>>>>> to
> >>>> >> >>> >>>>>>>> have
> >>>> >> >>> >>>>>>>> some monthly Google hangouts that are open to the
> world.
> >>>> >> >>> >>>>>>>> I am
> >>>> >> >>> >>>>>>>> actually not
> >>>> >> >>> >>>>>>>> sure how well this will work, because of the
> >>>> >> >>> >>>>>>>> volunteering
> >>>> >> >>> >>>>>>>> nature
> >>>> >> >>> >>>>>>>> and
> >>>> >> >>> >>>>>>>> we need
> >>>> >> >>> >>>>>>>> to adjust for timezones for people across the globe,
> but
> >>>> >> >>> >>>>>>>> it
> >>>> >> >>> >>>>>>>> seems
> >>>> >> >>> >>>>>>>> worth
> >>>> >> >>> >>>>>>>> trying.
> >>>> >> >>> >>>>>>>>
> >>>> >> >>> >>>>>>>> - Culture: Contributors (including committers) should
> be
> >>>> >> >>> >>>>>>>> more
> >>>> >> >>> >>>>>>>> direct
> >>>> >> >>> >>>>>>>> in
> >>>> >> >>> >>>>>>>> setting expectations, including whether they are
> working
> >>>> >> >>> >>>>>>>> on a
> >>>> >> >>> >>>>>>>> specific
> >>>> >> >>> >>>>>>>> issue, whether they will be working on a specific
> issue,
> >>>> >> >>> >>>>>>>> and
> >>>> >> >>> >>>>>>>> whether
> >>>> >> >>> >>>>>>>> an
> >>>> >> >>> >>>>>>>> issue or pr or jira should be rejected. Most people I
> >>>> >> >>> >>>>>>>> know in
> >>>> >> >>> >>>>>>>> this
> >>>> >> >>> >>>>>>>> community
> >>>> >> >>> >>>>>>>> are nice and don't enjoy telling other people no, but
> it
> >>>> >> >>> >>>>>>>> is
> >>>> >> >>> >>>>>>>> often
> >>>> >> >>> >>>>>>>> more
> >>>> >> >>> >>>>>>>> annoying to a contributor to not know anything than
> >>>> >> >>> >>>>>>>> getting a
> >>>> >> >>> >>>>>>>> no.
> >>>> >> >>> >>>>>>>>
> >>>> >> >>> >>>>>>>>
> >>>> >> >>> >>>>>>>> On Fri, Oct 7, 2016 at 10:03 AM, Matei Zaharia
> >>>> >> >>> >>>>>>>> <[hidden email]>
> >>>> >> >>> >>>>>>>> wrote:
> >>>> >> >>> >>>>>>>>>
> >>>> >> >>> >>>>>>>>>
> >>>> >> >>> >>>>>>>>> Love the idea of a more visible "Spark Improvement
> >>>> >> >>> >>>>>>>>> Proposal"
> >>>> >> >>> >>>>>>>>> process that
> >>>> >> >>> >>>>>>>>> solicits user input on new APIs. For what it's
> worth, I
> >>>> >> >>> >>>>>>>>> don't
> >>>> >> >>> >>>>>>>>> think
> >>>> >> >>> >>>>>>>>> committers are trying to minimize their own work --
> >>>> >> >>> >>>>>>>>> every
> >>>> >> >>> >>>>>>>>> committer
> >>>> >> >>> >>>>>>>>> cares
> >>>> >> >>> >>>>>>>>> about making the software useful for users. However,
> it
> >>>> >> >>> >>>>>>>>> is
> >>>> >> >>> >>>>>>>>> always
> >>>> >> >>> >>>>>>>>> hard to
> >>>> >> >>> >>>>>>>>> get user input and so it helps to have this kind of
> >>>> >> >>> >>>>>>>>> process.
> >>>> >> >>> >>>>>>>>> I've
> >>>> >> >>> >>>>>>>>> certainly
> >>>> >> >>> >>>>>>>>> looked at the *IPs a lot in other software I use just
> >>>> >> >>> >>>>>>>>> to see
> >>>> >> >>> >>>>>>>>> the
> >>>> >> >>> >>>>>>>>> biggest
> >>>> >> >>> >>>>>>>>> things on the roadmap.
> >>>> >> >>> >>>>>>>>>
> >>>> >> >>> >>>>>>>>> When you're talking about "changing interfaces", are
> >>>> >> >>> >>>>>>>>> you
> >>>> >> >>> >>>>>>>>> talking
> >>>> >> >>> >>>>>>>>> about
> >>>> >> >>> >>>>>>>>> public or internal APIs? I do think many people hate
> >>>> >> >>> >>>>>>>>> changing
> >>>> >> >>> >>>>>>>>> public APIs
> >>>> >> >>> >>>>>>>>> and I actually think that's for the best of the
> >>>> >> >>> >>>>>>>>> project.
> >>>> >> >>> >>>>>>>>> That's
> >>>> >> >>> >>>>>>>>> a
> >>>> >> >>> >>>>>>>>> technical
> >>>> >> >>> >>>>>>>>> debate, but basically, the worst thing when you're
> >>>> >> >>> >>>>>>>>> using a
> >>>> >> >>> >>>>>>>>> piece
> >>>> >> >>> >>>>>>>>> of
> >>>> >> >>> >>>>>>>>> software
> >>>> >> >>> >>>>>>>>> is that the developers constantly ask you to rewrite
> >>>> >> >>> >>>>>>>>> your
> >>>> >> >>> >>>>>>>>> app
> >>>> >> >>> >>>>>>>>> to
> >>>> >> >>> >>>>>>>>> update to a
> >>>> >> >>> >>>>>>>>> new version (and thus benefit from bug fixes, etc).
> Cue
> >>>> >> >>> >>>>>>>>> anyone
> >>>> >> >>> >>>>>>>>> who's used
> >>>> >> >>> >>>>>>>>> Protobuf, or Guava. The "let's get everyone to change
> >>>> >> >>> >>>>>>>>> their
> >>>> >> >>> >>>>>>>>> code
> >>>> >> >>> >>>>>>>>> this
> >>>> >> >>> >>>>>>>>> release" model works well within a single large
> >>>> >> >>> >>>>>>>>> company, but
> >>>> >> >>> >>>>>>>>> doesn't work
> >>>> >> >>> >>>>>>>>> well for a community, which is why nearly all *very*
> >>>> >> >>> >>>>>>>>> widely
> >>>> >> >>> >>>>>>>>> used
> >>>> >> >>> >>>>>>>>> programming
> >>>> >> >>> >>>>>>>>> interfaces (I'm talking things like Java standard
> >>>> >> >>> >>>>>>>>> library,
> >>>> >> >>> >>>>>>>>> Windows
> >>>> >> >>> >>>>>>>>> API, etc)
> >>>> >> >>> >>>>>>>>> almost *never* break backwards compatibility. All
> this
> >>>> >> >>> >>>>>>>>> is
> >>>> >> >>> >>>>>>>>> done
> >>>> >> >>> >>>>>>>>> within reason
> >>>> >> >>> >>>>>>>>> though, e.g. we do change things in major releases
> >>>> >> >>> >>>>>>>>> (2.x,
> >>>> >> >>> >>>>>>>>> 3.x,
> >>>> >> >>> >>>>>>>>> etc).
> >>>> >> >>> >>>>>>>>
> >>>> >> >>> >>>>>>>>
> >>>> >> >>> >>>>>>>>
> >>>> >> >>> >>>>>>>>
> >>>> >> >>> >>>>>>>
> >>>> >> >>> >>>>>>
> >>>> >> >>> >>>>>>
> >>>> >> >>> >>>>>>
> >>>> >> >>> >>>>>>
> >>>> >> >>> >>>>>>
> >>>> >> >>> >>>>>> ---------------------------------------------------------------------
> >>>> >> >>> >>>>>> To unsubscribe e-mail: [hidden email]
> >>>> >> >>> >>>>>>
> >>>> >> >>> >>>>>
> >>>> >> >>> >>>>>
> >>>> >> >>> >>>>>
> >>>> >> >>> >>>>> --
> >>>> >> >>> >>>>> Stavros Kontopoulos
> >>>> >> >>> >>>>> Senior Software Engineer
> >>>> >> >>> >>>>> Lightbend, Inc.
> >>>> >> >>> >>>>> p:  +30 6977967274
> >>>> >> >>> >>>>> e: [hidden email]
> >>>> >> >>> >>>>>
> >>>> >> >>> >>>>>
> >>>> >> >>> >>>>
> >>>> >> >>> >>>
> >>>> >> >>> >>
> >>>> >> >>> >>
> >>>> >> >>>
> >>>> >> >>
> >>>> >> >
> >>>> >> >
> >>>> >> >
> >>>> >> >
> >>>> >> >
> >>>> >>
> >>>> >>
> >>>> >
> >>>> >
> >>>> >
> >>>> > --
> >>>> > Ryan Blue
> >>>> > Software Engineer
> >>>> > Netflix
> >>>>
> >>>>
> >>>
> >
>
>
>
>


-- 
Ryan Blue
Software Engineer
Netflix
