Re: Spark Improvement Proposals

kant kodali Wed, 12 Oct 2016 20:31:27 -0700

Some of you may have already seen this, but in case you haven't, you
may want to check it out.


http://www.slideshare.net/sbaltagi/flink-vs-spark



On Tue, Oct 11, 2016 at 1:57 PM, Ryan Blue <rb...@netflix.com.invalid>
wrote:

> I don't think we will have trouble with whatever rule that is adopted for
> accepting proposals. Considering committers' votes binding (if that is what
> we choose) is an established practice as long as it isn't for specific
> votes, like a release vote. From the Apache docs: "Who is permitted to vote
> is, to some extent, a community-specific thing." [1] And, I also don't see
> why it would be a problem to choose consensus, as long as we have an open
> discussion and vote about these rules.
>
> rb
>
> On Mon, Oct 10, 2016 at 4:15 PM, Cody Koeninger <c...@koeninger.org>
> wrote:
>
>> If someone wants to tell me that it's OK and "The Apache Way" for
>> Kafka and Flink to have a proposal process that ends in a lazy
>> majority, but it's not OK for Spark to have a proposal process that
>> ends in a non-lazy consensus...
>>
>> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals#KafkaImprovementProposals-Process
>>
>> In practice any PMC member can stop a proposal they don't like, so I'm
>> not sure how much it matters.
>>
>>
>>
>> On Mon, Oct 10, 2016 at 5:59 PM, Mark Hamstra <m...@clearstorydata.com>
>> wrote:
>> > There is a larger issue to keep in mind, and that is that what you are
>> > proposing is a procedure that, as far as I am aware, hasn't previously
>> > been adopted in an Apache project, and thus is not an easy or exact fit
>> > with established practices that have been blessed as "The Apache Way".
>> > As such, we need to be careful, because we have run into some trouble in
>> > the past with some inside the ASF but essentially outside the Spark
>> > community who didn't like the way we were doing things.
>> >
>> > On Mon, Oct 10, 2016 at 3:53 PM, Cody Koeninger <c...@koeninger.org>
>> > wrote:
>> >>
>> >> Apache documents say lots of confusing stuff, including that committers
>> >> are in practice given a vote.
>> >>
>> >> https://www.apache.org/foundation/voting.html
>> >>
>> >> I don't care either way; if someone wants me to sub committer for PMC in
>> >> the voting section, fine, we just need a clear outcome.
>> >>
>> >>
>> >> On Oct 10, 2016 17:36, "Mark Hamstra" <m...@clearstorydata.com> wrote:
>> >>>
>> >>> If I'm correctly understanding the kind of voting that you are talking
>> >>> about, then to be accurate, it is only the PMC members that have a
>> >>> vote, not all committers:
>> >>> https://www.apache.org/foundation/how-it-works.html#pmc-members
>> >>>
>> >>> On Mon, Oct 10, 2016 at 12:02 PM, Cody Koeninger <c...@koeninger.org>
>> >>> wrote:
>> >>>>
>> >>>> I think the main value is in being honest about what's going on.  No
>> >>>> one other than committers can cast a meaningful vote, that's the
>> >>>> reality.  Beyond that, if people think it's more open to allow formal
>> >>>> proposals from anyone, I'm not necessarily against it, but my main
>> >>>> question would be this:
>> >>>>
>> >>>> If anyone can submit a proposal, are committers actually going to
>> >>>> clearly reject and close proposals that don't meet the requirements?
>> >>>>
>> >>>> Right now we have a serious problem with lack of clarity regarding
>> >>>> contributions, and that cannot spill over into goal-setting.
>> >>>>
>> >>>> On Mon, Oct 10, 2016 at 1:54 PM, Ryan Blue <rb...@netflix.com>
>> >>>> wrote:
>> >>>> > +1 to votes to approve proposals. I agree that proposals should have
>> >>>> > an official mechanism to be accepted, and a vote is an established
>> >>>> > means of doing that well. I like that it includes a period to review
>> >>>> > the proposal and I think proposals should have been discussed enough
>> >>>> > ahead of a vote to survive the possibility of a veto.
>> >>>> >
>> >>>> > I also like the names that are short and (mostly) unique, like SEP.
>> >>>> >
>> >>>> > Where I disagree is with the requirement that a committer must
>> >>>> > formally propose an enhancement. I don't see the value of restricting
>> >>>> > this: if someone has the will to write up a proposal then they should
>> >>>> > be encouraged to do so and start a discussion about it. Even if there
>> >>>> > is a political reality as Cody says, what is the value of codifying
>> >>>> > that in our process? I think restricting who can submit proposals
>> >>>> > would only undermine them by pushing contributors out. Maybe I'm
>> >>>> > missing something here?
>> >>>> >
>> >>>> > rb
>> >>>> >
>> >>>> >
>> >>>> >
>> >>>> > On Mon, Oct 10, 2016 at 7:41 AM, Cody Koeninger <c...@koeninger.org>
>> >>>> > wrote:
>> >>>> >>
>> >>>> >> Yes, users suggesting SIPs is a good thing and is explicitly called
>> >>>> >> out in the linked document under the Who? section.  Formally
>> >>>> >> proposing them, not so much, because of the political realities.
>> >>>> >>
>> >>>> >> Yes, implementation strategy definitely affects goals.  There are
>> >>>> >> all kinds of examples of this, I'll pick one that's my fault so as
>> >>>> >> to avoid sounding like I'm blaming:
>> >>>> >>
>> >>>> >> When I implemented the Kafka DStream, one of my (not explicitly
>> >>>> >> agreed upon by the community) goals was to make sure people could
>> >>>> >> use the DStream with however they were already using Kafka at work.
>> >>>> >> The lack of explicit agreement on that goal led to all kinds of
>> >>>> >> fighting with committers, that could have been avoided.  The lack of
>> >>>> >> explicit up-front strategy discussion led to the DStream not really
>> >>>> >> working with compacted topics.  I knew about compacted topics, but
>> >>>> >> don't have a use for them, so had a blind spot there.  If there was
>> >>>> >> explicit up-front discussion that my strategy was "assume that
>> >>>> >> batches can be defined on the driver solely by beginning and ending
>> >>>> >> offsets", there's a greater chance that a user would have seen that
>> >>>> >> and said, "hey, what about non-contiguous offsets in a compacted
>> >>>> >> topic".
>> >>>> >>
>> >>>> >> This kind of thing is only going to happen smoothly if we have a
>> >>>> >> lightweight user-visible process with clear outcomes.
>> >>>> >>
>> >>>> >> On Mon, Oct 10, 2016 at 1:34 AM, assaf.mendelson
>> >>>> >> <assaf.mendel...@rsa.com> wrote:
>> >>>> >> > I agree with most of what Cody said.
>> >>>> >> >
>> >>>> >> > Two things:
>> >>>> >> >
>> >>>> >> > First we can always have other people suggest SIPs but mark them
>> >>>> >> > as “unreviewed” and have committers basically move them forward.
>> >>>> >> > The problem is that writing a good document takes time. This way
>> >>>> >> > we can leverage non committers to do some of this work (it is just
>> >>>> >> > another way to contribute).
>> >>>> >> >
>> >>>> >> >
>> >>>> >> >
>> >>>> >> > As for strategy, in many cases implementation strategy can affect
>> >>>> >> > the goals.
>> >>>> >> > I will give a small example: In the current structured streaming
>> >>>> >> > strategy, we group by the time to achieve a sliding window. This
>> >>>> >> > is definitely an implementation decision and not a goal. However,
>> >>>> >> > I can think of several aggregation functions which have the time
>> >>>> >> > inside their calculation buffer. For example, let’s say we want to
>> >>>> >> > return a set of all distinct values. One way to implement this
>> >>>> >> > would be to make the set into a map and have the value contain the
>> >>>> >> > last time seen. Multiplying it across the groupby would cost a lot
>> >>>> >> > in performance. So adding such a strategy would have a great
>> >>>> >> > effect on the type of aggregations and their performance which
>> >>>> >> > does affect the goal.
>> >>>> >> > Without adding the strategy, it is easy for whoever goes to the
>> >>>> >> > design document to not think about these cases. Furthermore, it
>> >>>> >> > might be decided that these cases are rare enough so that the
>> >>>> >> > strategy is still good enough but how would we know it without
>> >>>> >> > user feedback?
>> >>>> >> >
>> >>>> >> > I believe this example is exactly what Cody was talking about.
>> >>>> >> > Since many times implementation strategies have a large effect on
>> >>>> >> > the goal, we should have it discussed when discussing the goals.
>> >>>> >> > In addition, while it is often easy to throw out completely
>> >>>> >> > infeasible goals, it is often much harder to figure out that the
>> >>>> >> > goals are infeasible without fine tuning.
>> >>>> >> >
>> >>>> >> >
>> >>>> >> >
>> >>>> >> >
>> >>>> >> >
>> >>>> >> > Assaf.
>> >>>> >> >
>> >>>> >> >
>> >>>> >> >
>> >>>> >> > From: Cody Koeninger-2 [via Apache Spark Developers List]
>> >>>> >> > [mailto:ml-node+[hidden email]]
>> >>>> >> > Sent: Monday, October 10, 2016 2:25 AM
>> >>>> >> > To: Mendelson, Assaf
>> >>>> >> > Subject: Re: Spark Improvement Proposals
>> >>>> >> >
>> >>>> >> >
>> >>>> >> >
>> >>>> >> > Only committers should formally submit SIPs because in an Apache
>> >>>> >> > project only committers have explicit political power.  If a user
>> >>>> >> > can't find a committer willing to sponsor an SIP idea, they have
>> >>>> >> > no way to get the idea passed in any case.  If I can't find a
>> >>>> >> > committer to sponsor this meta-SIP idea, I'm out of luck.
>> >>>> >> >
>> >>>> >> > I do not believe unrealistic goals can be found solely by
>> >>>> >> > inspection.  We've managed to ignore unrealistic goals even after
>> >>>> >> > implementation!  Focusing on APIs can allow people to think
>> >>>> >> > they've solved something, when there's really no way of
>> >>>> >> > implementing that API while meeting the goals.  Rapid iteration is
>> >>>> >> > clearly the best way to address this, but we've already talked
>> >>>> >> > about why that hasn't really worked.  If adding a non-binding API
>> >>>> >> > section to the template is important to you, I'm not against it,
>> >>>> >> > but I don't think it's sufficient.
>> >>>> >> >
>> >>>> >> > On your PRD vs design doc spectrum, I'm saying this is closer to
>> >>>> >> > a PRD.  Clear agreement on goals is the most important thing and
>> >>>> >> > that's why it's the thing I want binding agreement on.  But I
>> >>>> >> > cannot agree to goals unless I have enough minimal technical info
>> >>>> >> > to judge whether the goals are likely to actually be accomplished.
>> >>>> >> >
>> >>>> >> >
>> >>>> >> >
>> >>>> >> > On Sun, Oct 9, 2016 at 5:35 PM, Matei Zaharia <[hidden email]>
>> >>>> >> > wrote:
>> >>>> >> >
>> >>>> >> >
>> >>>> >> >> Well, I think there are a few things here that don't make sense.
>> >>>> >> >> First, why should only committers submit SIPs? Development in the
>> >>>> >> >> project should be open to all contributors, whether they're
>> >>>> >> >> committers or not. Second, I think unrealistic goals can be found
>> >>>> >> >> just by inspecting the goals, and I'm not super worried that
>> >>>> >> >> we'll accept a lot of SIPs that are then infeasible -- we can
>> >>>> >> >> then submit new ones. But this depends on whether you want this
>> >>>> >> >> process to be a "design doc lite", where people also agree on
>> >>>> >> >> implementation strategy, or just a way to agree on goals. This is
>> >>>> >> >> what I asked earlier about PRDs vs design docs (and I'm open to
>> >>>> >> >> either one but I'd just like clarity). Finally, both as a user
>> >>>> >> >> and designer of software, I always want to give feedback on APIs,
>> >>>> >> >> so I'd really like a culture of having those early. People don't
>> >>>> >> >> argue about prettiness when they discuss APIs, they argue about
>> >>>> >> >> the core concepts to expose in order to meet various goals, and
>> >>>> >> >> then they're stuck maintaining those for a long time.
>> >>>> >> >>
>> >>>> >> >> Matei
>> >>>> >> >>
>> >>>> >> >> On Oct 9, 2016, at 3:10 PM, Cody Koeninger <[hidden email]>
>> >>>> >> >> wrote:
>> >>>> >> >>
>> >>>> >> >> Users instead of people, sure.  Committers and contributors are
>> >>>> >> >> (or at least should be) a subset of users.
>> >>>> >> >>
>> >>>> >> >> Non goals, sure. I don't care what the name is, but we need to
>> >>>> >> >> clearly say e.g. 'no we are not maintaining compatibility with
>> >>>> >> >> XYZ right now'.
>> >>>> >> >>
>> >>>> >> >> API, what I care most about is whether it allows me to accomplish
>> >>>> >> >> the goals.  Arguing about how ugly or pretty it is can be saved
>> >>>> >> >> for design/implementation imho.
>> >>>> >> >>
>> >>>> >> >> Strategy, this is necessary because otherwise goals can be out of
>> >>>> >> >> line with reality.  Don't propose goals you don't have at least
>> >>>> >> >> some idea of how to implement.
>> >>>> >> >>
>> >>>> >> >> Rejected strategies, given that committers are the only ones I'm
>> >>>> >> >> saying should formally submit SPARKLIs or SIPs, if they put junk
>> >>>> >> >> in a required section then slap them down for it and tell them to
>> >>>> >> >> fix it.
>> >>>> >> >>
>> >>>> >> >>
>> >>>> >> >> On Oct 9, 2016 4:36 PM, "Matei Zaharia" <[hidden email]> wrote:
>> >>>> >> >>>
>> >>>> >> >>> Yup, this is the stuff that I found unclear. Thanks for
>> >>>> >> >>> clarifying here, but we should also clarify it in the writeup.
>> >>>> >> >>> In particular:
>> >>>> >> >>>
>> >>>> >> >>> - Goals needs to be about user-facing behavior ("people" is
>> >>>> >> >>> broad)
>> >>>> >> >>>
>> >>>> >> >>> - I'd rename Rejected Goals to Non-Goals. Otherwise someone will
>> >>>> >> >>> dig up one of these and say "Spark's developers have officially
>> >>>> >> >>> rejected X, which our awesome system has".
>> >>>> >> >>>
>> >>>> >> >>> - For user-facing stuff, I think you need a section on API.
>> >>>> >> >>> Virtually all other *IPs I've seen have that.
>> >>>> >> >>>
>> >>>> >> >>> - I'm still not sure why the strategy section is needed if the
>> >>>> >> >>> purpose is to define user-facing behavior -- unless this is the
>> >>>> >> >>> strategy for setting the goals or for defining the API. That
>> >>>> >> >>> sounds squarely like a design doc issue. In some sense, who
>> >>>> >> >>> cares whether the proposal is technically feasible right now?
>> >>>> >> >>> If it's infeasible, that will be discovered later during design
>> >>>> >> >>> and implementation. Same thing with rejected strategies --
>> >>>> >> >>> listing some of those is definitely useful sometimes, but if
>> >>>> >> >>> you make this a *required* section, people are just going to
>> >>>> >> >>> fill it in with bogus stuff (I've seen this happen before).
>> >>>> >> >>>
>> >>>> >> >>> Matei
>> >>>> >> >>>
>> >>>> >> >
>> >>>> >> >>> > On Oct 9, 2016, at 2:14 PM, Cody Koeninger <[hidden email]>
>> >>>> >> >>> > wrote:
>> >>>> >> >>> >
>> >>>> >> >>> > So to focus the discussion on the specific strategy I'm
>> >>>> >> >>> > suggesting, documented at
>> >>>> >> >>> >
>> >>>> >> >>> > https://github.com/koeninger/spark-1/blob/SIP-0/docs/spark-improvement-proposals.md
>> >>>> >> >>> >
>> >>>> >> >>> > "Goals: What must this allow people to do, that they can't
>> >>>> >> >>> > currently?"
>> >>>> >> >>> >
>> >>>> >> >>> > Is it unclear that this is focusing specifically on
>> >>>> >> >>> > people-visible behavior?
>> >>>> >> >>> >
>> >>>> >> >>> > Rejected goals - are important because otherwise people keep
>> >>>> >> >>> > trying to argue about scope.  Of course you can change things
>> >>>> >> >>> > later with a different SIP and different vote, the point is to
>> >>>> >> >>> > focus.
>> >>>> >> >>> >
>> >>>> >> >>> > Use cases - are something that people are going to bring up in
>> >>>> >> >>> > discussion.  If they aren't clearly documented as a goal
>> >>>> >> >>> > ("This must allow me to connect using SSL"), they should be
>> >>>> >> >>> > added.
>> >>>> >> >>> >
>> >>>> >> >>> > Internal architecture - if the people who need specific
>> >>>> >> >>> > behavior are implementers of other parts of the system, that's
>> >>>> >> >>> > fine.
>> >>>> >> >>> >
>> >>>> >> >>> > Rejected strategies - If you have none of these, you have no
>> >>>> >> >>> > evidence that the proponent didn't just go with the first thing
>> >>>> >> >>> > they had in mind (or have already implemented), which is a big
>> >>>> >> >>> > problem currently.  Approval isn't binding as to specifics of
>> >>>> >> >>> > implementation, so these aren't handcuffs.  The goals are the
>> >>>> >> >>> > contract, the strategy is evidence that contract can actually
>> >>>> >> >>> > be met.
>> >>>> >> >>> >
>> >>>> >> >>> > Design docs - I'm not touching design docs.  The markdown file
>> >>>> >> >>> > I linked specifically says of the strategy section "This is not
>> >>>> >> >>> > a full design document."  Is this unclear?  Design docs can be
>> >>>> >> >>> > worked on obviously, but that's not what I'm concerned with
>> >>>> >> >>> > here.
>> >>>> >> >>> >
>> >>>> >> >>> >
>> >>>> >> >>> >
>> >>>> >> >>> >
>> >>>> >> >>> > On Sun, Oct 9, 2016 at 2:34 PM, Matei Zaharia <[hidden email]>
>> >>>> >> >>> > wrote:
>> >>>> >> >>> >> Hi Cody,
>> >>>> >> >>> >>
>> >>>> >> >>> >> I think this would be a lot more concrete if we had a more
>> >>>> >> >>> >> detailed template for SIPs. Right now, it's not super clear
>> >>>> >> >>> >> what's in scope -- e.g. are they a way to solicit feedback on
>> >>>> >> >>> >> the user-facing behavior or on the internals? "Goals" can
>> >>>> >> >>> >> cover both things. I've been thinking of SIPs more as Product
>> >>>> >> >>> >> Requirements Docs (PRDs), which focus on *what* a code change
>> >>>> >> >>> >> should do as opposed to how.
>> >>>> >> >>> >>
>> >>>> >> >>> >> In particular, here are some things that you may or may not
>> >>>> >> >>> >> consider in scope for SIPs:
>> >>>> >> >>> >>
>> >>>> >> >>> >> - Goals and non-goals: This is definitely in scope, and IMO
>> >>>> >> >>> >> should focus on user-visible behavior (e.g. "system supports
>> >>>> >> >>> >> SQL window functions" or "system continues working if one
>> >>>> >> >>> >> node fails"). BTW I wouldn't say "rejected goals" because
>> >>>> >> >>> >> some of them might become goals later, so we're not
>> >>>> >> >>> >> definitively rejecting them.
>> >>>> >> >>> >>
>> >>>> >> >>> >> - Public API: Probably should be included in most SIPs unless
>> >>>> >> >>> >> it's too large to fully specify then (e.g. "let's add an ML
>> >>>> >> >>> >> library").
>> >>>> >> >>> >>
>> >>>> >> >>> >> - Use cases: I usually find this very useful in PRDs to better
>> >>>> >> >>> >> communicate the goals.
>> >>>> >> >>> >>
>> >>>> >> >>> >> - Internal architecture: This is usually *not* a thing users
>> >>>> >> >>> >> can easily comment on and it sounds more like a design doc
>> >>>> >> >>> >> item. Of course it's important to show that the SIP is
>> >>>> >> >>> >> feasible to implement. One exception, however, is that I
>> >>>> >> >>> >> think we'll have some SIPs primarily on internals (e.g. if
>> >>>> >> >>> >> somebody wants to refactor Spark's query optimizer or
>> >>>> >> >>> >> something).
>> >>>> >> >>> >>
>> >>>> >> >>> >> - Rejected strategies: I personally wouldn't put this, because
>> >>>> >> >>> >> what's the point of voting to reject a strategy before you've
>> >>>> >> >>> >> really begun designing and implementing something? What if
>> >>>> >> >>> >> you discover that the strategy is actually better when you
>> >>>> >> >>> >> start doing stuff?
>> >>>> >> >>> >>
>> >>>> >> >>> >> At a super high level, it depends on whether you want the SIPs
>> >>>> >> >>> >> to be PRDs for getting some quick feedback on the goals of a
>> >>>> >> >>> >> feature before it is designed, or something more like
>> >>>> >> >>> >> full-fledged design docs (just a more visible design doc for
>> >>>> >> >>> >> bigger changes). I looked at Kafka's KIPs, and they actually
>> >>>> >> >>> >> seem to be more like design docs. This can work too but it
>> >>>> >> >>> >> does require more work from the proposer and it can lead to
>> >>>> >> >>> >> the same problems you mentioned with people already having a
>> >>>> >> >>> >> design and implementation in mind.
>> >>>> >> >>> >>
>> >>>> >> >>> >> Basically, the question is, are you trying to iterate faster
>> >>>> >> >>> >> on design by adding a step for user feedback earlier? Or are
>> >>>> >> >>> >> you just trying to make design docs for key features more
>> >>>> >> >>> >> visible (and their approval more formal)?
>> >>>> >> >>> >>
>> >>>> >> >>> >> BTW note that in either case, I'd like to have a template for
>> >>>> >> >>> >> design docs too, which should also include goals. I think
>> >>>> >> >>> >> that would've avoided some of the issues you brought up.
>> >>>> >> >>> >>
>> >>>> >> >>> >> Matei
>> >>>> >> >>> >>
>> >>>> >> >>> >> On Oct 9, 2016, at 10:40 AM, Cody Koeninger <[hidden email]>
>> >>>> >> >>> >> wrote:
>> >>>> >> >>> >>
>> >>>> >> >>> >> Here's my specific proposal (meta-proposal?)
>> >>>> >> >>> >>
>> >>>> >> >>> >> Spark Improvement Proposals (SIP)
>> >>>> >> >>> >>
>> >>>> >> >>> >>
>> >>>> >> >>> >> Background:
>> >>>> >> >>> >>
>> >>>> >> >>> >> The current problem is that design and implementation of large
>> >>>> >> >>> >> features are often done in private, before soliciting user
>> >>>> >> >>> >> feedback.
>> >>>> >> >>> >>
>> >>>> >> >>> >> When feedback is solicited, it is often as to detailed design
>> >>>> >> >>> >> specifics, not focused on goals.
>> >>>> >> >>> >>
>> >>>> >> >>> >> When implementation does take place after design, there is
>> >>>> >> >>> >> often disagreement as to what goals are or are not in scope.
>> >>>> >> >>> >>
>> >>>> >> >>> >> This results in commits that don't fully meet user needs.
>> >>>> >> >>> >>
>> >>>> >> >>> >>
>> >>>> >> >>> >> Goals:
>> >>>> >> >>> >>
>> >>>> >> >>> >> - Ensure user, contributor, and committer goals are clearly
>> >>>> >> >>> >> identified and agreed upon, before implementation takes place.
>> >>>> >> >>> >>
>> >>>> >> >>> >> - Ensure that a technically feasible strategy is chosen that
>> >>>> >> >>> >> is likely to meet the goals.
>> >>>> >> >>> >>
>> >>>> >> >>> >>
>> >>>> >> >>> >> Rejected Goals:
>> >>>> >> >>> >>
>> >>>> >> >>> >> - SIPs are not for detailed design.  Design by committee
>> >>>> >> >>> >> doesn't work.
>> >>>> >> >>> >>
>> >>>> >> >>> >> - SIPs are not for every change.  We don't need that much
>> >>>> >> >>> >> process.
>> >>>> >> >>> >>
>> >>>> >> >>> >>
>> >>>> >> >>> >> Strategy:
>> >>>> >> >>> >>
>> >>>> >> >>> >> My suggestion is outlined as a Spark Improvement Proposal
>> >>>> >> >>> >> process documented at
>> >>>> >> >>> >>
>> >>>> >> >>> >> https://github.com/koeninger/spark-1/blob/SIP-0/docs/spark-improvement-proposals.md
>> >>>> >> >>> >>
>> >>>> >> >>> >> Specifics of Jira manipulation are an implementation detail we
>> >>>> >> >>> >> can figure out.
>> >>>> >> >>> >>
>> >>>> >> >>> >> I'm suggesting voting; the need here is for a _clear_ outcome.
>> >>>> >> >>> >>
>> >>>> >> >>> >>
>> >>>> >> >>> >> Rejected Strategies:
>> >>>> >> >>> >>
>> >>>> >> >>> >> Having someone who understands the problem implement it first
>> >>>> >> >>> >> works, but only if significant iteration after user feedback
>> >>>> >> >>> >> is allowed.
>> >>>> >> >>> >>
>> >>>> >> >>> >> Historically this has been problematic due to pressure to
>> >>>> >> >>> >> limit public API changes.
>> >>>> >> >>> >>
>> >>>> >> >>> >>
>> >>>> >> >>> >> On Fri, Oct 7, 2016 at 5:16 PM, Reynold Xin <[hidden email]>
>> >>>> >> >>> >> wrote:
>> >>>> >> >>> >>>
>> >>>> >> >>> >>> Alright, looks like there is quite a bit of support. We
>> >>>> >> >>> >>> should wait to hear from more people too.
>> >>>> >> >>> >>>
>> >>>> >> >>> >>> To push this forward, Cody and I will be working together in
>> >>>> >> >>> >>> the next couple of weeks to come up with a concrete, detailed
>> >>>> >> >>> >>> proposal on what this entails, and then we can discuss the
>> >>>> >> >>> >>> specific proposal as well.
>> >>>> >> >>> >>>
>> >>>> >> >>> >>>
>> >>>> >> >>> >>> On Fri, Oct 7, 2016 at 2:29 PM, Cody Koeninger <[hidden email]>
>> >>>> >> >>> >>> wrote:
>> >>>> >> >>> >>>>
>> >>>> >> >>> >>>> Yeah, in case it wasn't clear, I was talking about SIPs for
>> >>>> >> >>> >>>> major user-facing or cross-cutting changes, not minor feature
>> >>>> >> >>> >>>> adds.
>> >>>> >> >>> >>>>
>> >>>> >> >>> >>>> On Fri, Oct 7, 2016 at 3:58 PM, Stavros Kontopoulos
>> >>>> >> >>> >>>> <[hidden email]> wrote:
>> >>>> >> >>> >>>>>
>> >>>> >> >>> >>>>> +1 to the SIP label as long as it does not slow down things
>> >>>> >> >>> >>>>> and it targets optimizing efforts, coordination etc. For
>> >>>> >> >>> >>>>> example really small features should not need to go through
>> >>>> >> >>> >>>>> this process (assuming they don't touch public interfaces)
>> >>>> >> >>> >>>>> or re-factorings and hope it will be kept this way. So a
>> >>>> >> >>> >>>>> guideline doc should be provided, like in the KIP case.
>> >>>> >> >>> >>>>>
>> >>>> >> >>> >>>>> IMHO so far aside from tagging things and linking them
>> >>>> >> >>> >>>>> elsewhere simply having design docs and prototype
>> >>>> >> >>> >>>>> implementations in PRs is not something that has not worked
>> >>>> >> >>> >>>>> so far. What is really a pain in many projects out there is
>> >>>> >> >>> >>>>> discontinuity in progress of PRs, missing features, slow
>> >>>> >> >>> >>>>> reviews which is understandable to some extent... it is not
>> >>>> >> >>> >>>>> only about Spark but things can be improved for sure for
>> >>>> >> >>> >>>>> this project in particular as already stated.
>> >>>> >> >>> >>>>>
>> >>>> >> >>> >>>>> On Fri, Oct 7, 2016 at 11:14 PM, Cody Koeninger <[hidden email]>
>> >>>> >> >>> >>>>> wrote:
>> >>>> >> >>> >>>>>>
>> >>>> >> >>> >>>>>> +1 to adding an SIP label and linking it from the website.
>> >>>> >> >>> >>>>>> I think it needs
>> >>>> >> >>> >>>>>>
>> >>>> >> >>> >>>>>> - template that focuses it towards soliciting user goals /
>> >>>> >> >>> >>>>>> non goals
>> >>>> >> >>> >>>>>> - clear resolution as to which strategy was chosen to
>> >>>> >> >>> >>>>>> pursue.  I'd recommend a vote.
>> >>>> >> >>> >>>>>>
>> >>>> >> >>> >>>>>> Matei asked me to clarify what I meant by changing
>> >>>> >> >>> >>>>>> interfaces, I think it's directly relevant to the SIP idea
>> >>>> >> >>> >>>>>> so I'll clarify here, and split a thread for the other
>> >>>> >> >>> >>>>>> discussion per Nicholas' request.
>> >>>> >> >>> >>>>>>
>> >>>> >> >>> >>>>>> I meant changing public user interfaces.  I think the
>> >>>> >> >>> >>>>>> first design is unlikely to be right, because it's done at
>> >>>> >> >>> >>>>>> a time when you have the least information.  As a user, I
>> >>>> >> >>> >>>>>> find it considerably more frustrating to be unable to use a
>> >>>> >> >>> >>>>>> tool to get my job done, than I do having to make minor
>> >>>> >> >>> >>>>>> changes to my code in order to take advantage of features.
>> >>>> >> >>> >>>>>> I've seen committers be seriously reluctant to allow
>> >>>> >> >>> >>>>>> changes to @experimental code that are needed in order for
>> >>>> >> >>> >>>>>> it to really work right.  You need to be able to iterate,
>> >>>> >> >>> >>>>>> and if people on both sides of the fence aren't going to
>> >>>> >> >>> >>>>>> respect that some newer apis are subject to change, then
>> >>>> >> >>> >>>>>> why even mark them as such?
>> >>>> >> >>> >>>>>>
>> >>>> >> >>> >>>>>> Ideally a finished SIP should give me a checklist of
>> >>>> >> >>> >>>>>> things
>> >>>> >> >>> >>>>>> that
>> >>>> >> >>> >>>>>> an
>> >>>> >> >>> >>>>>> implementation must do, and things that it doesn't
>> need to
>> >>>> >> >>> >>>>>> do.
>> >>>> >> >>> >>>>>> Contributors/committers should be seriously discouraged
>> >>>> >> >>> >>>>>> from
>> >>>> >> >>> >>>>>> putting
>> >>>> >> >>> >>>>>> out a version 0.1 that doesn't have at least a
>> prototype
>> >>>> >> >>> >>>>>> implementation of all those things, especially if
>> they're
>> >>>> >> >>> >>>>>> then
>> >>>> >> >>> >>>>>> going
>> >>>> >> >>> >>>>>> to argue against interface changes necessary to get the
>> >>>> >> >>> >>>>>> rest of the things done in the 0.2 version.
>> >>>> >> >>> >>>>>>
>> >>>> >> >>> >>>>>>
>> >>>> >> >>> >>>>>> On Fri, Oct 7, 2016 at 2:18 PM, Reynold Xin <[hidden
>> >>>> >> >>> >>>>>> email]>
>> >>>> >> >>> >>>>>> wrote:
>> >>>> >> >>> >>>>>>> I like the lightweight proposal to add a SIP label.
>> >>>> >> >>> >>>>>>>
>> >>>> >> >>> >>>>>>> During Spark 2.0 development, Tom (Graves) and I suggested
>> >>>> >> >>> >>>>>>> using a wiki to track the list of major changes, but that
>> >>>> >> >>> >>>>>>> never really materialized due to the overhead. Adding a SIP
>> >>>> >> >>> >>>>>>> label on major JIRAs and then linking to them prominently on
>> >>>> >> >>> >>>>>>> the Spark website makes a lot of sense.
>> >>>> >> >>> >>>>>>>
>> >>>> >> >>> >>>>>>>
>> >>>> >> >>> >>>>>>> On Fri, Oct 7, 2016 at 10:50 AM, Matei Zaharia
>> >>>> >> >>> >>>>>>> <[hidden email]>
>> >>>> >> >>> >>>>>>> wrote:
>> >>>> >> >>> >>>>>>>>
>> >>>> >> >>> >>>>>>>> For the improvement proposals, I think one major
>> point
>> >>>> >> >>> >>>>>>>> was to
>> >>>> >> >>> >>>>>>>> make
>> >>>> >> >>> >>>>>>>> them
>> >>>> >> >>> >>>>>>>> really visible to users who are not contributors, so
>> we
>> >>>> >> >>> >>>>>>>> should
>> >>>> >> >>> >>>>>>>> do
>> >>>> >> >>> >>>>>>>> more than
>> >>>> >> >>> >>>>>>>> sending stuff to dev@. One very lightweight idea is
>> to
>> >>>> >> >>> >>>>>>>> have a
>> >>>> >> >>> >>>>>>>> new
>> >>>> >> >>> >>>>>>>> type of
>> >>>> >> >>> >>>>>>>> JIRA called a SIP and have a link to a filter that
>> shows
>> >>>> >> >>> >>>>>>>> all
>> >>>> >> >>> >>>>>>>> such
>> >>>> >> >>> >>>>>>>> JIRAs from
>> >>>> >> >>> >>>>>>>> http://spark.apache.org. I also like the idea of
>> SIP and
>> >>>> >> >>> >>>>>>>> design
>> >>>> >> >>> >>>>>>>> doc
>> >>>> >> >>> >>>>>>>> templates (in fact many projects have them).
>> >>>> >> >>> >>>>>>>>
>> >>>> >> >>> >>>>>>>> Matei
>> >>>> >> >>> >>>>>>>>
>> >>>> >> >>> >>>>>>>> On Oct 7, 2016, at 10:38 AM, Reynold Xin <[hidden
>> >>>> >> >>> >>>>>>>> email]>
>> >>>> >> >>> >>>>>>>> wrote:
>> >>>> >> >>> >>>>>>>>
>> >>>> >> >>> >>>>>>>> I called Cody last night and talked about some of the
>> >>>> >> >>> >>>>>>>> topics
>> >>>> >> >>> >>>>>>>> in
>> >>>> >> >>> >>>>>>>> his
>> >>>> >> >>> >>>>>>>> email.
>> >>>> >> >>> >>>>>>>> It became clear to me Cody genuinely cares about the
>> >>>> >> >>> >>>>>>>> project.
>> >>>> >> >>> >>>>>>>>
>> >>>> >> >>> >>>>>>>> Some of the frustrations come from the success of the
>> >>>> >> >>> >>>>>>>> project
>> >>>> >> >>> >>>>>>>> itself
>> >>>> >> >>> >>>>>>>> becoming very "hot", and it is difficult to get
>> clarity
>> >>>> >> >>> >>>>>>>> from
>> >>>> >> >>> >>>>>>>> people
>> >>>> >> >>> >>>>>>>> who
>> >>>> >> >>> >>>>>>>> don't dedicate all their time to Spark. In fact, it
>> is
>> >>>> >> >>> >>>>>>>> in
>> >>>> >> >>> >>>>>>>> some
>> >>>> >> >>> >>>>>>>> ways
>> >>>> >> >>> >>>>>>>> similar
>> >>>> >> >>> >>>>>>>> to scaling an engineering team in a successful
>> startup:
>> >>>> >> >>> >>>>>>>> old
>> >>>> >> >>> >>>>>>>> processes that
>> >>>> >> >>> >>>>>>>> worked well might not work so well when it gets to a
>> >>>> >> >>> >>>>>>>> certain
>> >>>> >> >>> >>>>>>>> size,
>> >>>> >> >>> >>>>>>>> cultures
>> >>>> >> >>> >>>>>>>> can get diluted, building culture vs building
>> process,
>> >>>> >> >>> >>>>>>>> etc.
>> >>>> >> >>> >>>>>>>>
>> >>>> >> >>> >>>>>>>> I would also really like to have a more visible process for
>> >>>> >> >>> >>>>>>>> larger changes, especially major user-facing API changes.
>> >>>> >> >>> >>>>>>>> Historically we upload design docs for major changes, but
>> >>>> >> >>> >>>>>>>> this is not always done consistently, and it is difficult to
>> >>>> >> >>> >>>>>>>> ensure the quality of the docs, given the volunteer nature of
>> >>>> >> >>> >>>>>>>> the organization.
>> >>>> >> >>> >>>>>>>>
>> >>>> >> >>> >>>>>>>> Some of the more concrete ideas we discussed focus on
>> >>>> >> >>> >>>>>>>> building a
>> >>>> >> >>> >>>>>>>> culture
>> >>>> >> >>> >>>>>>>> to improve clarity:
>> >>>> >> >>> >>>>>>>>
>> >>>> >> >>> >>>>>>>> - Process: Large changes should have design docs
>> posted
>> >>>> >> >>> >>>>>>>> on
>> >>>> >> >>> >>>>>>>> JIRA.
>> >>>> >> >>> >>>>>>>> One
>> >>>> >> >>> >>>>>>>> thing
>> >>>> >> >>> >>>>>>>> Cody and I didn't discuss but an idea that just came
>> to
>> >>>> >> >>> >>>>>>>> me is
>> >>>> >> >>> >>>>>>>> we
>> >>>> >> >>> >>>>>>>> should
>> >>>> >> >>> >>>>>>>> create a design doc template for the project and ask
>> >>>> >> >>> >>>>>>>> everybody
>> >>>> >> >>> >>>>>>>> to
>> >>>> >> >>> >>>>>>>> follow.
>> >>>> >> >>> >>>>>>>> The design doc template should also explicitly list
>> >>>> >> >>> >>>>>>>> goals and
>> >>>> >> >>> >>>>>>>> non-goals, to
>> >>>> >> >>> >>>>>>>> make design docs more consistent.
>> >>>> >> >>> >>>>>>>>
>> >>>> >> >>> >>>>>>>> - Process: Email dev@ to solicit feedback. We have done this
>> >>>> >> >>> >>>>>>>> with some changes, but again very inconsistently. Just
>> >>>> >> >>> >>>>>>>> posting something on JIRA isn't sufficient, because there are
>> >>>> >> >>> >>>>>>>> simply too many JIRAs and the signal gets lost in the noise.
>> >>>> >> >>> >>>>>>>> While this is generally impossible to enforce because we
>> >>>> >> >>> >>>>>>>> can't force all volunteers to conform to a process (or they
>> >>>> >> >>> >>>>>>>> might not even be aware of it), those who are more familiar
>> >>>> >> >>> >>>>>>>> with the project can help by emailing dev@ when they see
>> >>>> >> >>> >>>>>>>> something that hasn't been raised there.
>> >>>> >> >>> >>>>>>>>
>> >>>> >> >>> >>>>>>>> - Culture: The design doc author(s) should be open to
>> >>>> >> >>> >>>>>>>> feedback.
>> >>>> >> >>> >>>>>>>> A
>> >>>> >> >>> >>>>>>>> design
>> >>>> >> >>> >>>>>>>> doc should serve as the base for discussion and is
>> by no
>> >>>> >> >>> >>>>>>>> means
>> >>>> >> >>> >>>>>>>> the
>> >>>> >> >>> >>>>>>>> final
>> >>>> >> >>> >>>>>>>> design. Of course, this does not mean the author has to accept
>> >>>> >> >>> >>>>>>>> every piece of feedback. They should also be comfortable
>> >>>> >> >>> >>>>>>>> accepting / rejecting ideas on technical grounds.
>> >>>> >> >>> >>>>>>>>
>> >>>> >> >>> >>>>>>>> - Process / Culture: For major ongoing projects, it
>> can
>> >>>> >> >>> >>>>>>>> be
>> >>>> >> >>> >>>>>>>> useful
>> >>>> >> >>> >>>>>>>> to
>> >>>> >> >>> >>>>>>>> have
>> >>>> >> >>> >>>>>>>> some monthly Google hangouts that are open to the world. I am
>> >>>> >> >>> >>>>>>>> actually not sure how well this will work, because of the
>> >>>> >> >>> >>>>>>>> volunteer nature of the project and the need to adjust for
>> >>>> >> >>> >>>>>>>> timezones for people across the globe, but it seems worth
>> >>>> >> >>> >>>>>>>> trying.
>> >>>> >> >>> >>>>>>>>
>> >>>> >> >>> >>>>>>>> - Culture: Contributors (including committers)
>> should be
>> >>>> >> >>> >>>>>>>> more
>> >>>> >> >>> >>>>>>>> direct
>> >>>> >> >>> >>>>>>>> in
>> >>>> >> >>> >>>>>>>> setting expectations, including whether they are
>> working
>> >>>> >> >>> >>>>>>>> on a
>> >>>> >> >>> >>>>>>>> specific
>> >>>> >> >>> >>>>>>>> issue, whether they will be working on a specific
>> issue,
>> >>>> >> >>> >>>>>>>> and
>> >>>> >> >>> >>>>>>>> whether
>> >>>> >> >>> >>>>>>>> an
>> >>>> >> >>> >>>>>>>> issue or pr or jira should be rejected. Most people I
>> >>>> >> >>> >>>>>>>> know in
>> >>>> >> >>> >>>>>>>> this
>> >>>> >> >>> >>>>>>>> community
>> >>>> >> >>> >>>>>>>> are nice and don't enjoy telling other people no,
>> but it
>> >>>> >> >>> >>>>>>>> is
>> >>>> >> >>> >>>>>>>> often
>> >>>> >> >>> >>>>>>>> more
>> >>>> >> >>> >>>>>>>> annoying to a contributor to not know anything than to get
>> >>>> >> >>> >>>>>>>> a no.
>> >>>> >> >>> >>>>>>>>
>> >>>> >> >>> >>>>>>>>
>> >>>> >> >>> >>>>>>>> On Fri, Oct 7, 2016 at 10:03 AM, Matei Zaharia
>> >>>> >> >>> >>>>>>>> <[hidden email]>
>> >>>> >> >>> >>>>>>>> wrote:
>> >>>> >> >>> >>>>>>>>>
>> >>>> >> >>> >>>>>>>>>
>> >>>> >> >>> >>>>>>>>> Love the idea of a more visible "Spark Improvement
>> >>>> >> >>> >>>>>>>>> Proposal"
>> >>>> >> >>> >>>>>>>>> process that
>> >>>> >> >>> >>>>>>>>> solicits user input on new APIs. For what it's
>> worth, I
>> >>>> >> >>> >>>>>>>>> don't
>> >>>> >> >>> >>>>>>>>> think
>> >>>> >> >>> >>>>>>>>> committers are trying to minimize their own work --
>> >>>> >> >>> >>>>>>>>> every
>> >>>> >> >>> >>>>>>>>> committer
>> >>>> >> >>> >>>>>>>>> cares
>> >>>> >> >>> >>>>>>>>> about making the software useful for users.
>> However, it
>> >>>> >> >>> >>>>>>>>> is
>> >>>> >> >>> >>>>>>>>> always
>> >>>> >> >>> >>>>>>>>> hard to
>> >>>> >> >>> >>>>>>>>> get user input and so it helps to have this kind of
>> >>>> >> >>> >>>>>>>>> process.
>> >>>> >> >>> >>>>>>>>> I've
>> >>>> >> >>> >>>>>>>>> certainly
>> >>>> >> >>> >>>>>>>>> looked at the *IPs a lot in other software I use
>> just
>> >>>> >> >>> >>>>>>>>> to see
>> >>>> >> >>> >>>>>>>>> the
>> >>>> >> >>> >>>>>>>>> biggest
>> >>>> >> >>> >>>>>>>>> things on the roadmap.
>> >>>> >> >>> >>>>>>>>>
>> >>>> >> >>> >>>>>>>>> When you're talking about "changing interfaces", are
>> >>>> >> >>> >>>>>>>>> you
>> >>>> >> >>> >>>>>>>>> talking
>> >>>> >> >>> >>>>>>>>> about
>> >>>> >> >>> >>>>>>>>> public or internal APIs? I do think many people hate
>> >>>> >> >>> >>>>>>>>> changing
>> >>>> >> >>> >>>>>>>>> public APIs
>> >>>> >> >>> >>>>>>>>> and I actually think that's for the best of the
>> >>>> >> >>> >>>>>>>>> project.
>> >>>> >> >>> >>>>>>>>> That's
>> >>>> >> >>> >>>>>>>>> a
>> >>>> >> >>> >>>>>>>>> technical
>> >>>> >> >>> >>>>>>>>> debate, but basically, the worst thing when you're
>> >>>> >> >>> >>>>>>>>> using a
>> >>>> >> >>> >>>>>>>>> piece
>> >>>> >> >>> >>>>>>>>> of
>> >>>> >> >>> >>>>>>>>> software
>> >>>> >> >>> >>>>>>>>> is that the developers constantly ask you to rewrite
>> >>>> >> >>> >>>>>>>>> your
>> >>>> >> >>> >>>>>>>>> app
>> >>>> >> >>> >>>>>>>>> to
>> >>>> >> >>> >>>>>>>>> update to a
>> >>>> >> >>> >>>>>>>>> new version (and thus benefit from bug fixes, etc).
>> Cue
>> >>>> >> >>> >>>>>>>>> anyone
>> >>>> >> >>> >>>>>>>>> who's used
>> >>>> >> >>> >>>>>>>>> Protobuf, or Guava. The "let's get everyone to
>> change
>> >>>> >> >>> >>>>>>>>> their
>> >>>> >> >>> >>>>>>>>> code
>> >>>> >> >>> >>>>>>>>> this
>> >>>> >> >>> >>>>>>>>> release" model works well within a single large
>> >>>> >> >>> >>>>>>>>> company, but
>> >>>> >> >>> >>>>>>>>> doesn't work
>> >>>> >> >>> >>>>>>>>> well for a community, which is why nearly all *very*
>> >>>> >> >>> >>>>>>>>> widely
>> >>>> >> >>> >>>>>>>>> used
>> >>>> >> >>> >>>>>>>>> programming
>> >>>> >> >>> >>>>>>>>> interfaces (I'm talking things like Java standard
>> >>>> >> >>> >>>>>>>>> library,
>> >>>> >> >>> >>>>>>>>> Windows
>> >>>> >> >>> >>>>>>>>> API, etc)
>> >>>> >> >>> >>>>>>>>> almost *never* break backwards compatibility. All
>> this
>> >>>> >> >>> >>>>>>>>> is
>> >>>> >> >>> >>>>>>>>> done
>> >>>> >> >>> >>>>>>>>> within reason
>> >>>> >> >>> >>>>>>>>> though, e.g. we do change things in major releases
>> >>>> >> >>> >>>>>>>>> (2.x,
>> >>>> >> >>> >>>>>>>>> 3.x,
>> >>>> >> >>> >>>>>>>>> etc).
>> >>>> >> >>> >>>>>>>>
>> >>>> >> >>> >>>>>>>>
>> >>>> >> >>> >>>>>>>>
>> >>>> >> >>> >>>>>>>>
>> >>>> >> >>> >>>>>>>
>> >>>> >> >>> >>>>>>
>> >>>> >> >>> >>>>>>
>> >>>> >> >>> >>>>>>
>> >>>> >> >>> >>>>>>
>> >>>> >> >>> >>>>>>
>> >>>> >> >>> >>>>>> ------------------------------
>> ---------------------------------------
>> >>>> >> >>> >>>>>> To unsubscribe e-mail: [hidden email]
>> >>>> >> >>> >>>>>>
>> >>>> >> >>> >>>>>
>> >>>> >> >>> >>>>>
>> >>>> >> >>> >>>>>
>> >>>> >> >>> >>>>> --
>> >>>> >> >>> >>>>> Stavros Kontopoulos
>> >>>> >> >>> >>>>> Senior Software Engineer
>> >>>> >> >>> >>>>> Lightbend, Inc.
>> >>>> >> >>> >>>>> p:  +30 6977967274
>> >>>> >> >>> >>>>> e: [hidden email]
>> >>>> >> >>> >>>>>
>> >>>> >> >>> >>>>>
>> >>>> >> >>> >>>>
>> >>>> >> >>> >>>
>> >>>> >> >>> >>
>> >>>> >> >>> >>
>> >>>> >> >>>
>> >>>> >> >>
>> >>>> >> >
>> >>>> >> >
>> >>>> >> >
>> >>>> >>
>> >>>> >>
>> >>>> >
>> >>>> >
>> >>>> >
>> >>>> > --
>> >>>> > Ryan Blue
>> >>>> > Software Engineer
>> >>>> > Netflix
>> >>>>
>> >>>>
>> >>>
>> >
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>
>>
>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>
