Re: Spark Improvement Proposals

Ryan Blue Mon, 10 Oct 2016 12:04:07 -0700

Sorry, I missed that the proposal includes majority approval. Why majority
instead of consensus? I think we want to build consensus around these
proposals and it makes sense to discuss until no one would veto.


rb

On Mon, Oct 10, 2016 at 11:54 AM, Ryan Blue <[email protected]> wrote:

> +1 to votes to approve proposals. I agree that proposals should have an
> official mechanism to be accepted, and a vote is an established means of
> doing that well. I like that it includes a period to review the proposal
> and I think proposals should have been discussed enough ahead of a vote to
> survive the possibility of a veto.
>
> I also like the names that are short and (mostly) unique, like SEP.
>
> Where I disagree is with the requirement that a committer must formally
> propose an enhancement. I don't see the value of restricting this: if
> someone has the will to write up a proposal then they should be encouraged
> to do so and start a discussion about it. Even if there is a political
> reality as Cody says, what is the value of codifying that in our process? I
> think restricting who can submit proposals would only undermine them by
> pushing contributors out. Maybe I'm missing something here?
>
> rb
>
>
>
> On Mon, Oct 10, 2016 at 7:41 AM, Cody Koeninger <[email protected]>
> wrote:
>
>> Yes, users suggesting SIPs is a good thing and is explicitly called
>> out in the linked document under the Who? section.  Formally proposing
>> them, not so much, because of the political realities.
>>
>> Yes, implementation strategy definitely affects goals.  There are all
>> kinds of examples of this, I'll pick one that's my fault so as to
>> avoid sounding like I'm blaming:
>>
>> When I implemented the Kafka DStream, one of my (not explicitly agreed
>> upon by the community) goals was to make sure people could use the
>> DStream with however they were already using Kafka at work.  The lack
>> of explicit agreement on that goal led to all kinds of fighting with
>> committers, that could have been avoided.  The lack of explicit
>> up-front strategy discussion led to the DStream not really working
>> with compacted topics.  I knew about compacted topics, but don't have
>> a use for them, so had a blind spot there.  If there was explicit
>> up-front discussion that my strategy was "assume that batches can be
>> defined on the driver solely by beginning and ending offsets", there's
>> a greater chance that a user would have seen that and said, "hey, what
>> about non-contiguous offsets in a compacted topic".
>>
>> This kind of thing is only going to happen smoothly if we have a
>> lightweight user-visible process with clear outcomes.
>>
>> On Mon, Oct 10, 2016 at 1:34 AM, assaf.mendelson
>> <[email protected]> wrote:
>> > I agree with most of what Cody said.
>> >
>> > Two things:
>> >
>> > First we can always have other people suggest SIPs but mark them as
>> > “unreviewed” and have committers basically move them forward. The
>> problem is
>> > that writing a good document takes time. This way we can leverage
>> > non-committers to do some of this work (it is just another way to
>> contribute).
>> >
>> >
>> >
>> > As for strategy, in many cases implementation strategy can affect the
>> goals.
>> > I will give  a small example: In the current structured streaming
>> strategy,
>> > we group by the time to achieve a sliding window. This is definitely an
>> > implementation decision and not a goal. However, I can think of several
>> > aggregation functions which have the time inside their calculation
>> buffer.
>> > For example, let’s say we want to return a set of all distinct values.
>> One
>> > way to implement this would be to make the set into a map and have the
>> value
>> > contain the last time seen. Multiplying it across the groupby would
>> cost a
>> > lot in performance. So adding such a strategy would have a great effect
>> on
>> > the type of aggregations and their performance which does affect the
>> goal.
>> > Without adding the strategy, it is easy for whoever goes to the design
>> > document to not think about these cases. Furthermore, it might be
>> decided
>> > that these cases are rare enough so that the strategy is still good
>> enough
>> > but how would we know it without user feedback?
>> >
>> > I believe this example is exactly what Cody was talking about. Since
>> many
>> > times implementation strategies have a large effect on the goal, we
>> should
>> > have it discussed when discussing the goals. In addition, while it is
>> often
>> > easy to throw out completely infeasible goals, it is often much harder
>> to
>> > figure out that the goals are unfeasible without fine tuning.
>> >
>> >
>> >
>> >
>> >
>> > Assaf.
>> >
>> >
>> >
>> > From: Cody Koeninger-2 [via Apache Spark Developers List]
>> > [mailto:ml-node+[hidden email]]
>> > Sent: Monday, October 10, 2016 2:25 AM
>> > To: Mendelson, Assaf
>> > Subject: Re: Spark Improvement Proposals
>> >
>> >
>> >
>> > Only committers should formally submit SIPs because in an Apache
>> > project only committers have explicit political power.  If a user can't
>> > find a committer willing to sponsor an SIP idea, they have no way to
>> > get the idea passed in any case.  If I can't find a committer to
>> > sponsor this meta-SIP idea, I'm out of luck.
>> >
>> > I do not believe unrealistic goals can be found solely by inspection.
>> > We've managed to ignore unrealistic goals even after implementation!
>> > Focusing on APIs can allow people to think they've solved something,
>> > when there's really no way of implementing that API while meeting the
>> > goals.  Rapid iteration is clearly the best way to address this, but
>> > we've already talked about why that hasn't really worked.  If adding a
>> > non-binding API section to the template is important to you, I'm not
>> > against it, but I don't think it's sufficient.
>> >
>> > On your PRD vs design doc spectrum, I'm saying this is closer to a
>> > PRD.  Clear agreement on goals is the most important thing and that's
>> > why it's the thing I want binding agreement on.  But I cannot agree to
>> > goals unless I have enough minimal technical info to judge whether the
>> > goals are likely to actually be accomplished.
>> >
>> >
>> >
>> > On Sun, Oct 9, 2016 at 5:35 PM, Matei Zaharia <[hidden email]> wrote:
>> >
>> >
>> >> Well, I think there are a few things here that don't make sense. First,
>> >> why
>> >> should only committers submit SIPs? Development in the project should
>> be
>> >> open to all contributors, whether they're committers or not. Second, I
>> >> think
>> >> unrealistic goals can be found just by inspecting the goals, and I'm
>> not
>> >> super worried that we'll accept a lot of SIPs that are then infeasible
>> --
>> >> we
>> >> can then submit new ones. But this depends on whether you want this
>> >> process
>> >> to be a "design doc lite", where people also agree on implementation
>> >> strategy, or just a way to agree on goals. This is what I asked earlier
>> >> about PRDs vs design docs (and I'm open to either one but I'd just like
>> >> clarity). Finally, both as a user and designer of software, I always
>> want
>> >> to
>> >> give feedback on APIs, so I'd really like a culture of having those
>> early.
>> >> People don't argue about prettiness when they discuss APIs, they argue
>> >> about
>> >> the core concepts to expose in order to meet various goals, and then
>> >> they're
>> >> stuck maintaining those for a long time.
>> >>
>> >> Matei
>> >>
>> >> On Oct 9, 2016, at 3:10 PM, Cody Koeninger <[hidden email]> wrote:
>> >>
>> >> Users instead of people, sure.  Committers and contributors are (or at
>> >> least
>> >> should be) a subset of users.
>> >>
>> >> Non goals, sure. I don't care what the name is, but we need to clearly
>> say
>> >> e.g. 'no we are not maintaining compatibility with XYZ right now'.
>> >>
>> >> API, what I care most about is whether it allows me to accomplish the
>> >> goals.
>> >> Arguing about how ugly or pretty it is can be saved for design/
>> >> implementation imho.
>> >>
>> >> Strategy, this is necessary because otherwise goals can be out of line
>> >> with
>> >> reality.  Don't propose goals you don't have at least some idea of how
>> to
>> >> implement.
>> >>
>> >> Rejected strategies, given that committers are the only ones I'm saying
>> >> should formally submit SPARKLIs or SIPs, if they put junk in a required
>> >> section then slap them down for it and tell them to fix it.
>> >>
>> >>
>> >> On Oct 9, 2016 4:36 PM, "Matei Zaharia" <[hidden email]> wrote:
>> >>>
>> >>> Yup, this is the stuff that I found unclear. Thanks for clarifying
>> here,
>> >>> but we should also clarify it in the writeup. In particular:
>> >>>
>> >>> - Goals need to be about user-facing behavior ("people" is broad)
>> >>>
>> >>> - I'd rename Rejected Goals to Non-Goals. Otherwise someone will dig
>> up
>> >>> one of these and say "Spark's developers have officially rejected X,
>> >>> which
>> >>> our awesome system has".
>> >>>
>> >>> - For user-facing stuff, I think you need a section on API. Virtually
>> all
>> >>> other *IPs I've seen have that.
>> >>>
>> >>> - I'm still not sure why the strategy section is needed if the
>> purpose is
>> >>> to define user-facing behavior -- unless this is the strategy for
>> setting
>> >>> the goals or for defining the API. That sounds squarely like a design
>> doc
>> >>> issue. In some sense, who cares whether the proposal is technically
>> >>> feasible
>> >>> right now? If it's infeasible, that will be discovered later during
>> >>> design
>> >>> and implementation. Same thing with rejected strategies -- listing
>> some
>> >>> of
>> >>> those is definitely useful sometimes, but if you make this a
>> *required*
>> >>> section, people are just going to fill it in with bogus stuff (I've
>> seen
>> >>> this happen before).
>> >>>
>> >>> Matei
>> >>>
>> >
>> >>> > On Oct 9, 2016, at 2:14 PM, Cody Koeninger <[hidden email]> wrote:
>> >>> >
>> >>> > So to focus the discussion on the specific strategy I'm suggesting,
>> >>> > documented at
>> >>> >
>> >>> >
>> >>> >
>> >>> > https://github.com/koeninger/spark-1/blob/SIP-0/docs/spark-improvement-proposals.md
>> >>> >
>> >>> > "Goals: What must this allow people to do, that they can't
>> currently?"
>> >>> >
>> >>> > Is it unclear that this is focusing specifically on people-visible
>> >>> > behavior?
>> >>> >
>> >>> > Rejected goals -  are important because otherwise people keep trying
>> >>> > to argue about scope.  Of course you can change things later with a
>> >>> > different SIP and different vote, the point is to focus.
>> >>> >
>> >>> > Use cases - are something that people are going to bring up in
>> >>> > discussion.  If they aren't clearly documented as a goal ("This must
>> >>> > allow me to connect using SSL"), they should be added.
>> >>> >
>> >>> > Internal architecture - if the people who need specific behavior are
>> >>> > implementers of other parts of the system, that's fine.
>> >>> >
>> >>> > Rejected strategies - If you have none of these, you have no
>> evidence
>> >>> > that the proponent didn't just go with the first thing they had in
>> >>> > mind (or have already implemented), which is a big problem
>> currently.
>> >>> > Approval isn't binding as to specifics of implementation, so these
>> >>> > aren't handcuffs.  The goals are the contract, the strategy is
>> >>> > evidence that contract can actually be met.
>> >>> >
>> >>> > Design docs - I'm not touching design docs.  The markdown file I
>> >>> > linked specifically says of the strategy section "This is not a full
>> >>> > design document."  Is this unclear?  Design docs can be worked on
>> >>> > obviously, but that's not what I'm concerned with here.
>> >>> >
>> >>> >
>> >>> >
>> >>> >
>> >>> > On Sun, Oct 9, 2016 at 2:34 PM, Matei Zaharia <[hidden email]>
>> >>> > wrote:
>> >>> >> Hi Cody,
>> >>> >>
>> >>> >> I think this would be a lot more concrete if we had a more detailed
>> >>> >> template
>> >>> >> for SIPs. Right now, it's not super clear what's in scope -- e.g.
>> are
>> >>> >> they
>> >>> >> a way to solicit feedback on the user-facing behavior or on the
>> >>> >> internals?
>> >>> >> "Goals" can cover both things. I've been thinking of SIPs more as
>> >>> >> Product
>> >>> >> Requirements Docs (PRDs), which focus on *what* a code change
>> should
>> >>> >> do
>> >>> >> as
>> >>> >> opposed to how.
>> >>> >>
>> >>> >> In particular, here are some things that you may or may not
>> consider
>> >>> >> in
>> >>> >> scope for SIPs:
>> >>> >>
>> >>> >> - Goals and non-goals: This is definitely in scope, and IMO should
>> >>> >> focus on
>> >>> >> user-visible behavior (e.g. "system supports SQL window functions"
>> or
>> >>> >> "system continues working if one node fails"). BTW I wouldn't say
>> >>> >> "rejected
>> >>> >> goals" because some of them might become goals later, so we're not
>> >>> >> definitively rejecting them.
>> >>> >>
>> >>> >> - Public API: Probably should be included in most SIPs unless it's
>> too
>> >>> >> large
>> >>> >> to fully specify then (e.g. "let's add an ML library").
>> >>> >>
>> >>> >> - Use cases: I usually find this very useful in PRDs to better
>> >>> >> communicate
>> >>> >> the goals.
>> >>> >>
>> >>> >> - Internal architecture: This is usually *not* a thing users can
>> >>> >> easily
>> >>> >> comment on and it sounds more like a design doc item. Of course
>> it's
>> >>> >> important to show that the SIP is feasible to implement. One
>> >>> >> exception,
>> >>> >> however, is that I think we'll have some SIPs primarily on
>> internals
>> >>> >> (e.g.
>> >>> >> if somebody wants to refactor Spark's query optimizer or
>> something).
>> >>> >>
>> >>> >> - Rejected strategies: I personally wouldn't put this, because
>> what's
>> >>> >> the
>> >>> >> point of voting to reject a strategy before you've really begun
>> >>> >> designing
>> >>> >> and implementing something? What if you discover that the strategy
>> is
>> >>> >> actually better when you start doing stuff?
>> >>> >>
>> >>> >> At a super high level, it depends on whether you want the SIPs to
>> be
>> >>> >> PRDs
>> >>> >> for getting some quick feedback on the goals of a feature before
>> it is
>> >>> >> designed, or something more like full-fledged design docs (just a
>> more
>> >>> >> visible design doc for bigger changes). I looked at Kafka's KIPs,
>> and
>> >>> >> they
>> >>> >> actually seem to be more like design docs. This can work too but it
>> >>> >> does
>> >>> >> require more work from the proposer and it can lead to the same
>> >>> >> problems you
>> >>> >> mentioned with people already having a design and implementation in
>> >>> >> mind.
>> >>> >>
>> >>> >> Basically, the question is, are you trying to iterate faster on
>> design
>> >>> >> by
>> >>> >> adding a step for user feedback earlier? Or are you just trying to
>> >>> >> make
>> >>> >> design docs for key features more visible (and their approval more
>> >>> >> formal)?
>> >>> >>
>> >>> >> BTW note that in either case, I'd like to have a template for
>> design
>> >>> >> docs
>> >>> >> too, which should also include goals. I think that would've avoided
>> >>> >> some of
>> >>> >> the issues you brought up.
>> >>> >>
>> >>> >> Matei
>> >>> >>
>> >>> >> On Oct 9, 2016, at 10:40 AM, Cody Koeninger <[hidden email]> wrote:
>> >>> >>
>> >>> >> Here's my specific proposal (meta-proposal?)
>> >>> >>
>> >>> >> Spark Improvement Proposals (SIP)
>> >>> >>
>> >>> >>
>> >>> >> Background:
>> >>> >>
>> >>> >> The current problem is that design and implementation of large
>> >>> >> features
>> >>> >> are
>> >>> >> often done in private, before soliciting user feedback.
>> >>> >>
>> >>> >> When feedback is solicited, it is often as to detailed design
>> >>> >> specifics, not
>> >>> >> focused on goals.
>> >>> >>
>> >>> >> When implementation does take place after design, there is often
>> >>> >> disagreement as to what goals are or are not in scope.
>> >>> >>
>> >>> >> This results in commits that don't fully meet user needs.
>> >>> >>
>> >>> >>
>> >>> >> Goals:
>> >>> >>
>> >>> >> - Ensure user, contributor, and committer goals are clearly
>> identified
>> >>> >> and
>> >>> >> agreed upon, before implementation takes place.
>> >>> >>
>> >>> >> - Ensure that a technically feasible strategy is chosen that is
>> likely
>> >>> >> to
>> >>> >> meet the goals.
>> >>> >>
>> >>> >>
>> >>> >> Rejected Goals:
>> >>> >>
>> >>> >> - SIPs are not for detailed design.  Design by committee doesn't
>> work.
>> >>> >>
>> >>> >> - SIPs are not for every change.  We don't need that much process.
>> >>> >>
>> >>> >>
>> >>> >> Strategy:
>> >>> >>
>> >>> >> My suggestion is outlined as a Spark Improvement Proposal process
>> >>> >> documented
>> >>> >> at
>> >>> >>
>> >>> >>
>> >>> >>
>> >>> >> https://github.com/koeninger/spark-1/blob/SIP-0/docs/spark-improvement-proposals.md
>> >>> >>
>> >>> >> Specifics of Jira manipulation are an implementation detail we can
>> >>> >> figure
>> >>> >> out.
>> >>> >>
>> >>> >> I'm suggesting voting; the need here is for a _clear_ outcome.
>> >>> >>
>> >>> >>
>> >>> >> Rejected Strategies:
>> >>> >>
>> >>> >> Having someone who understands the problem implement it first
>> works,
>> >>> >> but
>> >>> >> only if significant iteration after user feedback is allowed.
>> >>> >>
>> >>> >> Historically this has been problematic due to pressure to limit
>> public
>> >>> >> api
>> >>> >> changes.
>> >>> >>
>> >>> >>
>> >>> >> On Fri, Oct 7, 2016 at 5:16 PM, Reynold Xin <[hidden email]>
>> >>> >> wrote:
>> >>> >>>
>> >>> >>> Alright, looks like there is quite a bit of support. We should
>> wait
>> >>> >>> to
>> >>> >>> hear from more people too.
>> >>> >>>
>> >>> >>> To push this forward, Cody and I will be working together in the
>> next
>> >>> >>> couple of weeks to come up with a concrete, detailed proposal on
>> what
>> >>> >>> this
>> >>> >>> entails, and then we can discuss the specific proposal as
>> well.
>> >>> >>>
>> >>> >>>
>> >>> >>> On Fri, Oct 7, 2016 at 2:29 PM, Cody Koeninger <[hidden email]>
>> >>> >>> wrote:
>> >>> >>>>
>> >>> >>>> Yeah, in case it wasn't clear, I was talking about SIPs for major
>> >>> >>>> user-facing or cross-cutting changes, not minor feature adds.
>> >>> >>>>
>> >>> >>>> On Fri, Oct 7, 2016 at 3:58 PM, Stavros Kontopoulos
>> >>> >>>> <[hidden email]> wrote:
>> >>> >>>>>
>> >>> >>>>> +1 to the SIP label as long as it does not slow down things and
>> it
>> >>> >>>>> targets optimizing efforts, coordination etc. For example really
>> >>> >>>>> small
>> >>> >>>>> features should not need to go through this process (assuming
>> they
>> >>> >>>>> don't
>> >>> >>>>> touch public interfaces)  or re-factorings and hope it will be
>> kept
>> >>> >>>>> this
>> >>> >>>>> way. So a guideline doc should be provided, like in the KIP
>> >>> >>>>> case.
>> >>> >>>>>
>> >>> >>>>> IMHO so far aside from tagging things and linking them elsewhere
>> >>> >>>>> simply
>> >>> >>>>> having design docs and prototype implementations in PRs is not
>> >>> >>>>> something
>> >>> >>>>> that has not worked so far. What is really a pain in many
>> projects
>> >>> >>>>> out there
>> >>> >>>>> is discontinuity in progress of PRs, missing features, slow
>> reviews
>> >>> >>>>> which is
>> >>> >>>>> understandable to some extent... it is not only about Spark but
>> >>> >>>>> things can
>> >>> >>>>> be improved for sure for this project in particular as already
>> >>> >>>>> stated.
>> >>> >>>>>
>> >>> >>>>> On Fri, Oct 7, 2016 at 11:14 PM, Cody Koeninger <[hidden email]>
>> >>> >>>>> wrote:
>> >>> >>>>>>
>> >>> >>>>>> +1 to adding an SIP label and linking it from the website.  I
>> >>> >>>>>> think
>> >>> >>>>>> it
>> >>> >>>>>> needs
>> >>> >>>>>>
>> >>> >>>>>> - template that focuses it towards soliciting user goals / non
>> >>> >>>>>> goals
>> >>> >>>>>> - clear resolution as to which strategy was chosen to pursue.
>> I'd
>> >>> >>>>>> recommend a vote.
>> >>> >>>>>>
>> >>> >>>>>> Matei asked me to clarify what I meant by changing interfaces,
>> I
>> >>> >>>>>> think
>> >>> >>>>>> it's directly relevant to the SIP idea so I'll clarify here,
>> and
>> >>> >>>>>> split
>> >>> >>>>>> a thread for the other discussion per Nicholas' request.
>> >>> >>>>>>
>> >>> >>>>>> I meant changing public user interfaces.  I think the first
>> design
>> >>> >>>>>> is
>> >>> >>>>>> unlikely to be right, because it's done at a time when you have
>> >>> >>>>>> the
>> >>> >>>>>> least information.  As a user, I find it considerably more
>> >>> >>>>>> frustrating
>> >>> >>>>>> to be unable to use a tool to get my job done, than I do
>> having to
>> >>> >>>>>> make minor changes to my code in order to take advantage of
>> >>> >>>>>> features.
>> >>> >>>>>> I've seen committers be seriously reluctant to allow changes to
>> >>> >>>>>> @experimental code that are needed in order for it to really
>> work
>> >>> >>>>>> right.  You need to be able to iterate, and if people on both
>> >>> >>>>>> sides
>> >>> >>>>>> of
>> >>> >>>>>> the fence aren't going to respect that some newer apis are
>> subject
>> >>> >>>>>> to
>> >>> >>>>>> change, then why even mark them as such?
>> >>> >>>>>>
>> >>> >>>>>> Ideally a finished SIP should give me a checklist of things
>> that
>> >>> >>>>>> an
>> >>> >>>>>> implementation must do, and things that it doesn't need to do.
>> >>> >>>>>> Contributors/committers should be seriously discouraged from
>> >>> >>>>>> putting
>> >>> >>>>>> out a version 0.1 that doesn't have at least a prototype
>> >>> >>>>>> implementation of all those things, especially if they're then
>> >>> >>>>>> going
>> >>> >>>>>> to argue against interface changes necessary to get the
>> rest
>> >>> >>>>>> of
>> >>> >>>>>> the things done in the 0.2 version.
>> >>> >>>>>>
>> >>> >>>>>>
>> >>> >>>>>> On Fri, Oct 7, 2016 at 2:18 PM, Reynold Xin <[hidden email]>
>> >>> >>>>>> wrote:
>> >>> >>>>>>> I like the lightweight proposal to add a SIP label.
>> >>> >>>>>>>
>> >>> >>>>>>> During Spark 2.0 development, Tom (Graves) and I suggested
>> using
>> >>> >>>>>>> wiki
>> >>> >>>>>>> to
>> >>> >>>>>>> track the list of major changes, but that never really
>> >>> >>>>>>> materialized
>> >>> >>>>>>> due to
>> >>> >>>>>>> the overhead. Adding a SIP label on major JIRAs and then linking
>> to
>> >>> >>>>>>> them
>> >>> >>>>>>> prominently on the Spark website makes a lot of sense.
>> >>> >>>>>>>
>> >>> >>>>>>>
>> >>> >>>>>>> On Fri, Oct 7, 2016 at 10:50 AM, Matei Zaharia
>> >>> >>>>>>> <[hidden email]>
>> >>> >>>>>>> wrote:
>> >>> >>>>>>>>
>> >>> >>>>>>>> For the improvement proposals, I think one major point was to
>> >>> >>>>>>>> make
>> >>> >>>>>>>> them
>> >>> >>>>>>>> really visible to users who are not contributors, so we
>> should
>> >>> >>>>>>>> do
>> >>> >>>>>>>> more than
>> >>> >>>>>>>> sending stuff to dev@. One very lightweight idea is to have
>> a
>> >>> >>>>>>>> new
>> >>> >>>>>>>> type of
>> >>> >>>>>>>> JIRA called a SIP and have a link to a filter that shows all
>> >>> >>>>>>>> such
>> >>> >>>>>>>> JIRAs from
>> >>> >>>>>>>> http://spark.apache.org. I also like the idea of SIP and
>> design
>> >>> >>>>>>>> doc
>> >>> >>>>>>>> templates (in fact many projects have them).
>> >>> >>>>>>>>
>> >>> >>>>>>>> Matei
>> >>> >>>>>>>>
>> >>> >>>>>>>> On Oct 7, 2016, at 10:38 AM, Reynold Xin <[hidden email]>
>> >>> >>>>>>>> wrote:
>> >>> >>>>>>>>
>> >>> >>>>>>>> I called Cody last night and talked about some of the topics
>> in
>> >>> >>>>>>>> his
>> >>> >>>>>>>> email.
>> >>> >>>>>>>> It became clear to me Cody genuinely cares about the project.
>> >>> >>>>>>>>
>> >>> >>>>>>>> Some of the frustrations come from the success of the project
>> >>> >>>>>>>> itself
>> >>> >>>>>>>> becoming very "hot", and it is difficult to get clarity from
>> >>> >>>>>>>> people
>> >>> >>>>>>>> who
>> >>> >>>>>>>> don't dedicate all their time to Spark. In fact, it is in
>> some
>> >>> >>>>>>>> ways
>> >>> >>>>>>>> similar
>> >>> >>>>>>>> to scaling an engineering team in a successful startup: old
>> >>> >>>>>>>> processes that
>> >>> >>>>>>>> worked well might not work so well when it gets to a certain
>> >>> >>>>>>>> size,
>> >>> >>>>>>>> cultures
>> >>> >>>>>>>> can get diluted, building culture vs building process, etc.
>> >>> >>>>>>>>
>> >>> >>>>>>>> I also really like to have a more visible process for larger
>> >>> >>>>>>>> changes,
>> >>> >>>>>>>> especially major user facing API changes. Historically we
>> upload
>> >>> >>>>>>>> design docs
>> >>> >>>>>>>> for major changes, but this is not always done consistently, and
>> >>> >>>>>>>> it is difficult to ensure the quality of the docs, due to the
>> >>> >>>>>>>> volunteer nature of the organization.
>> >>> >>>>>>>>
>> >>> >>>>>>>> Some of the more concrete ideas we discussed focus on
>> building a
>> >>> >>>>>>>> culture
>> >>> >>>>>>>> to improve clarity:
>> >>> >>>>>>>>
>> >>> >>>>>>>> - Process: Large changes should have design docs posted on
>> JIRA.
>> >>> >>>>>>>> One
>> >>> >>>>>>>> thing
>> >>> >>>>>>>> Cody and I didn't discuss but an idea that just came to me
>> is we
>> >>> >>>>>>>> should
>> >>> >>>>>>>> create a design doc template for the project and ask
>> everybody
>> >>> >>>>>>>> to
>> >>> >>>>>>>> follow.
>> >>> >>>>>>>> The design doc template should also explicitly list goals and
>> >>> >>>>>>>> non-goals, to
>> >>> >>>>>>>> make design doc more consistent.
>> >>> >>>>>>>>
>> >>> >>>>>>>> - Process: Email dev@ to solicit feedback. We have done this
>> >>> >>>>>>>> with
>> >>> >>>>>>>> some
>> >>> >>>>>>>> changes, but again very inconsistent. Just posting something
>> on
>> >>> >>>>>>>> JIRA
>> >>> >>>>>>>> isn't
>> >>> >>>>>>>> sufficient, because there are simply too many JIRAs and the
>> >>> >>>>>>>> signal
>> >>> >>>>>>>> gets lost
>> >>> >>>>>>>> in the noise. While this is generally impossible to enforce
>> >>> >>>>>>>> because
>> >>> >>>>>>>> we can't
>> >>> >>>>>>>> force all volunteers to conform to a process (or they might
>> not
>> >>> >>>>>>>> even
>> >>> >>>>>>>> be
>> >>> >>>>>>>> aware of this),  those who are more familiar with the project
>> >>> >>>>>>>> can
>> >>> >>>>>>>> help by
>> >>> >>>>>>>> emailing dev@ when they see something that hasn't been sent there.
>> >>> >>>>>>>>
>> >>> >>>>>>>> - Culture: The design doc author(s) should be open to
>> feedback.
>> >>> >>>>>>>> A
>> >>> >>>>>>>> design
>> >>> >>>>>>>> doc should serve as the base for discussion and is by no
>> means
>> >>> >>>>>>>> the
>> >>> >>>>>>>> final
>> >>> >>>>>>>> design. Of course, this does not mean the author has to
>> accept
>> >>> >>>>>>>> all
>> >>> >>>>>>>> feedback. They should also be comfortable accepting /
>> rejecting
>> >>> >>>>>>>> ideas on
>> >>> >>>>>>>> technical grounds.
>> >>> >>>>>>>>
>> >>> >>>>>>>> - Process / Culture: For major ongoing projects, it can be
>> >>> >>>>>>>> useful
>> >>> >>>>>>>> to
>> >>> >>>>>>>> have
>> >>> >>>>>>>> some monthly Google hangouts that are open to the world. I am
>> >>> >>>>>>>> actually not
>> >>> >>>>>>>> sure how well this will work, because of the volunteering
>> nature
>> >>> >>>>>>>> and
>> >>> >>>>>>>> we need
>> >>> >>>>>>>> to adjust for timezones for people across the globe, but it
>> >>> >>>>>>>> seems
>> >>> >>>>>>>> worth
>> >>> >>>>>>>> trying.
>> >>> >>>>>>>>
>> >>> >>>>>>>> - Culture: Contributors (including committers) should be more
>> >>> >>>>>>>> direct
>> >>> >>>>>>>> in
>> >>> >>>>>>>> setting expectations, including whether they are working on a
>> >>> >>>>>>>> specific
>> >>> >>>>>>>> issue, whether they will be working on a specific issue, and
>> >>> >>>>>>>> whether
>> >>> >>>>>>>> an
>> >>> >>>>>>>> issue or pr or jira should be rejected. Most people I know in
>> >>> >>>>>>>> this
>> >>> >>>>>>>> community
>> >>> >>>>>>>> are nice and don't enjoy telling other people no, but it is
>> >>> >>>>>>>> often
>> >>> >>>>>>>> more
>> >>> >>>>>>>> annoying to a contributor to not know anything than getting a
>> >>> >>>>>>>> no.
>> >>> >>>>>>>>
>> >>> >>>>>>>>
>> >>> >>>>>>>> On Fri, Oct 7, 2016 at 10:03 AM, Matei Zaharia
>> >>> >>>>>>>> <[hidden email]>
>> >>> >>>>>>>> wrote:
>> >>> >>>>>>>>>
>> >>> >>>>>>>>>
>> >>> >>>>>>>>> Love the idea of a more visible "Spark Improvement Proposal"
>> >>> >>>>>>>>> process that
>> >>> >>>>>>>>> solicits user input on new APIs. For what it's worth, I
>> don't
>> >>> >>>>>>>>> think
>> >>> >>>>>>>>> committers are trying to minimize their own work -- every
>> >>> >>>>>>>>> committer
>> >>> >>>>>>>>> cares
>> >>> >>>>>>>>> about making the software useful for users. However, it is
>> >>> >>>>>>>>> always
>> >>> >>>>>>>>> hard to
>> >>> >>>>>>>>> get user input and so it helps to have this kind of process.
>> >>> >>>>>>>>> I've
>> >>> >>>>>>>>> certainly
>> >>> >>>>>>>>> looked at the *IPs a lot in other software I use just to see
>> >>> >>>>>>>>> the
>> >>> >>>>>>>>> biggest
>> >>> >>>>>>>>> things on the roadmap.
>> >>> >>>>>>>>>
>> >>> >>>>>>>>> When you're talking about "changing interfaces", are you
>> >>> >>>>>>>>> talking
>> >>> >>>>>>>>> about
>> >>> >>>>>>>>> public or internal APIs? I do think many people hate
>> changing
>> >>> >>>>>>>>> public APIs
>> >>> >>>>>>>>> and I actually think that's for the best of the project.
>> That's
>> >>> >>>>>>>>> a
>> >>> >>>>>>>>> technical
>> >>> >>>>>>>>> debate, but basically, the worst thing when you're using a
>> >>> >>>>>>>>> piece
>> >>> >>>>>>>>> of
>> >>> >>>>>>>>> software
>> >>> >>>>>>>>> is that the developers constantly ask you to rewrite your
>> app
>> >>> >>>>>>>>> to
>> >>> >>>>>>>>> update to a
>> >>> >>>>>>>>> new version (and thus benefit from bug fixes, etc). Cue
>> anyone
>> >>> >>>>>>>>> who's used
>> >>> >>>>>>>>> Protobuf, or Guava. The "let's get everyone to change their
>> >>> >>>>>>>>> code
>> >>> >>>>>>>>> this
>> >>> >>>>>>>>> release" model works well within a single large company, but
>> >>> >>>>>>>>> doesn't work
>> >>> >>>>>>>>> well for a community, which is why nearly all *very* widely
>> >>> >>>>>>>>> used
>> >>> >>>>>>>>> programming
>> >>> >>>>>>>>> interfaces (I'm talking things like Java standard library,
>> >>> >>>>>>>>> Windows
>> >>> >>>>>>>>> API, etc)
>> >>> >>>>>>>>> almost *never* break backwards compatibility. All this is
>> done
>> >>> >>>>>>>>> within reason
>> >>> >>>>>>>>> though, e.g. we do change things in major releases (2.x,
>> 3.x,
>> >>> >>>>>>>>> etc).
>> >>> >>>>>>>>
>> >>> >>>>>>>>
>> >>> >>>>>>>>
>> >>> >>>>>>>>
>> >>> >>>>>>>
>> >>> >>>>>>
>> >>> >>>>>>
>> >>> >>>>>>
>> >>> >>>>>> ---------------------------------------------------------------------
>> >>> >>>>>> To unsubscribe e-mail: [hidden email]
>> >>> >>>>>>
>> >>> >>>>>
>> >>> >>>>>
>> >>> >>>>>
>> >>> >>>>> --
>> >>> >>>>> Stavros Kontopoulos
>> >>> >>>>> Senior Software Engineer
>> >>> >>>>> Lightbend, Inc.
>> >>> >>>>> p:  +30 6977967274
>> >>> >>>>> e: [hidden email]
>> >>> >>>>>
>> >>> >>>>>
>> >>> >>>>
>> >>> >>>
>> >>> >>
>> >>> >>
>> >>>
>> >>
>> >
>> >
>> >
>> >
>> > ________________________________
>> > View this message in context: RE: Spark Improvement Proposals
>> > Sent from the Apache Spark Developers List mailing list archive at
>> > Nabble.com.
>>
>>
>>
>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>



-- 
Ryan Blue
Software Engineer
Netflix
