Re: Spark Improvement Proposals

Cody Koeninger Mon, 10 Oct 2016 12:44:42 -0700

Updated on GitHub:
https://github.com/koeninger/spark-1/blob/SIP-0/docs/spark-improvement-proposals.md


I believe I've touched on all feedback with the exception of naming,
and API vs Strategy.

Do we want a straw poll on naming?

Matei, are your concerns about API vs strategy addressed if we add an
API bullet point to the template?

On Mon, Oct 10, 2016 at 2:38 PM, Steve Loughran <ste...@hortonworks.com> wrote:
> This is an interesting process proposal; I think it could work well.
>
> -It's got the flavour of the ASF incubator; maybe some of the processes 
> there: mentor, regular reporting in could help, in particular, help stop the 
> -1 at the end of the work
> -it may also aid collaboration to have a medium lived branch, so enabling 
> collaboration with multiple people submitting PRs into the ASF codebase. This 
> can reduce cost of merge and enable jenkins to keep on top of it. It also 
> fits in well with the ASF "do in apache infra" community development process.
>
>
>> On 10 Oct 2016, at 20:26, Matei Zaharia <matei.zaha...@gmail.com> wrote:
>>
>> Agreed with this. As I said before regarding who submits: it's not a normal 
>> ASF process to require contributions to only come from committers. 
>> Committers are of course the only people who can *commit* stuff. But the 
>> whole point of an open source project is that anyone can *contribute* -- 
>> indeed, that is how people become committers. For example, in every ASF 
>> project, anyone can open JIRAs, submit design docs, submit patches, review 
>> patches, and vote on releases. This particular process is very similar to 
>> posting a JIRA or a design doc.
>>
>> I also like consensus with a deadline (e.g. someone says "here is a new SEP, 
>> we want to accept it by date X so please comment before").
>>
>> In general, with this type of stuff, it's better to start with very 
>> lightweight processes and then expand them if needed. Adding lots of rules 
>> from the beginning makes it confusing and can reduce contributions. 
>> Although, as engineers, we believe that anything can be solved using 
>> mechanical rules, in practice software development is a social process that 
>> ultimately requires humans to tackle things on a case-by-case basis.
>>
>> Matei
>>
>>
>>> On Oct 10, 2016, at 12:19 PM, Cody Koeninger <c...@koeninger.org> wrote:
>>>
>>> That seems reasonable to me.
>>>
>>> I do not want to see lazy consensus used on one of these proposals
>>> though. I want a clear outcome, i.e. call for a vote, wait at least 72
>>> hours, get three +1s and no vetoes.
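
A minimal sketch of the rule quoted above (call for a vote, wait at least 72 hours, three +1s, no vetoes). The function name, the vote encoding (+1, 0, -1), and the outcome labels are assumptions made for illustration and are not part of the proposal document; who gets a binding vote is still open in this thread.

```python
from datetime import datetime, timedelta

MIN_OPEN = timedelta(hours=72)   # "wait at least 72 hours"
MIN_PLUS_ONES = 3                # "get three +1s"


def vote_outcome(opened_at, now, votes):
    """Classify a vote; `votes` is a list of +1, 0, or -1 (hypothetical encoding)."""
    if now - opened_at < MIN_OPEN:
        return "open"                      # too early to call either way
    if any(v == -1 for v in votes):
        return "vetoed"                    # a single -1 blocks the proposal
    if sum(1 for v in votes if v == 1) >= MIN_PLUS_ONES:
        return "accepted"
    return "not accepted"                  # no veto, but too few +1s


opened = datetime(2016, 10, 10, 12, 0)
print(vote_outcome(opened, opened + timedelta(hours=80), [1, 1, 0, 1]))
```

Unlike lazy consensus, every path through this function ends in an explicit label, which is the "clear outcome" being asked for.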
>>>
>>>
>>>
>>> On Mon, Oct 10, 2016 at 2:15 PM, Ryan Blue <rb...@netflix.com> wrote:
>>>> Proposal submission: I think we should keep this as open as possible. If
>>>> there is a problem with too many open proposals, then we should tackle that
>>>> as a fix rather than excluding participation. Perhaps it will end up that
>>>> way, but I think it's worth trying a more open model first.
>>>>
>>>> Majority vs consensus: My rationale is that I don't think we want to
>>>> consider a proposal approved if it had objections serious enough that
>>>> committers down-voted (or PMC depending on who gets a vote). If these
>>>> proposals are like PEPs, then they represent a significant amount of
>>>> community effort and I wouldn't want to move forward if up to half of the
>>>> community thinks it's an untenable idea.
>>>>
>>>> rb
>>>>
>>>> On Mon, Oct 10, 2016 at 12:07 PM, Cody Koeninger <c...@koeninger.org> 
>>>> wrote:
>>>>>
>>>>> I think this is closer to a procedural issue than a code modification
>>>>> issue, hence why majority.  If everyone thinks consensus is better, I
>>>>> don't care.  Again, I don't feel strongly about the way we achieve
>>>>> clarity, just that we achieve clarity.
>>>>>
>>>>> On Mon, Oct 10, 2016 at 2:02 PM, Ryan Blue <rb...@netflix.com> wrote:
>>>>>> Sorry, I missed that the proposal includes majority approval. Why
>>>>>> majority instead of consensus? I think we want to build consensus around
>>>>>> these proposals and it makes sense to discuss until no one would veto.
>>>>>>
>>>>>> rb
>>>>>>
>>>>>> On Mon, Oct 10, 2016 at 11:54 AM, Ryan Blue <rb...@netflix.com> wrote:
>>>>>>>
>>>>>>> +1 to votes to approve proposals. I agree that proposals should have an
>>>>>>> official mechanism to be accepted, and a vote is an established means of
>>>>>>> doing that well. I like that it includes a period to review the proposal
>>>>>>> and I think proposals should have been discussed enough ahead of a vote
>>>>>>> to survive the possibility of a veto.
>>>>>>>
>>>>>>> I also like the names that are short and (mostly) unique, like SEP.
>>>>>>>
>>>>>>> Where I disagree is with the requirement that a committer must formally
>>>>>>> propose an enhancement. I don't see the value of restricting this: if
>>>>>>> someone has the will to write up a proposal then they should be
>>>>>>> encouraged to do so and start a discussion about it. Even if there is a
>>>>>>> political reality as Cody says, what is the value of codifying that in
>>>>>>> our process? I think restricting who can submit proposals would only
>>>>>>> undermine them by pushing contributors out. Maybe I'm missing something
>>>>>>> here?
>>>>>>>
>>>>>>> rb
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Oct 10, 2016 at 7:41 AM, Cody Koeninger <c...@koeninger.org>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Yes, users suggesting SIPs is a good thing and is explicitly called
>>>>>>>> out in the linked document under the Who? section.  Formally proposing
>>>>>>>> them, not so much, because of the political realities.
>>>>>>>>
>>>>>>>> Yes, implementation strategy definitely affects goals.  There are all
>>>>>>>> kinds of examples of this, I'll pick one that's my fault so as to
>>>>>>>> avoid sounding like I'm blaming:
>>>>>>>>
>>>>>>>> When I implemented the Kafka DStream, one of my (not explicitly agreed
>>>>>>>> upon by the community) goals was to make sure people could use the
>>>>>>>> DStream however they were already using Kafka at work.  The lack
>>>>>>>> of explicit agreement on that goal led to all kinds of fighting with
>>>>>>>> committers that could have been avoided.  The lack of explicit
>>>>>>>> up-front strategy discussion led to the DStream not really working
>>>>>>>> with compacted topics.  I knew about compacted topics, but don't have
>>>>>>>> a use for them, so had a blind spot there.  If there was explicit
>>>>>>>> up-front discussion that my strategy was "assume that batches can be
>>>>>>>> defined on the driver solely by beginning and ending offsets", there's
>>>>>>>> a greater chance that a user would have seen that and said, "hey, what
>>>>>>>> about non-contiguous offsets in a compacted topic".
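
To make that blind spot concrete, here is a toy sketch, assuming a batch is described only by a half-open offset range [from, until). `OffsetRange` below is a hypothetical stand-in written for this example, not the Spark class, and the offsets are made-up data. In a compacted topic some offsets inside the range no longer exist, so anything that assumes one record per offset is wrong.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OffsetRange:
    """Hypothetical batch definition: offsets in [from_offset, until_offset)."""
    from_offset: int
    until_offset: int

    def assumed_count(self):
        # Correct only if every offset in the range still exists.
        return self.until_offset - self.from_offset


def missing_offsets(batch, surviving):
    """Offsets the batch expects that compaction has removed."""
    return [o for o in range(batch.from_offset, batch.until_offset)
            if o not in surviving]


batch = OffsetRange(0, 8)
surviving = {0, 1, 4, 5, 7}          # made-up compacted log
print(batch.assumed_count(), len(surviving), missing_offsets(batch, surviving))
```

The range claims eight records while only five survive; a strategy section stating the "beginning and ending offsets" assumption up front would have given a user of compacted topics something specific to object to.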
>>>>>>>>
>>>>>>>> This kind of thing is only going to happen smoothly if we have a
>>>>>>>> lightweight user-visible process with clear outcomes.
>>>>>>>>
>>>>>>>> On Mon, Oct 10, 2016 at 1:34 AM, assaf.mendelson
>>>>>>>> <assaf.mendel...@rsa.com> wrote:
>>>>>>>>> I agree with most of what Cody said.
>>>>>>>>>
>>>>>>>>> Two things:
>>>>>>>>>
>>>>>>>>> First, we can always have other people suggest SIPs but mark them as
>>>>>>>>> “unreviewed” and have committers basically move them forward. The
>>>>>>>>> problem is that writing a good document takes time. This way we can
>>>>>>>>> leverage non-committers to do some of this work (it is just another
>>>>>>>>> way to contribute).
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> As for strategy, in many cases implementation strategy can affect the
>>>>>>>>> goals. I will give a small example: In the current structured streaming
>>>>>>>>> strategy, we group by the time to achieve a sliding window. This is
>>>>>>>>> definitely an implementation decision and not a goal. However, I can
>>>>>>>>> think of several aggregation functions which have the time inside their
>>>>>>>>> calculation buffer. For example, let’s say we want to return a set of
>>>>>>>>> all distinct values. One way to implement this would be to make the set
>>>>>>>>> into a map and have the value contain the last time seen. Multiplying
>>>>>>>>> it across the groupby would cost a lot in performance. So adding such a
>>>>>>>>> strategy would have a great effect on the type of aggregations and
>>>>>>>>> their performance, which does affect the goal. Without adding the
>>>>>>>>> strategy, it is easy for whoever goes to the design document to not
>>>>>>>>> think about these cases. Furthermore, it might be decided that these
>>>>>>>>> cases are rare enough so that the strategy is still good enough, but
>>>>>>>>> how would we know it without user feedback?
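
A minimal sketch of the aggregation described above: distinct values kept as a map from value to the last time it was seen, so that a single buffer can answer the sliding-window question. The class and method names are hypothetical, timestamps are plain numbers, and this is not how structured streaming implements windows.

```python
class LastSeenDistinct:
    """Distinct values over a sliding window, stored as value -> last time seen."""

    def __init__(self, window):
        self.window = window
        self.last_seen = {}

    def add(self, value, t):
        # Keep only the most recent timestamp for each value.
        prev = self.last_seen.get(value)
        if prev is None or t > prev:
            self.last_seen[value] = t

    def distinct(self, now):
        # A value counts if it was seen within the last `window` time units.
        cutoff = now - self.window
        return {v for v, t in self.last_seen.items() if t > cutoff}


agg = LastSeenDistinct(window=10)
agg.add("a", 1)
agg.add("b", 5)
agg.add("a", 12)
print(sorted(agg.distinct(now=14)), sorted(agg.distinct(now=16)))
```

Here the time lives inside the aggregation buffer. Grouping by time first would instead keep one such structure per time group, which is the multiplied cost the paragraph above warns about.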
>>>>>>>>>
>>>>>>>>> I believe this example is exactly what Cody was talking about. Since
>>>>>>>>> many times implementation strategies have a large effect on the goal,
>>>>>>>>> we should have it discussed when discussing the goals. In addition,
>>>>>>>>> while it is often easy to throw out completely infeasible goals, it is
>>>>>>>>> often much harder to figure out that the goals are infeasible without
>>>>>>>>> fine tuning.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Assaf.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> From: Cody Koeninger-2 [via Apache Spark Developers List]
>>>>>>>>> [mailto:ml-node+[hidden email]]
>>>>>>>>> Sent: Monday, October 10, 2016 2:25 AM
>>>>>>>>> To: Mendelson, Assaf
>>>>>>>>> Subject: Re: Spark Improvement Proposals
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Only committers should formally submit SIPs because in an Apache
>>>>>>>>> project only committers have explicit political power.  If a user can't
>>>>>>>>> find a committer willing to sponsor an SIP idea, they have no way to
>>>>>>>>> get the idea passed in any case.  If I can't find a committer to
>>>>>>>>> sponsor this meta-SIP idea, I'm out of luck.
>>>>>>>>>
>>>>>>>>> I do not believe unrealistic goals can be found solely by inspection.
>>>>>>>>> We've managed to ignore unrealistic goals even after implementation!
>>>>>>>>> Focusing on APIs can allow people to think they've solved something,
>>>>>>>>> when there's really no way of implementing that API while meeting the
>>>>>>>>> goals.  Rapid iteration is clearly the best way to address this, but
>>>>>>>>> we've already talked about why that hasn't really worked.  If adding a
>>>>>>>>> non-binding API section to the template is important to you, I'm not
>>>>>>>>> against it, but I don't think it's sufficient.
>>>>>>>>>
>>>>>>>>> On your PRD vs design doc spectrum, I'm saying this is closer to a
>>>>>>>>> PRD.  Clear agreement on goals is the most important thing and that's
>>>>>>>>> why it's the thing I want binding agreement on.  But I cannot agree to
>>>>>>>>> goals unless I have enough minimal technical info to judge whether the
>>>>>>>>> goals are likely to actually be accomplished.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Sun, Oct 9, 2016 at 5:35 PM, Matei Zaharia <[hidden email]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Well, I think there are a few things here that don't make sense.
>>>>>>>>>> First, why should only committers submit SIPs? Development in the
>>>>>>>>>> project should be open to all contributors, whether they're committers
>>>>>>>>>> or not. Second, I think unrealistic goals can be found just by
>>>>>>>>>> inspecting the goals, and I'm not super worried that we'll accept a
>>>>>>>>>> lot of SIPs that are then infeasible -- we can then submit new ones.
>>>>>>>>>> But this depends on whether you want this process to be a "design doc
>>>>>>>>>> lite", where people also agree on implementation strategy, or just a
>>>>>>>>>> way to agree on goals. This is what I asked earlier about PRDs vs
>>>>>>>>>> design docs (and I'm open to either one but I'd just like clarity).
>>>>>>>>>> Finally, both as a user and designer of software, I always want to
>>>>>>>>>> give feedback on APIs, so I'd really like a culture of having those
>>>>>>>>>> early. People don't argue about prettiness when they discuss APIs,
>>>>>>>>>> they argue about the core concepts to expose in order to meet various
>>>>>>>>>> goals, and then they're stuck maintaining those for a long time.
>>>>>>>>>>
>>>>>>>>>> Matei
>>>>>>>>>>
>>>>>>>>>> On Oct 9, 2016, at 3:10 PM, Cody Koeninger <[hidden email]> wrote:
>>>>>>>>>>
>>>>>>>>>> Users instead of people, sure.  Committers and contributors are (or at
>>>>>>>>>> least should be) a subset of users.
>>>>>>>>>>
>>>>>>>>>> Non goals, sure. I don't care what the name is, but we need to clearly
>>>>>>>>>> say e.g. 'no we are not maintaining compatibility with XYZ right now'.
>>>>>>>>>>
>>>>>>>>>> API, what I care most about is whether it allows me to accomplish the
>>>>>>>>>> goals. Arguing about how ugly or pretty it is can be saved for
>>>>>>>>>> design/implementation imho.
>>>>>>>>>>
>>>>>>>>>> Strategy, this is necessary because otherwise goals can be out of line
>>>>>>>>>> with reality.  Don't propose goals you don't have at least some idea
>>>>>>>>>> of how to implement.
>>>>>>>>>>
>>>>>>>>>> Rejected strategies, given that committers are the only ones I'm
>>>>>>>>>> saying should formally submit SPARKLIs or SIPs, if they put junk in a
>>>>>>>>>> required section then slap them down for it and tell them to fix it.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Oct 9, 2016 4:36 PM, "Matei Zaharia" <[hidden email]> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Yup, this is the stuff that I found unclear. Thanks for clarifying
>>>>>>>>>>> here, but we should also clarify it in the writeup. In particular:
>>>>>>>>>>>
>>>>>>>>>>> - Goals need to be about user-facing behavior ("people" is broad)
>>>>>>>>>>>
>>>>>>>>>>> - I'd rename Rejected Goals to Non-Goals. Otherwise someone will dig
>>>>>>>>>>> up one of these and say "Spark's developers have officially rejected
>>>>>>>>>>> X, which our awesome system has".
>>>>>>>>>>>
>>>>>>>>>>> - For user-facing stuff, I think you need a section on API. Virtually
>>>>>>>>>>> all other *IPs I've seen have that.
>>>>>>>>>>>
>>>>>>>>>>> - I'm still not sure why the strategy section is needed if the
>>>>>>>>>>> purpose is to define user-facing behavior -- unless this is the
>>>>>>>>>>> strategy for setting the goals or for defining the API. That sounds
>>>>>>>>>>> squarely like a design doc issue. In some sense, who cares whether
>>>>>>>>>>> the proposal is technically feasible right now? If it's infeasible,
>>>>>>>>>>> that will be discovered later during design and implementation. Same
>>>>>>>>>>> thing with rejected strategies -- listing some of those is definitely
>>>>>>>>>>> useful sometimes, but if you make this a *required* section, people
>>>>>>>>>>> are just going to fill it in with bogus stuff (I've seen this happen
>>>>>>>>>>> before).
>>>>>>>>>>>
>>>>>>>>>>> Matei
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>>> On Oct 9, 2016, at 2:14 PM, Cody Koeninger <[hidden email]>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> So to focus the discussion on the specific strategy I'm suggesting,
>>>>>>>>>>>> documented at
>>>>>>>>>>>>
>>>>>>>>>>>> https://github.com/koeninger/spark-1/blob/SIP-0/docs/spark-improvement-proposals.md
>>>>>>>>>>>>
>>>>>>>>>>>> "Goals: What must this allow people to do, that they can't
>>>>>>>>>>>> currently?"
>>>>>>>>>>>>
>>>>>>>>>>>> Is it unclear that this is focusing specifically on people-visible
>>>>>>>>>>>> behavior?
>>>>>>>>>>>>
>>>>>>>>>>>> Rejected goals - are important because otherwise people keep trying
>>>>>>>>>>>> to argue about scope.  Of course you can change things later with a
>>>>>>>>>>>> different SIP and different vote, the point is to focus.
>>>>>>>>>>>>
>>>>>>>>>>>> Use cases - are something that people are going to bring up in
>>>>>>>>>>>> discussion.  If they aren't clearly documented as a goal ("This must
>>>>>>>>>>>> allow me to connect using SSL"), they should be added.
>>>>>>>>>>>>
>>>>>>>>>>>> Internal architecture - if the people who need specific behavior are
>>>>>>>>>>>> implementers of other parts of the system, that's fine.
>>>>>>>>>>>>
>>>>>>>>>>>> Rejected strategies - If you have none of these, you have no evidence
>>>>>>>>>>>> that the proponent didn't just go with the first thing they had in
>>>>>>>>>>>> mind (or have already implemented), which is a big problem currently.
>>>>>>>>>>>> Approval isn't binding as to specifics of implementation, so these
>>>>>>>>>>>> aren't handcuffs.  The goals are the contract, the strategy is
>>>>>>>>>>>> evidence that contract can actually be met.
>>>>>>>>>>>>
>>>>>>>>>>>> Design docs - I'm not touching design docs.  The markdown file I
>>>>>>>>>>>> linked specifically says of the strategy section "This is not a full
>>>>>>>>>>>> design document."  Is this unclear?  Design docs can be worked on
>>>>>>>>>>>> obviously, but that's not what I'm concerned with here.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Sun, Oct 9, 2016 at 2:34 PM, Matei Zaharia <[hidden email]>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>> Hi Cody,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think this would be a lot more concrete if we had a more detailed
>>>>>>>>>>>>> template for SIPs. Right now, it's not super clear what's in scope
>>>>>>>>>>>>> -- e.g. are they a way to solicit feedback on the user-facing
>>>>>>>>>>>>> behavior or on the internals? "Goals" can cover both things. I've
>>>>>>>>>>>>> been thinking of SIPs more as Product Requirements Docs (PRDs),
>>>>>>>>>>>>> which focus on *what* a code change should do as opposed to how.
>>>>>>>>>>>>>
>>>>>>>>>>>>> In particular, here are some things that you may or may not consider
>>>>>>>>>>>>> in scope for SIPs:
>>>>>>>>>>>>>
>>>>>>>>>>>>> - Goals and non-goals: This is definitely in scope, and IMO should
>>>>>>>>>>>>> focus on user-visible behavior (e.g. "system supports SQL window
>>>>>>>>>>>>> functions" or "system continues working if one node fails"). BTW I
>>>>>>>>>>>>> wouldn't say "rejected goals" because some of them might become
>>>>>>>>>>>>> goals later, so we're not definitively rejecting them.
>>>>>>>>>>>>>
>>>>>>>>>>>>> - Public API: Probably should be included in most SIPs unless it's
>>>>>>>>>>>>> too large to fully specify then (e.g. "let's add an ML library").
>>>>>>>>>>>>>
>>>>>>>>>>>>> - Use cases: I usually find this very useful in PRDs to better
>>>>>>>>>>>>> communicate the goals.
>>>>>>>>>>>>>
>>>>>>>>>>>>> - Internal architecture: This is usually *not* a thing users can
>>>>>>>>>>>>> easily comment on and it sounds more like a design doc item. Of
>>>>>>>>>>>>> course it's important to show that the SIP is feasible to implement.
>>>>>>>>>>>>> One exception, however, is that I think we'll have some SIPs
>>>>>>>>>>>>> primarily on internals (e.g. if somebody wants to refactor Spark's
>>>>>>>>>>>>> query optimizer or something).
>>>>>>>>>>>>>
>>>>>>>>>>>>> - Rejected strategies: I personally wouldn't put this, because
>>>>>>>>>>>>> what's the point of voting to reject a strategy before you've really
>>>>>>>>>>>>> begun designing and implementing something? What if you discover
>>>>>>>>>>>>> that the strategy is actually better when you start doing stuff?
>>>>>>>>>>>>>
>>>>>>>>>>>>> At a super high level, it depends on whether you want the SIPs to be
>>>>>>>>>>>>> PRDs for getting some quick feedback on the goals of a feature before
>>>>>>>>>>>>> it is designed, or something more like full-fledged design docs (just
>>>>>>>>>>>>> a more visible design doc for bigger changes). I looked at Kafka's
>>>>>>>>>>>>> KIPs, and they actually seem to be more like design docs. This can
>>>>>>>>>>>>> work too but it does require more work from the proposer and it can
>>>>>>>>>>>>> lead to the same problems you mentioned with people already having a
>>>>>>>>>>>>> design and implementation in mind.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Basically, the question is, are you trying to iterate faster on
>>>>>>>>>>>>> design by adding a step for user feedback earlier? Or are you just
>>>>>>>>>>>>> trying to make design docs for key features more visible (and their
>>>>>>>>>>>>> approval more formal)?
>>>>>>>>>>>>>
>>>>>>>>>>>>> BTW note that in either case, I'd like to have a template for design
>>>>>>>>>>>>> docs too, which should also include goals. I think that would've
>>>>>>>>>>>>> avoided some of the issues you brought up.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Matei
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Oct 9, 2016, at 10:40 AM, Cody Koeninger <[hidden email]>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Here's my specific proposal (meta-proposal?)
>>>>>>>>>>>>>
>>>>>>>>>>>>> Spark Improvement Proposals (SIP)
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Background:
>>>>>>>>>>>>>
>>>>>>>>>>>>> The current problem is that design and implementation of large
>>>>>>>>>>>>> features are often done in private, before soliciting user feedback.
>>>>>>>>>>>>>
>>>>>>>>>>>>> When feedback is solicited, it is often as to detailed design
>>>>>>>>>>>>> specifics, not focused on goals.
>>>>>>>>>>>>>
>>>>>>>>>>>>> When implementation does take place after design, there is often
>>>>>>>>>>>>> disagreement as to what goals are or are not in scope.
>>>>>>>>>>>>>
>>>>>>>>>>>>> This results in commits that don't fully meet user needs.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Goals:
>>>>>>>>>>>>>
>>>>>>>>>>>>> - Ensure user, contributor, and committer goals are clearly
>>>>>>>>>>>>> identified and agreed upon, before implementation takes place.
>>>>>>>>>>>>>
>>>>>>>>>>>>> - Ensure that a technically feasible strategy is chosen that is
>>>>>>>>>>>>> likely to meet the goals.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Rejected Goals:
>>>>>>>>>>>>>
>>>>>>>>>>>>> - SIPs are not for detailed design.  Design by committee doesn't
>>>>>>>>>>>>> work.
>>>>>>>>>>>>>
>>>>>>>>>>>>> - SIPs are not for every change.  We don't need that much process.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Strategy:
>>>>>>>>>>>>>
>>>>>>>>>>>>> My suggestion is outlined as a Spark Improvement Proposal process
>>>>>>>>>>>>> documented at
>>>>>>>>>>>>>
>>>>>>>>>>>>> https://github.com/koeninger/spark-1/blob/SIP-0/docs/spark-improvement-proposals.md
>>>>>>>>>>>>>
>>>>>>>>>>>>> Specifics of Jira manipulation are an implementation detail we can
>>>>>>>>>>>>> figure out.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'm suggesting voting; the need here is for a _clear_ outcome.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Rejected Strategies:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Having someone who understands the problem implement it first works,
>>>>>>>>>>>>> but only if significant iteration after user feedback is allowed.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Historically this has been problematic due to pressure to limit
>>>>>>>>>>>>> public API changes.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, Oct 7, 2016 at 5:16 PM, Reynold Xin <[hidden email]>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Alright, looks like there is quite a bit of support. We should wait
>>>>>>>>>>>>>> to hear from more people too.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> To push this forward, Cody and I will be working together in the
>>>>>>>>>>>>>> next couple of weeks to come up with a concrete, detailed proposal
>>>>>>>>>>>>>> on what this entails, and then we can discuss the specific proposal
>>>>>>>>>>>>>> as well.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, Oct 7, 2016 at 2:29 PM, Cody Koeninger <[hidden
>>>>>>>>>>>>>> email]>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Yeah, in case it wasn't clear, I was talking about SIPs for
>>>>>>>>>>>>>>> major
>>>>>>>>>>>>>>> user-facing or cross-cutting changes, not minor feature adds.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Fri, Oct 7, 2016 at 3:58 PM, Stavros Kontopoulos
>>>>>>>>>>>>>>> <[hidden email]> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> +1 to the SIP label as long as it does not slow things down and
>>>>>>>>>>>>>>>> it targets optimizing efforts, coordination etc. For example,
>>>>>>>>>>>>>>>> really small features (assuming they don't touch public
>>>>>>>>>>>>>>>> interfaces) or refactorings should not need to go through this
>>>>>>>>>>>>>>>> process, and I hope it will be kept this way. So a guideline doc
>>>>>>>>>>>>>>>> should be provided, like in the KIP case.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> IMHO, aside from tagging things and linking them elsewhere,
>>>>>>>>>>>>>>>> simply having design docs and prototype implementations in PRs
>>>>>>>>>>>>>>>> is not something that has failed so far. What is really a pain
>>>>>>>>>>>>>>>> in many projects out there is discontinuity in progress of PRs,
>>>>>>>>>>>>>>>> missing features, slow reviews, which is understandable to some
>>>>>>>>>>>>>>>> extent... it is not only about Spark, but things can be improved
>>>>>>>>>>>>>>>> for sure for this project in particular, as already stated.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Fri, Oct 7, 2016 at 11:14 PM, Cody Koeninger <[hidden
>>>>>>>>>>>>>>>> email]>
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> +1 to adding an SIP label and linking it from the website. I
>>>>>>>>>>>>>>>>> think it needs
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> - template that focuses it towards soliciting user goals / non
>>>>>>>>>>>>>>>>> goals
>>>>>>>>>>>>>>>>> - clear resolution as to which strategy was chosen to pursue.
>>>>>>>>>>>>>>>>> I'd recommend a vote.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Matei asked me to clarify what I meant by changing interfaces, I
>>>>>>>>>>>>>>>>> think it's directly relevant to the SIP idea so I'll clarify
>>>>>>>>>>>>>>>>> here, and split a thread for the other discussion per Nicholas'
>>>>>>>>>>>>>>>>> request.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I meant changing public user interfaces.  I think the first
>>>>>>>>>>>>>>>>> design is unlikely to be right, because it's done at a time when
>>>>>>>>>>>>>>>>> you have the least information.  As a user, I find it
>>>>>>>>>>>>>>>>> considerably more frustrating to be unable to use a tool to get
>>>>>>>>>>>>>>>>> my job done, than I do having to make minor changes to my code
>>>>>>>>>>>>>>>>> in order to take advantage of features. I've seen committers be
>>>>>>>>>>>>>>>>> seriously reluctant to allow changes to @experimental code that
>>>>>>>>>>>>>>>>> are needed in order for it to really work right.  You need to be
>>>>>>>>>>>>>>>>> able to iterate, and if people on both sides of the fence aren't
>>>>>>>>>>>>>>>>> going to respect that some newer apis are subject to change,
>>>>>>>>>>>>>>>>> then why even mark them as such?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Ideally a finished SIP should give me a checklist of things that
>>>>>>>>>>>>>>>>> an implementation must do, and things that it doesn't need to
>>>>>>>>>>>>>>>>> do. Contributors/committers should be seriously discouraged from
>>>>>>>>>>>>>>>>> putting out a version 0.1 that doesn't have at least a prototype
>>>>>>>>>>>>>>>>> implementation of all those things, especially if they're then
>>>>>>>>>>>>>>>>> going to argue against interface changes necessary to get the
>>>>>>>>>>>>>>>>> rest of the things done in the 0.2 version.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Fri, Oct 7, 2016 at 2:18 PM, Reynold Xin <[hidden
>>>>>>>>>>>>>>>>> email]>
>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>> I like the lightweight proposal to add a SIP label.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> During Spark 2.0 development, Tom (Graves) and I suggested
>>>>>>>>>>>>>>>>>> using wiki to track the list of major changes, but that never
>>>>>>>>>>>>>>>>>> really materialized due to the overhead. Adding a SIP label on
>>>>>>>>>>>>>>>>>> major JIRAs and then linking to them prominently on the Spark
>>>>>>>>>>>>>>>>>> website makes a lot of sense.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Fri, Oct 7, 2016 at 10:50 AM, Matei Zaharia
>>>>>>>>>>>>>>>>>> <[hidden email]>
>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> For the improvement proposals, I think one major point was to
>>>>>>>>>>>>>>>>>>> make them really visible to users who are not contributors, so
>>>>>>>>>>>>>>>>>>> we should do more than sending stuff to dev@. One very
>>>>>>>>>>>>>>>>>>> lightweight idea is to have a new type of JIRA called a SIP
>>>>>>>>>>>>>>>>>>> and have a link to a filter that shows all such JIRAs from
>>>>>>>>>>>>>>>>>>> http://spark.apache.org. I also like the idea of SIP and
>>>>>>>>>>>>>>>>>>> design doc templates (in fact many projects have them).
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Matei
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Oct 7, 2016, at 10:38 AM, Reynold Xin <[hidden email]>
>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I called Cody last night and talked about some of the
>>>>>>>>>>>>>>>>>>> topics in his email. It became clear to me Cody genuinely
>>>>>>>>>>>>>>>>>>> cares about the project.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Some of the frustrations come from the success of the
>>>>>>>>>>>>>>>>>>> project itself: it has become very "hot", and it is
>>>>>>>>>>>>>>>>>>> difficult to get clarity from people who don't dedicate all
>>>>>>>>>>>>>>>>>>> their time to Spark. In fact, it is in some ways similar to
>>>>>>>>>>>>>>>>>>> scaling an engineering team in a successful startup: old
>>>>>>>>>>>>>>>>>>> processes that worked well might not work so well once the
>>>>>>>>>>>>>>>>>>> team gets to a certain size, cultures can get diluted,
>>>>>>>>>>>>>>>>>>> building culture vs building process, etc.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I would also really like to have a more visible process for
>>>>>>>>>>>>>>>>>>> larger changes, especially major user-facing API changes.
>>>>>>>>>>>>>>>>>>> Historically we have uploaded design docs for major
>>>>>>>>>>>>>>>>>>> changes, but not always consistently, and it is difficult
>>>>>>>>>>>>>>>>>>> to ensure the quality of the docs, due to the volunteer
>>>>>>>>>>>>>>>>>>> nature of the organization.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Some of the more concrete ideas we discussed focus on
>>>>>>>>>>>>>>>>>>> building a culture to improve clarity:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> - Process: Large changes should have design docs posted on
>>>>>>>>>>>>>>>>>>> JIRA. One thing Cody and I didn't discuss, but an idea that
>>>>>>>>>>>>>>>>>>> just came to me, is that we should create a design doc
>>>>>>>>>>>>>>>>>>> template for the project and ask everybody to follow it.
>>>>>>>>>>>>>>>>>>> The design doc template should also explicitly list goals
>>>>>>>>>>>>>>>>>>> and non-goals, to make design docs more consistent.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> - Process: Email dev@ to solicit feedback. We have done
>>>>>>>>>>>>>>>>>>> this with some changes, but again very inconsistently. Just
>>>>>>>>>>>>>>>>>>> posting something on JIRA isn't sufficient, because there
>>>>>>>>>>>>>>>>>>> are simply too many JIRAs and the signal gets lost in the
>>>>>>>>>>>>>>>>>>> noise. While this is generally impossible to enforce,
>>>>>>>>>>>>>>>>>>> because we can't force all volunteers to conform to a
>>>>>>>>>>>>>>>>>>> process (or they might not even be aware of it), those who
>>>>>>>>>>>>>>>>>>> are more familiar with the project can help by emailing
>>>>>>>>>>>>>>>>>>> dev@ when they see something that hasn't been sent there.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> - Culture: The design doc author(s) should be open to
>>>>>>>>>>>>>>>>>>> feedback. A design doc should serve as the base for
>>>>>>>>>>>>>>>>>>> discussion and is by no means the final design. Of course,
>>>>>>>>>>>>>>>>>>> this does not mean the author has to accept every piece of
>>>>>>>>>>>>>>>>>>> feedback. They should also be comfortable accepting /
>>>>>>>>>>>>>>>>>>> rejecting ideas on technical grounds.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> - Process / Culture: For major ongoing projects, it can be
>>>>>>>>>>>>>>>>>>> useful to have some monthly Google hangouts that are open
>>>>>>>>>>>>>>>>>>> to the world. I am actually not sure how well this will
>>>>>>>>>>>>>>>>>>> work, because of the volunteer nature of the project and
>>>>>>>>>>>>>>>>>>> the need to adjust for time zones for people across the
>>>>>>>>>>>>>>>>>>> globe, but it seems worth trying.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> - Culture: Contributors (including committers) should be
>>>>>>>>>>>>>>>>>>> more direct in setting expectations, including whether they
>>>>>>>>>>>>>>>>>>> are working on a specific issue, whether they will be
>>>>>>>>>>>>>>>>>>> working on a specific issue, and whether an issue or PR or
>>>>>>>>>>>>>>>>>>> JIRA should be rejected. Most people I know in this
>>>>>>>>>>>>>>>>>>> community are nice and don't enjoy telling other people no,
>>>>>>>>>>>>>>>>>>> but it is often more annoying to a contributor to not know
>>>>>>>>>>>>>>>>>>> anything than to get a no.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Fri, Oct 7, 2016 at 10:03 AM, Matei Zaharia
>>>>>>>>>>>>>>>>>>> <[hidden email]>
>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Love the idea of a more visible "Spark Improvement
>>>>>>>>>>>>>>>>>>>> Proposal" process that solicits user input on new APIs.
>>>>>>>>>>>>>>>>>>>> For what it's worth, I don't think committers are trying
>>>>>>>>>>>>>>>>>>>> to minimize their own work -- every committer cares about
>>>>>>>>>>>>>>>>>>>> making the software useful for users. However, it is
>>>>>>>>>>>>>>>>>>>> always hard to get user input and so it helps to have this
>>>>>>>>>>>>>>>>>>>> kind of process. I've certainly looked at the *IPs a lot
>>>>>>>>>>>>>>>>>>>> in other software I use just to see the biggest things on
>>>>>>>>>>>>>>>>>>>> the roadmap.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> When you're talking about "changing interfaces", are you
>>>>>>>>>>>>>>>>>>>> talking about public or internal APIs? I do think many
>>>>>>>>>>>>>>>>>>>> people hate changing public APIs and I actually think
>>>>>>>>>>>>>>>>>>>> that's for the best of the project. That's a technical
>>>>>>>>>>>>>>>>>>>> debate, but basically, the worst thing when you're using a
>>>>>>>>>>>>>>>>>>>> piece of software is that the developers constantly ask
>>>>>>>>>>>>>>>>>>>> you to rewrite your app to update to a new version (and
>>>>>>>>>>>>>>>>>>>> thus benefit from bug fixes, etc). Cue anyone who's used
>>>>>>>>>>>>>>>>>>>> Protobuf, or Guava. The "let's get everyone to change
>>>>>>>>>>>>>>>>>>>> their code this release" model works well within a single
>>>>>>>>>>>>>>>>>>>> large company, but doesn't work well for a community,
>>>>>>>>>>>>>>>>>>>> which is why nearly all *very* widely used programming
>>>>>>>>>>>>>>>>>>>> interfaces (I'm talking things like Java standard library,
>>>>>>>>>>>>>>>>>>>> Windows API, etc) almost *never* break backwards
>>>>>>>>>>>>>>>>>>>> compatibility. All this is done within reason though, e.g.
>>>>>>>>>>>>>>>>>>>> we do change things in major releases (2.x, 3.x, etc).
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>>>>>>>>>> To unsubscribe e-mail: [hidden email]
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> Stavros Kontopoulos
>>>>>>>>>>>>>>>> Senior Software Engineer
>>>>>>>>>>>>>>>> Lightbend, Inc.
>>>>>>>>>>>>>>>> p:  +30 6977967274
>>>>>>>>>>>>>>>> e: [hidden email]
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Ryan Blue
>>>>>>> Software Engineer
>>>>>>> Netflix
>>>>>>
>>>>
>>>
>>
>


---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
