Updated on GitHub: https://github.com/koeninger/spark-1/blob/SIP-0/docs/spark-improvement-proposals.md
I believe I've touched on all feedback with the exception of naming, and API vs Strategy. Do we want a straw poll on naming? Matei, are your concerns about api vs strategy addressed if we add an API bullet point to the template?

On Mon, Oct 10, 2016 at 2:38 PM, Steve Loughran <ste...@hortonworks.com> wrote:
> This is an interesting process proposal; I think it could work well.
>
> -It's got the flavour of the ASF incubator; maybe some of the processes
> there: mentor, regular reporting in could help, in particular, help stop the
> -1 at the end of the work
> -it may also aid collaboration to have a medium lived branch, so enabling
> collaboration with multiple people submitting PRs into the ASF codebase. This
> can reduce cost of merge and enable jenkins to keep on top of it. It also
> fits in well with the ASF "do in apache infra" community development process.
>
>> On 10 Oct 2016, at 20:26, Matei Zaharia <matei.zaha...@gmail.com> wrote:
>>
>> Agreed with this. As I said before regarding who submits: it's not a normal
>> ASF process to require contributions to only come from committers.
>> Committers are of course the only people who can *commit* stuff. But the
>> whole point of an open source project is that anyone can *contribute* --
>> indeed, that is how people become committers. For example, in every ASF
>> project, anyone can open JIRAs, submit design docs, submit patches, review
>> patches, and vote on releases. This particular process is very similar to
>> posting a JIRA or a design doc.
>>
>> I also like consensus with a deadline (e.g. someone says "here is a new SEP,
>> we want to accept it by date X so please comment before").
>>
>> In general, with this type of stuff, it's better to start with very
>> lightweight processes and then expand them if needed. Adding lots of rules
>> from the beginning makes it confusing and can reduce contributions.
>> Although, as engineers, we believe that anything can be solved using
>> mechanical rules, in practice software development is a social process that
>> ultimately requires humans to tackle things on a case-by-case basis.
>>
>> Matei
>>
>>> On Oct 10, 2016, at 12:19 PM, Cody Koeninger <c...@koeninger.org> wrote:
>>>
>>> That seems reasonable to me.
>>>
>>> I do not want to see lazy consensus used on one of these proposals
>>> though, I want a clear outcome, i.e. call for a vote, wait at least 72
>>> hours, get three +1s and no vetos.
>>>
>>> On Mon, Oct 10, 2016 at 2:15 PM, Ryan Blue <rb...@netflix.com> wrote:
>>>> Proposal submission: I think we should keep this as open as possible. If
>>>> there is a problem with too many open proposals, then we should tackle that
>>>> as a fix rather than excluding participation. Perhaps it will end up that
>>>> way, but I think it's worth trying a more open model first.
>>>>
>>>> Majority vs consensus: My rationale is that I don't think we want to
>>>> consider a proposal approved if it had objections serious enough that
>>>> committers down-voted (or PMC depending on who gets a vote). If these
>>>> proposals are like PEPs, then they represent a significant amount of
>>>> community effort and I wouldn't want to move forward if up to half of the
>>>> community thinks it's an untenable idea.
>>>>
>>>> rb
>>>>
>>>> On Mon, Oct 10, 2016 at 12:07 PM, Cody Koeninger <c...@koeninger.org> wrote:
>>>>>
>>>>> I think this is closer to a procedural issue than a code modification
>>>>> issue, hence why majority. If everyone thinks consensus is better, I
>>>>> don't care. Again, I don't feel strongly about the way we achieve
>>>>> clarity, just that we achieve clarity.
>>>>>
>>>>> On Mon, Oct 10, 2016 at 2:02 PM, Ryan Blue <rb...@netflix.com> wrote:
>>>>>> Sorry, I missed that the proposal includes majority approval. Why majority
>>>>>> instead of consensus?
>>>>>> I think we want to build consensus around these
>>>>>> proposals and it makes sense to discuss until no one would veto.
>>>>>>
>>>>>> rb
>>>>>>
>>>>>> On Mon, Oct 10, 2016 at 11:54 AM, Ryan Blue <rb...@netflix.com> wrote:
>>>>>>>
>>>>>>> +1 to votes to approve proposals. I agree that proposals should have an
>>>>>>> official mechanism to be accepted, and a vote is an established means of
>>>>>>> doing that well. I like that it includes a period to review the proposal and
>>>>>>> I think proposals should have been discussed enough ahead of a vote to
>>>>>>> survive the possibility of a veto.
>>>>>>>
>>>>>>> I also like the names that are short and (mostly) unique, like SEP.
>>>>>>>
>>>>>>> Where I disagree is with the requirement that a committer must formally
>>>>>>> propose an enhancement. I don't see the value of restricting this: if
>>>>>>> someone has the will to write up a proposal then they should be encouraged
>>>>>>> to do so and start a discussion about it. Even if there is a political
>>>>>>> reality as Cody says, what is the value of codifying that in our process? I
>>>>>>> think restricting who can submit proposals would only undermine them by
>>>>>>> pushing contributors out. Maybe I'm missing something here?
>>>>>>>
>>>>>>> rb
>>>>>>>
>>>>>>> On Mon, Oct 10, 2016 at 7:41 AM, Cody Koeninger <c...@koeninger.org> wrote:
>>>>>>>>
>>>>>>>> Yes, users suggesting SIPs is a good thing and is explicitly called
>>>>>>>> out in the linked document under the Who? section. Formally proposing
>>>>>>>> them, not so much, because of the political realities.
>>>>>>>>
>>>>>>>> Yes, implementation strategy definitely affects goals.
There are all >>>>>>>> kinds of examples of this, I'll pick one that's my fault so as to >>>>>>>> avoid sounding like I'm blaming: >>>>>>>> >>>>>>>> When I implemented the Kafka DStream, one of my (not explicitly agreed >>>>>>>> upon by the community) goals was to make sure people could use the >>>>>>>> Dstream with however they were already using Kafka at work. The lack >>>>>>>> of explicit agreement on that goal led to all kinds of fighting with >>>>>>>> committers, that could have been avoided. The lack of explicit >>>>>>>> up-front strategy discussion led to the DStream not really working >>>>>>>> with compacted topics. I knew about compacted topics, but don't have >>>>>>>> a use for them, so had a blind spot there. If there was explicit >>>>>>>> up-front discussion that my strategy was "assume that batches can be >>>>>>>> defined on the driver solely by beginning and ending offsets", there's >>>>>>>> a greater chance that a user would have seen that and said, "hey, what >>>>>>>> about non-contiguous offsets in a compacted topic". >>>>>>>> >>>>>>>> This kind of thing is only going to happen smoothly if we have a >>>>>>>> lightweight user-visible process with clear outcomes. >>>>>>>> >>>>>>>> On Mon, Oct 10, 2016 at 1:34 AM, assaf.mendelson >>>>>>>> <assaf.mendel...@rsa.com> wrote: >>>>>>>>> I agree with most of what Cody said. >>>>>>>>> >>>>>>>>> Two things: >>>>>>>>> >>>>>>>>> First we can always have other people suggest SIPs but mark them as >>>>>>>>> “unreviewed” and have committers basically move them forward. The >>>>>>>>> problem is >>>>>>>>> that writing a good document takes time. This way we can leverage >>>>>>>>> non >>>>>>>>> committers to do some of this work (it is just another way to >>>>>>>>> contribute). >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> As for strategy, in many cases implementation strategy can affect >>>>>>>>> the >>>>>>>>> goals. 
>>>>>>>>> I will give a small example: In the current structured streaming >>>>>>>>> strategy, >>>>>>>>> we group by the time to achieve a sliding window. This is definitely >>>>>>>>> an >>>>>>>>> implementation decision and not a goal. However, I can think of >>>>>>>>> several >>>>>>>>> aggregation functions which have the time inside their calculation >>>>>>>>> buffer. >>>>>>>>> For example, let’s say we want to return a set of all distinct >>>>>>>>> values. >>>>>>>>> One >>>>>>>>> way to implement this would be to make the set into a map and have >>>>>>>>> the >>>>>>>>> value >>>>>>>>> contain the last time seen. Multiplying it across the groupby would >>>>>>>>> cost a >>>>>>>>> lot in performance. So adding such a strategy would have a great >>>>>>>>> effect >>>>>>>>> on >>>>>>>>> the type of aggregations and their performance which does affect the >>>>>>>>> goal. >>>>>>>>> Without adding the strategy, it is easy for whoever goes to the >>>>>>>>> design >>>>>>>>> document to not think about these cases. Furthermore, it might be >>>>>>>>> decided >>>>>>>>> that these cases are rare enough so that the strategy is still good >>>>>>>>> enough >>>>>>>>> but how would we know it without user feedback? >>>>>>>>> >>>>>>>>> I believe this example is exactly what Cody was talking about. Since >>>>>>>>> many >>>>>>>>> times implementation strategies have a large effect on the goal, we >>>>>>>>> should >>>>>>>>> have it discussed when discussing the goals. In addition, while it >>>>>>>>> is >>>>>>>>> often >>>>>>>>> easy to throw out completely infeasible goals, it is often much >>>>>>>>> harder >>>>>>>>> to >>>>>>>>> figure out that the goals are unfeasible without fine tuning. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> Assaf. 
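
As a rough illustration of Assaf's distinct-values example above, here is a minimal Python sketch. It is not Spark code: the names `windows_for`, `distinct_by_window` and `LastSeenDistinct` are hypothetical, timestamps are assumed to be plain integers, and the cost model is only my reading of the "multiplying it across the groupby" remark. It contrasts grouping by sliding window, where each event is copied into every window that contains it, with a single map from value to last-seen time.

```python
from collections import defaultdict


def windows_for(ts, size, slide):
    """Start times of every sliding window [start, start + size) containing ts."""
    start = (ts // slide) * slide  # latest window start at or before ts
    starts = []
    while start > ts - size:
        starts.append(start)
        start -= slide
    return sorted(starts)


def distinct_by_window(events, size, slide):
    """Group-by-window strategy: one set of values per window start.

    Each event is added to roughly size / slide sets.
    """
    groups = defaultdict(set)
    for ts, value in events:
        for start in windows_for(ts, size, slide):
            groups[start].add(value)
    return dict(groups)


class LastSeenDistinct:
    """Time-in-the-buffer strategy: one map entry per distinct value."""

    def __init__(self, size):
        self.size = size
        self.last_seen = {}  # value -> most recent timestamp

    def add(self, ts, value):
        prev = self.last_seen.get(value)
        if prev is None or ts > prev:
            self.last_seen[value] = ts

    def distinct_at(self, now):
        """Values seen in (now - size, now].

        Only meaningful when `now` is at or after the latest event added.
        """
        return {v for v, ts in self.last_seen.items() if ts > now - self.size}


events = [(1, "a"), (3, "b"), (12, "a"), (14, "c")]
by_window = distinct_by_window(events, size=10, slide=5)

tracker = LastSeenDistinct(size=10)
for ts, value in events:
    tracker.add(ts, value)
current = tracker.distinct_at(14)
```

With a window of 10 and a slide of 5, the first strategy stores every value twice, while the second keeps one entry per value. The trade-off is that the second can only answer for the current point in time, which is the kind of consequence that is easy to miss if the strategy is not written down alongside the goals.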
>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> From: Cody Koeninger-2 [via Apache Spark Developers List] >>>>>>>>> [mailto:ml-node+[hidden email]] >>>>>>>>> Sent: Monday, October 10, 2016 2:25 AM >>>>>>>>> To: Mendelson, Assaf >>>>>>>>> Subject: Re: Spark Improvement Proposals >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> Only committers should formally submit SIPs because in an apache >>>>>>>>> project only commiters have explicit political power. If a user >>>>>>>>> can't >>>>>>>>> find a commiter willing to sponsor an SIP idea, they have no way to >>>>>>>>> get the idea passed in any case. If I can't find a committer to >>>>>>>>> sponsor this meta-SIP idea, I'm out of luck. >>>>>>>>> >>>>>>>>> I do not believe unrealistic goals can be found solely by >>>>>>>>> inspection. >>>>>>>>> We've managed to ignore unrealistic goals even after implementation! >>>>>>>>> Focusing on APIs can allow people to think they've solved something, >>>>>>>>> when there's really no way of implementing that API while meeting >>>>>>>>> the >>>>>>>>> goals. Rapid iteration is clearly the best way to address this, but >>>>>>>>> we've already talked about why that hasn't really worked. If adding >>>>>>>>> a >>>>>>>>> non-binding API section to the template is important to you, I'm not >>>>>>>>> against it, but I don't think it's sufficient. >>>>>>>>> >>>>>>>>> On your PRD vs design doc spectrum, I'm saying this is closer to a >>>>>>>>> PRD. Clear agreement on goals is the most important thing and >>>>>>>>> that's >>>>>>>>> why it's the thing I want binding agreement on. But I cannot agree >>>>>>>>> to >>>>>>>>> goals unless I have enough minimal technical info to judge whether >>>>>>>>> the >>>>>>>>> goals are likely to actually be accomplished. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Sun, Oct 9, 2016 at 5:35 PM, Matei Zaharia <[hidden email]> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>>> Well, I think there are a few things here that don't make sense. 
>>>>>>>>>> First, >>>>>>>>>> why >>>>>>>>>> should only committers submit SIPs? Development in the project >>>>>>>>>> should >>>>>>>>>> be >>>>>>>>>> open to all contributors, whether they're committers or not. >>>>>>>>>> Second, I >>>>>>>>>> think >>>>>>>>>> unrealistic goals can be found just by inspecting the goals, and >>>>>>>>>> I'm >>>>>>>>>> not >>>>>>>>>> super worried that we'll accept a lot of SIPs that are then >>>>>>>>>> infeasible >>>>>>>>>> -- >>>>>>>>>> we >>>>>>>>>> can then submit new ones. But this depends on whether you want this >>>>>>>>>> process >>>>>>>>>> to be a "design doc lite", where people also agree on >>>>>>>>>> implementation >>>>>>>>>> strategy, or just a way to agree on goals. This is what I asked >>>>>>>>>> earlier >>>>>>>>>> about PRDs vs design docs (and I'm open to either one but I'd just >>>>>>>>>> like >>>>>>>>>> clarity). Finally, both as a user and designer of software, I >>>>>>>>>> always >>>>>>>>>> want >>>>>>>>>> to >>>>>>>>>> give feedback on APIs, so I'd really like a culture of having those >>>>>>>>>> early. >>>>>>>>>> People don't argue about prettiness when they discuss APIs, they >>>>>>>>>> argue >>>>>>>>>> about >>>>>>>>>> the core concepts to expose in order to meet various goals, and >>>>>>>>>> then >>>>>>>>>> they're >>>>>>>>>> stuck maintaining those for a long time. >>>>>>>>>> >>>>>>>>>> Matei >>>>>>>>>> >>>>>>>>>> On Oct 9, 2016, at 3:10 PM, Cody Koeninger <[hidden email]> wrote: >>>>>>>>>> >>>>>>>>>> Users instead of people, sure. Commiters and contributors are (or >>>>>>>>>> at >>>>>>>>>> least >>>>>>>>>> should be) a subset of users. >>>>>>>>>> >>>>>>>>>> Non goals, sure. I don't care what the name is, but we need to >>>>>>>>>> clearly >>>>>>>>>> say >>>>>>>>>> e.g. 'no we are not maintaining compatibility with XYZ right now'. >>>>>>>>>> >>>>>>>>>> API, what I care most about is whether it allows me to accomplish >>>>>>>>>> the >>>>>>>>>> goals. 
>>>>>>>>>> Arguing about how ugly or pretty it is can be saved for design/ >>>>>>>>>> implementation imho. >>>>>>>>>> >>>>>>>>>> Strategy, this is necessary because otherwise goals can be out of >>>>>>>>>> line >>>>>>>>>> with >>>>>>>>>> reality. Don't propose goals you don't have at least some idea of >>>>>>>>>> how >>>>>>>>>> to >>>>>>>>>> implement. >>>>>>>>>> >>>>>>>>>> Rejected strategies, given that commiters are the only ones I'm >>>>>>>>>> saying >>>>>>>>>> should formally submit SPARKLIs or SIPs, if they put junk in a >>>>>>>>>> required >>>>>>>>>> section then slap them down for it and tell them to fix it. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Oct 9, 2016 4:36 PM, "Matei Zaharia" <[hidden email]> wrote: >>>>>>>>>>> >>>>>>>>>>> Yup, this is the stuff that I found unclear. Thanks for clarifying >>>>>>>>>>> here, >>>>>>>>>>> but we should also clarify it in the writeup. In particular: >>>>>>>>>>> >>>>>>>>>>> - Goals needs to be about user-facing behavior ("people" is broad) >>>>>>>>>>> >>>>>>>>>>> - I'd rename Rejected Goals to Non-Goals. Otherwise someone will >>>>>>>>>>> dig >>>>>>>>>>> up >>>>>>>>>>> one of these and say "Spark's developers have officially rejected >>>>>>>>>>> X, >>>>>>>>>>> which >>>>>>>>>>> our awesome system has". >>>>>>>>>>> >>>>>>>>>>> - For user-facing stuff, I think you need a section on API. >>>>>>>>>>> Virtually >>>>>>>>>>> all >>>>>>>>>>> other *IPs I've seen have that. >>>>>>>>>>> >>>>>>>>>>> - I'm still not sure why the strategy section is needed if the >>>>>>>>>>> purpose is >>>>>>>>>>> to define user-facing behavior -- unless this is the strategy for >>>>>>>>>>> setting >>>>>>>>>>> the goals or for defining the API. That sounds squarely like a >>>>>>>>>>> design >>>>>>>>>>> doc >>>>>>>>>>> issue. In some sense, who cares whether the proposal is >>>>>>>>>>> technically >>>>>>>>>>> feasible >>>>>>>>>>> right now? 
If it's infeasible, that will be discovered later >>>>>>>>>>> during >>>>>>>>>>> design >>>>>>>>>>> and implementation. Same thing with rejected strategies -- listing >>>>>>>>>>> some >>>>>>>>>>> of >>>>>>>>>>> those is definitely useful sometimes, but if you make this a >>>>>>>>>>> *required* >>>>>>>>>>> section, people are just going to fill it in with bogus stuff >>>>>>>>>>> (I've >>>>>>>>>>> seen >>>>>>>>>>> this happen before). >>>>>>>>>>> >>>>>>>>>>> Matei >>>>>>>>>>> >>>>>>>>> >>>>>>>>>>>> On Oct 9, 2016, at 2:14 PM, Cody Koeninger <[hidden email]> >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>> So to focus the discussion on the specific strategy I'm >>>>>>>>>>>> suggesting, >>>>>>>>>>>> documented at >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> https://github.com/koeninger/spark-1/blob/SIP-0/docs/spark-improvement-proposals.md >>>>>>>>>>>> >>>>>>>>>>>> "Goals: What must this allow people to do, that they can't >>>>>>>>>>>> currently?" >>>>>>>>>>>> >>>>>>>>>>>> Is it unclear that this is focusing specifically on >>>>>>>>>>>> people-visible >>>>>>>>>>>> behavior? >>>>>>>>>>>> >>>>>>>>>>>> Rejected goals - are important because otherwise people keep >>>>>>>>>>>> trying >>>>>>>>>>>> to argue about scope. Of course you can change things later >>>>>>>>>>>> with a >>>>>>>>>>>> different SIP and different vote, the point is to focus. >>>>>>>>>>>> >>>>>>>>>>>> Use cases - are something that people are going to bring up in >>>>>>>>>>>> discussion. If they aren't clearly documented as a goal ("This >>>>>>>>>>>> must >>>>>>>>>>>> allow me to connect using SSL"), they should be added. >>>>>>>>>>>> >>>>>>>>>>>> Internal architecture - if the people who need specific behavior >>>>>>>>>>>> are >>>>>>>>>>>> implementers of other parts of the system, that's fine. 
>>>>>>>>>>>> >>>>>>>>>>>> Rejected strategies - If you have none of these, you have no >>>>>>>>>>>> evidence >>>>>>>>>>>> that the proponent didn't just go with the first thing they had >>>>>>>>>>>> in >>>>>>>>>>>> mind (or have already implemented), which is a big problem >>>>>>>>>>>> currently. >>>>>>>>>>>> Approval isn't binding as to specifics of implementation, so >>>>>>>>>>>> these >>>>>>>>>>>> aren't handcuffs. The goals are the contract, the strategy is >>>>>>>>>>>> evidence that contract can actually be met. >>>>>>>>>>>> >>>>>>>>>>>> Design docs - I'm not touching design docs. The markdown file I >>>>>>>>>>>> linked specifically says of the strategy section "This is not a >>>>>>>>>>>> full >>>>>>>>>>>> design document." Is this unclear? Design docs can be worked >>>>>>>>>>>> on >>>>>>>>>>>> obviously, but that's not what I'm concerned with here. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Sun, Oct 9, 2016 at 2:34 PM, Matei Zaharia <[hidden email]> >>>>>>>>>>>> wrote: >>>>>>>>>>>>> Hi Cody, >>>>>>>>>>>>> >>>>>>>>>>>>> I think this would be a lot more concrete if we had a more >>>>>>>>>>>>> detailed >>>>>>>>>>>>> template >>>>>>>>>>>>> for SIPs. Right now, it's not super clear what's in scope -- >>>>>>>>>>>>> e.g. >>>>>>>>>>>>> are >>>>>>>>>>>>> they >>>>>>>>>>>>> a way to solicit feedback on the user-facing behavior or on the >>>>>>>>>>>>> internals? >>>>>>>>>>>>> "Goals" can cover both things. I've been thinking of SIPs more >>>>>>>>>>>>> as >>>>>>>>>>>>> Product >>>>>>>>>>>>> Requirements Docs (PRDs), which focus on *what* a code change >>>>>>>>>>>>> should >>>>>>>>>>>>> do >>>>>>>>>>>>> as >>>>>>>>>>>>> opposed to how. 
>>>>>>>>>>>>> >>>>>>>>>>>>> In particular, here are some things that you may or may not >>>>>>>>>>>>> consider >>>>>>>>>>>>> in >>>>>>>>>>>>> scope for SIPs: >>>>>>>>>>>>> >>>>>>>>>>>>> - Goals and non-goals: This is definitely in scope, and IMO >>>>>>>>>>>>> should >>>>>>>>>>>>> focus on >>>>>>>>>>>>> user-visible behavior (e.g. "system supports SQL window >>>>>>>>>>>>> functions" >>>>>>>>>>>>> or >>>>>>>>>>>>> "system continues working if one node fails"). BTW I wouldn't >>>>>>>>>>>>> say >>>>>>>>>>>>> "rejected >>>>>>>>>>>>> goals" because some of them might become goals later, so we're >>>>>>>>>>>>> not >>>>>>>>>>>>> definitively rejecting them. >>>>>>>>>>>>> >>>>>>>>>>>>> - Public API: Probably should be included in most SIPs unless >>>>>>>>>>>>> it's >>>>>>>>>>>>> too >>>>>>>>>>>>> large >>>>>>>>>>>>> to fully specify then (e.g. "let's add an ML library"). >>>>>>>>>>>>> >>>>>>>>>>>>> - Use cases: I usually find this very useful in PRDs to better >>>>>>>>>>>>> communicate >>>>>>>>>>>>> the goals. >>>>>>>>>>>>> >>>>>>>>>>>>> - Internal architecture: This is usually *not* a thing users >>>>>>>>>>>>> can >>>>>>>>>>>>> easily >>>>>>>>>>>>> comment on and it sounds more like a design doc item. Of course >>>>>>>>>>>>> it's >>>>>>>>>>>>> important to show that the SIP is feasible to implement. One >>>>>>>>>>>>> exception, >>>>>>>>>>>>> however, is that I think we'll have some SIPs primarily on >>>>>>>>>>>>> internals >>>>>>>>>>>>> (e.g. >>>>>>>>>>>>> if somebody wants to refactor Spark's query optimizer or >>>>>>>>>>>>> something). >>>>>>>>>>>>> >>>>>>>>>>>>> - Rejected strategies: I personally wouldn't put this, because >>>>>>>>>>>>> what's >>>>>>>>>>>>> the >>>>>>>>>>>>> point of voting to reject a strategy before you've really begun >>>>>>>>>>>>> designing >>>>>>>>>>>>> and implementing something? What if you discover that the >>>>>>>>>>>>> strategy >>>>>>>>>>>>> is >>>>>>>>>>>>> actually better when you start doing stuff? 
>>>>>>>>>>>>> >>>>>>>>>>>>> At a super high level, it depends on whether you want the SIPs >>>>>>>>>>>>> to >>>>>>>>>>>>> be >>>>>>>>>>>>> PRDs >>>>>>>>>>>>> for getting some quick feedback on the goals of a feature >>>>>>>>>>>>> before >>>>>>>>>>>>> it is >>>>>>>>>>>>> designed, or something more like full-fledged design docs (just >>>>>>>>>>>>> a >>>>>>>>>>>>> more >>>>>>>>>>>>> visible design doc for bigger changes). I looked at Kafka's >>>>>>>>>>>>> KIPs, >>>>>>>>>>>>> and >>>>>>>>>>>>> they >>>>>>>>>>>>> actually seem to be more like design docs. This can work too >>>>>>>>>>>>> but >>>>>>>>>>>>> it >>>>>>>>>>>>> does >>>>>>>>>>>>> require more work from the proposer and it can lead to the same >>>>>>>>>>>>> problems you >>>>>>>>>>>>> mentioned with people already having a design and >>>>>>>>>>>>> implementation >>>>>>>>>>>>> in >>>>>>>>>>>>> mind. >>>>>>>>>>>>> >>>>>>>>>>>>> Basically, the question is, are you trying to iterate faster on >>>>>>>>>>>>> design >>>>>>>>>>>>> by >>>>>>>>>>>>> adding a step for user feedback earlier? Or are you just trying >>>>>>>>>>>>> to >>>>>>>>>>>>> make >>>>>>>>>>>>> design docs for key features more visible (and their approval >>>>>>>>>>>>> more >>>>>>>>>>>>> formal)? >>>>>>>>>>>>> >>>>>>>>>>>>> BTW note that in either case, I'd like to have a template for >>>>>>>>>>>>> design >>>>>>>>>>>>> docs >>>>>>>>>>>>> too, which should also include goals. I think that would've >>>>>>>>>>>>> avoided >>>>>>>>>>>>> some of >>>>>>>>>>>>> the issues you brought up. >>>>>>>>>>>>> >>>>>>>>>>>>> Matei >>>>>>>>>>>>> >>>>>>>>>>>>> On Oct 9, 2016, at 10:40 AM, Cody Koeninger <[hidden email]> >>>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> Here's my specific proposal (meta-proposal?) 
>>>>>>>>>>>>>
>>>>>>>>>>>>> Spark Improvement Proposals (SIP)
>>>>>>>>>>>>>
>>>>>>>>>>>>> Background:
>>>>>>>>>>>>>
>>>>>>>>>>>>> The current problem is that design and implementation of large features are
>>>>>>>>>>>>> often done in private, before soliciting user feedback.
>>>>>>>>>>>>>
>>>>>>>>>>>>> When feedback is solicited, it is often as to detailed design specifics, not
>>>>>>>>>>>>> focused on goals.
>>>>>>>>>>>>>
>>>>>>>>>>>>> When implementation does take place after design, there is often
>>>>>>>>>>>>> disagreement as to what goals are or are not in scope.
>>>>>>>>>>>>>
>>>>>>>>>>>>> This results in commits that don't fully meet user needs.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Goals:
>>>>>>>>>>>>>
>>>>>>>>>>>>> - Ensure user, contributor, and committer goals are clearly identified and
>>>>>>>>>>>>> agreed upon, before implementation takes place.
>>>>>>>>>>>>>
>>>>>>>>>>>>> - Ensure that a technically feasible strategy is chosen that is likely to
>>>>>>>>>>>>> meet the goals.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Rejected Goals:
>>>>>>>>>>>>>
>>>>>>>>>>>>> - SIPs are not for detailed design. Design by committee doesn't work.
>>>>>>>>>>>>>
>>>>>>>>>>>>> - SIPs are not for every change. We don't need that much process.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Strategy:
>>>>>>>>>>>>>
>>>>>>>>>>>>> My suggestion is outlined as a Spark Improvement Proposal process documented
>>>>>>>>>>>>> at
>>>>>>>>>>>>>
>>>>>>>>>>>>> https://github.com/koeninger/spark-1/blob/SIP-0/docs/spark-improvement-proposals.md
>>>>>>>>>>>>>
>>>>>>>>>>>>> Specifics of Jira manipulation are an implementation detail we can figure
>>>>>>>>>>>>> out.
>>>>>>>>>>>>> >>>>>>>>>>>>> I'm suggesting voting; the need here is for a _clear_ outcome. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Rejected Strategies: >>>>>>>>>>>>> >>>>>>>>>>>>> Having someone who understands the problem implement it first >>>>>>>>>>>>> works, >>>>>>>>>>>>> but >>>>>>>>>>>>> only if significant iteration after user feedback is allowed. >>>>>>>>>>>>> >>>>>>>>>>>>> Historically this has been problematic due to pressure to limit >>>>>>>>>>>>> public >>>>>>>>>>>>> api >>>>>>>>>>>>> changes. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Fri, Oct 7, 2016 at 5:16 PM, Reynold Xin <[hidden email]> >>>>>>>>>>>>> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> Alright looks like there are quite a bit of support. We should >>>>>>>>>>>>>> wait >>>>>>>>>>>>>> to >>>>>>>>>>>>>> hear from more people too. >>>>>>>>>>>>>> >>>>>>>>>>>>>> To push this forward, Cody and I will be working together in >>>>>>>>>>>>>> the >>>>>>>>>>>>>> next >>>>>>>>>>>>>> couple of weeks to come up with a concrete, detailed proposal >>>>>>>>>>>>>> on >>>>>>>>>>>>>> what >>>>>>>>>>>>>> this >>>>>>>>>>>>>> entails, and then we can discuss this the specific proposal as >>>>>>>>>>>>>> well. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Fri, Oct 7, 2016 at 2:29 PM, Cody Koeninger <[hidden >>>>>>>>>>>>>> email]> >>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Yeah, in case it wasn't clear, I was talking about SIPs for >>>>>>>>>>>>>>> major >>>>>>>>>>>>>>> user-facing or cross-cutting changes, not minor feature adds. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Fri, Oct 7, 2016 at 3:58 PM, Stavros Kontopoulos >>>>>>>>>>>>>>> <[hidden email]> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> +1 to the SIP label as long as it does not slow down things >>>>>>>>>>>>>>>> and >>>>>>>>>>>>>>>> it >>>>>>>>>>>>>>>> targets optimizing efforts, coordination etc. 
For example >>>>>>>>>>>>>>>> really >>>>>>>>>>>>>>>> small >>>>>>>>>>>>>>>> features should not need to go through this process >>>>>>>>>>>>>>>> (assuming >>>>>>>>>>>>>>>> they >>>>>>>>>>>>>>>> dont >>>>>>>>>>>>>>>> touch public interfaces) or re-factorings and hope it will >>>>>>>>>>>>>>>> be >>>>>>>>>>>>>>>> kept >>>>>>>>>>>>>>>> this >>>>>>>>>>>>>>>> way. So as a guideline doc should be provided, like in the >>>>>>>>>>>>>>>> KIP >>>>>>>>>>>>>>>> case. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> IMHO so far aside from tagging things and linking them >>>>>>>>>>>>>>>> elsewhere >>>>>>>>>>>>>>>> simply >>>>>>>>>>>>>>>> having design docs and prototypes implementations in PRs is >>>>>>>>>>>>>>>> not >>>>>>>>>>>>>>>> something >>>>>>>>>>>>>>>> that has not worked so far. What is really a pain in many >>>>>>>>>>>>>>>> projects >>>>>>>>>>>>>>>> out there >>>>>>>>>>>>>>>> is discontinuity in progress of PRs, missing features, slow >>>>>>>>>>>>>>>> reviews >>>>>>>>>>>>>>>> which is >>>>>>>>>>>>>>>> understandable to some extent... it is not only about Spark >>>>>>>>>>>>>>>> but >>>>>>>>>>>>>>>> things can >>>>>>>>>>>>>>>> be improved for sure for this project in particular as >>>>>>>>>>>>>>>> already >>>>>>>>>>>>>>>> stated. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Fri, Oct 7, 2016 at 11:14 PM, Cody Koeninger <[hidden >>>>>>>>>>>>>>>> email]> >>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> +1 to adding an SIP label and linking it from the website. >>>>>>>>>>>>>>>>> I >>>>>>>>>>>>>>>>> think >>>>>>>>>>>>>>>>> it >>>>>>>>>>>>>>>>> needs >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> - template that focuses it towards soliciting user goals / >>>>>>>>>>>>>>>>> non >>>>>>>>>>>>>>>>> goals >>>>>>>>>>>>>>>>> - clear resolution as to which strategy was chosen to >>>>>>>>>>>>>>>>> pursue. >>>>>>>>>>>>>>>>> I'd >>>>>>>>>>>>>>>>> recommend a vote. 
>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Matei asked me to clarify what I meant by changing >>>>>>>>>>>>>>>>> interfaces, >>>>>>>>>>>>>>>>> I >>>>>>>>>>>>>>>>> think >>>>>>>>>>>>>>>>> it's directly relevant to the SIP idea so I'll clarify >>>>>>>>>>>>>>>>> here, >>>>>>>>>>>>>>>>> and >>>>>>>>>>>>>>>>> split >>>>>>>>>>>>>>>>> a thread for the other discussion per Nicholas' request. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I meant changing public user interfaces. I think the first >>>>>>>>>>>>>>>>> design >>>>>>>>>>>>>>>>> is >>>>>>>>>>>>>>>>> unlikely to be right, because it's done at a time when you >>>>>>>>>>>>>>>>> have >>>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>> least information. As a user, I find it considerably more >>>>>>>>>>>>>>>>> frustrating >>>>>>>>>>>>>>>>> to be unable to use a tool to get my job done, than I do >>>>>>>>>>>>>>>>> having to >>>>>>>>>>>>>>>>> make minor changes to my code in order to take advantage of >>>>>>>>>>>>>>>>> features. >>>>>>>>>>>>>>>>> I've seen committers be seriously reluctant to allow >>>>>>>>>>>>>>>>> changes >>>>>>>>>>>>>>>>> to >>>>>>>>>>>>>>>>> @experimental code that are needed in order for it to >>>>>>>>>>>>>>>>> really >>>>>>>>>>>>>>>>> work >>>>>>>>>>>>>>>>> right. You need to be able to iterate, and if people on >>>>>>>>>>>>>>>>> both >>>>>>>>>>>>>>>>> sides >>>>>>>>>>>>>>>>> of >>>>>>>>>>>>>>>>> the fence aren't going to respect that some newer apis are >>>>>>>>>>>>>>>>> subject >>>>>>>>>>>>>>>>> to >>>>>>>>>>>>>>>>> change, then why even mark them as such? >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Ideally a finished SIP should give me a checklist of things >>>>>>>>>>>>>>>>> that >>>>>>>>>>>>>>>>> an >>>>>>>>>>>>>>>>> implementation must do, and things that it doesn't need to >>>>>>>>>>>>>>>>> do. 
>>>>>>>>>>>>>>>>> Contributors/committers should be seriously discouraged >>>>>>>>>>>>>>>>> from >>>>>>>>>>>>>>>>> putting >>>>>>>>>>>>>>>>> out a version 0.1 that doesn't have at least a prototype >>>>>>>>>>>>>>>>> implementation of all those things, especially if they're >>>>>>>>>>>>>>>>> then >>>>>>>>>>>>>>>>> going >>>>>>>>>>>>>>>>> to argue against interface changes necessary to get the the >>>>>>>>>>>>>>>>> rest >>>>>>>>>>>>>>>>> of >>>>>>>>>>>>>>>>> the things done in the 0.2 version. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Fri, Oct 7, 2016 at 2:18 PM, Reynold Xin <[hidden >>>>>>>>>>>>>>>>> email]> >>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>> I like the lightweight proposal to add a SIP label. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> During Spark 2.0 development, Tom (Graves) and I suggested >>>>>>>>>>>>>>>>>> using >>>>>>>>>>>>>>>>>> wiki >>>>>>>>>>>>>>>>>> to >>>>>>>>>>>>>>>>>> track the list of major changes, but that never really >>>>>>>>>>>>>>>>>> materialized >>>>>>>>>>>>>>>>>> due to >>>>>>>>>>>>>>>>>> the overhead. Adding a SIP label on major JIRAs and then >>>>>>>>>>>>>>>>>> link >>>>>>>>>>>>>>>>>> to >>>>>>>>>>>>>>>>>> them >>>>>>>>>>>>>>>>>> prominently on the Spark website makes a lot of sense. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On Fri, Oct 7, 2016 at 10:50 AM, Matei Zaharia >>>>>>>>>>>>>>>>>> <[hidden email]> >>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> For the improvement proposals, I think one major point >>>>>>>>>>>>>>>>>>> was >>>>>>>>>>>>>>>>>>> to >>>>>>>>>>>>>>>>>>> make >>>>>>>>>>>>>>>>>>> them >>>>>>>>>>>>>>>>>>> really visible to users who are not contributors, so we >>>>>>>>>>>>>>>>>>> should >>>>>>>>>>>>>>>>>>> do >>>>>>>>>>>>>>>>>>> more than >>>>>>>>>>>>>>>>>>> sending stuff to dev@. 
One very lightweight idea is to >>>>>>>>>>>>>>>>>>> have >>>>>>>>>>>>>>>>>>> a >>>>>>>>>>>>>>>>>>> new >>>>>>>>>>>>>>>>>>> type of >>>>>>>>>>>>>>>>>>> JIRA called a SIP and have a link to a filter that shows >>>>>>>>>>>>>>>>>>> all >>>>>>>>>>>>>>>>>>> such >>>>>>>>>>>>>>>>>>> JIRAs from >>>>>>>>>>>>>>>>>>> http://spark.apache.org. I also like the idea of SIP and >>>>>>>>>>>>>>>>>>> design >>>>>>>>>>>>>>>>>>> doc >>>>>>>>>>>>>>>>>>> templates (in fact many projects have them). >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Matei >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On Oct 7, 2016, at 10:38 AM, Reynold Xin <[hidden email]> >>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> I called Cody last night and talked about some of the >>>>>>>>>>>>>>>>>>> topics >>>>>>>>>>>>>>>>>>> in >>>>>>>>>>>>>>>>>>> his >>>>>>>>>>>>>>>>>>> email. >>>>>>>>>>>>>>>>>>> It became clear to me Cody genuinely cares about the >>>>>>>>>>>>>>>>>>> project. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Some of the frustrations come from the success of the >>>>>>>>>>>>>>>>>>> project >>>>>>>>>>>>>>>>>>> itself >>>>>>>>>>>>>>>>>>> becoming very "hot", and it is difficult to get clarity >>>>>>>>>>>>>>>>>>> from >>>>>>>>>>>>>>>>>>> people >>>>>>>>>>>>>>>>>>> who >>>>>>>>>>>>>>>>>>> don't dedicate all their time to Spark. In fact, it is in >>>>>>>>>>>>>>>>>>> some >>>>>>>>>>>>>>>>>>> ways >>>>>>>>>>>>>>>>>>> similar >>>>>>>>>>>>>>>>>>> to scaling an engineering team in a successful startup: >>>>>>>>>>>>>>>>>>> old >>>>>>>>>>>>>>>>>>> processes that >>>>>>>>>>>>>>>>>>> worked well might not work so well when it gets to a >>>>>>>>>>>>>>>>>>> certain >>>>>>>>>>>>>>>>>>> size, >>>>>>>>>>>>>>>>>>> cultures >>>>>>>>>>>>>>>>>>> can get diluted, building culture vs building process, >>>>>>>>>>>>>>>>>>> etc. 
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I would also really like to have a more visible process for larger changes, especially major user-facing API changes. Historically we upload design docs for major changes, but this is not done consistently and the quality of the docs is difficult to ensure, due to the volunteer nature of the organization.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Some of the more concrete ideas we discussed focus on building a culture to improve clarity:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> - Process: Large changes should have design docs posted on JIRA. One thing Cody and I didn't discuss, but an idea that just came to me, is that we should create a design doc template for the project and ask everybody to follow it. The design doc template should also explicitly list goals and non-goals, to make design docs more consistent.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> - Process: Email dev@ to solicit feedback. We have done this with some changes, but again very inconsistently. Just posting something on JIRA isn't sufficient, because there are simply too many JIRAs and the signal gets lost in the noise.
>>>>>>>>>>>>>>>>>>> While this is generally impossible to enforce because we can't force all volunteers to conform to a process (or they might not even be aware of this), those who are more familiar with the project can help by emailing dev@ when they see something that hasn't been sent there.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> - Culture: The design doc author(s) should be open to feedback. A design doc should serve as the base for discussion and is by no means the final design. Of course, this does not mean the author has to accept every piece of feedback. They should also be comfortable accepting / rejecting ideas on technical grounds.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> - Process / Culture: For major ongoing projects, it can be useful to have some monthly Google hangouts that are open to the world. I am actually not sure how well this will work, because of the volunteer nature of the project and the need to adjust for time zones for people across the globe, but it seems worth trying.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> - Culture: Contributors (including committers) should be more direct in setting expectations, including whether they are working on a specific issue, whether they will be working on a specific issue, and whether an issue, PR, or JIRA should be rejected. Most people I know in this community are nice and don't enjoy telling other people no, but it is often more annoying to a contributor to not know anything than to get a no.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Fri, Oct 7, 2016 at 10:03 AM, Matei Zaharia <[hidden email]> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Love the idea of a more visible "Spark Improvement Proposal" process that solicits user input on new APIs. For what it's worth, I don't think committers are trying to minimize their own work -- every committer cares about making the software useful for users. However, it is always hard to get user input and so it helps to have this kind of process.
>>>>>>>>>>>>>>>>>>>> I've certainly looked at the *IPs a lot in other software I use just to see the biggest things on the roadmap.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> When you're talking about "changing interfaces", are you talking about public or internal APIs? I do think many people hate changing public APIs and I actually think that's for the best of the project. That's a technical debate, but basically, the worst thing when you're using a piece of software is that the developers constantly ask you to rewrite your app to update to a new version (and thus benefit from bug fixes, etc). Cue anyone who's used Protobuf, or Guava. The "let's get everyone to change their code this release" model works well within a single large company, but doesn't work well for a community, which is why nearly all *very* widely used programming interfaces (I'm talking things like Java standard library, Windows API, etc) almost *never* break backwards compatibility.
>>>>>>>>>>>>>>>>>>>> All this is done within reason though, e.g. we do change things in major releases (2.x, 3.x, etc).
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>>>>>>>>>> To unsubscribe e-mail: [hidden email]
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> Stavros Kontopoulos
>>>>>>>>>>>>>>>> Senior Software Engineer
>>>>>>>>>>>>>>>> Lightbend, Inc.
>>>>>>> --
>>>>>>> Ryan Blue
>>>>>>> Software Engineer
>>>>>>> Netflix

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org