>
> I’m not quite sure what you mean here.
>

I'll try to explain once more, then I'll drop it since continuing the rest
of the discussion in this thread is more important than getting
side-tracked.

There is nothing wrong with individuals advocating for what they think
should or should not be in Spark 3.0, nor should anyone shy away from
explaining why they think delaying the release for some reason is or isn't
a good idea. What is a problem, or at least something that I have a
problem with, is declarative, pseudo-authoritative statements that 3.0 (or
some other release) will or won't contain some feature, API, etc., or that
some issue is or is not a blocker or worth delaying for. When the PMC has not
voted on such issues, I'm often left thinking, "Wait... what? Who decided
that, or where did that decision come from?"

On Sun, Feb 24, 2019 at 1:27 PM Ryan Blue <rb...@netflix.com> wrote:

> Thanks to Matt for his philosophical take. I agree.
>
> The intent is to set a common goal, so that we work toward getting v2 in a
> usable state as a community. Part of that is making choices to get it done
> on time, which we have already seen on this thread: setting out more
> clearly what we mean by “DSv2” and what we think we can get done on time.
>
> I don’t mean to say that we should commit to a plan that *requires* a
> delay to the next release (which describes the goal better than 3.0 does).
> But we should commit to making sure the goal is met, acknowledging that
> this is one of the most important efforts for many people who work in this
> community.
>
> I think it would help to clarify what this commitment means, at least to
> me:
>
>    1. What it means: the community will seriously consider delaying the
>    next release if this isn’t done by our initial deadline.
>    2. What it does not mean: delaying the release no matter what happens.
>
> In the event that this feature isn’t done on time, it would be up to the
> community to decide what to do. But in the meantime, I think it is healthy
> to set a goal and work toward it. (I am not making a distinction between
> PMC and community here.)
>
> I think this commitment is a good idea for the same reason why we set
> other goals: to hold ourselves accountable. When one sets a New Year’s
> resolution to drop 10 pounds, it isn’t that the hope or intent wasn’t there
> before. It is about having a (self-imposed) constraint that helps you make
> hard choices: cake now or meet my goal?
>
> “Spark 3.0 has many other major features as well, delaying the release has
> significant cost and we should try our best to not let it happen.”
>
> I agree with Wenchen here. No one wants to actually delay the release. We
> just want to push ourselves to make some tough decisions, using that delay
> as a motivating factor.
>
> “The fact that some entity other than the PMC thinks that Spark 3.0 should
> contain certain new features or that it will be costly to them if 3.0 does
> not contain those features is not dispositive.”
>
> I’m not quite sure what you mean here. While I am representing my
> employer, I am bringing up this topic as a member of the community, to
> suggest a direction for the community to take, and I fully accept that the
> decision is up to the community. I think it is reasonable to candidly state
> how this matters; that context informs the discussion.
>
> On Fri, Feb 22, 2019 at 1:55 PM Mark Hamstra <m...@clearstorydata.com>
> wrote:
>
>>> To your other message: I already see a number of PMC members here. Who's
>>> the other entity?
>>>
>>
>> I'll answer indirectly since pointing fingers isn't really my intent. In
>> the absence of a PMC vote, I react negatively to individuals making new
>> declarative policy statements or statements to the effect that Spark
>> 3.0 will (or will not) include these features..., or that it will be too
>> costly to do something. Maybe these are just innocent shorthand, leaving off a
>> clarifying "in my opinion" or "according to the current state of JIRA" or
>> some such.
>>
>> My points are simply that nobody other than the PMC has an authoritative
>> say on such matters, and if we are at a point where the community needs
>> some definitive guidance, then we need PMC involvement and a vote. That's
>> not intended to preclude or terminate community discussion, because that
>> is, indeed, lovely to see.
>>
>> On Fri, Feb 22, 2019 at 12:04 PM Sean Owen <sro...@apache.org> wrote:
>>
>>> To your other message: I already see a number of PMC members here. Who's
>>> the other entity? The PMC is the thing that says a thing is a release,
>>> sure, but this discussion is properly a community one. And here we are,
>>> this is lovely to see.
>>>
>>> (May I remind everyone to casually, sometime, browse the large list of
>>> other JIRAs targeted for Spark 3? It's much more than DSv2!)
>>>
>>> I can't speak to specific decisions here, but I see:
>>>
>>> Spark 3 doesn't have a release date. Notionally it's 6 months after
>>> Spark 2.4 (Nov 2018). It'd be reasonable to plan for a little more time.
>>> Can we throw out... June 2019, and I update the website? It can slip but
>>> that gives a concrete timeframe around which to plan. What can comfortably
>>> get in by June 2019?
>>>
>>> Agreement that "DSv2" is going into Spark 3, for some definition of DSv2
>>> that's probably roughly Matt's list.
>>>
>>> Changes that can't go into a minor release (API changes, etc) must by
>>> definition go into Spark 3.0. Agree those first and do those now. Delay
>>> Spark 3 until they're done and prioritize accordingly.
>>>
>>> Changes that can go into a minor release can go into 3.1, if needed.
>>>
>>> This has been in discussion long enough that I think whatever design(s)
>>> are on the table for DSv2 now are as close as one is going to get. The
>>> perfect is the enemy of the good.
>>>
>>> Aside from throwing out a date, I probably just restated what everyone
>>> said. But I was 'summoned' :)
>>>
>>> On Fri, Feb 22, 2019 at 12:40 PM Mark Hamstra <m...@clearstorydata.com>
>>> wrote:
>>>
>>>> However, as other people mentioned, Spark 3.0 has many other major
>>>>> features as well
>>>>>
>>>>
>>>> I fundamentally disagree. First, Spark 3.0 has nothing until the PMC
>>>> says it has something, and we have made no commitment along the lines that
>>>> "Spark 3.0.0 will not be released unless it contains new features x, y and
>>>> z." Second, major-version releases are not about adding new features.
>>>> Major-version releases are about making changes to the public API that we
>>>> cannot make in feature or bug-fix releases. If that is all that is
>>>> accomplished in a particular major release, that's fine -- in fact, we
>>>> quite intentionally did not target new features in the Spark 2.0.0 release.
>>>> The fact that some entity other than the PMC thinks that Spark 3.0 should
>>>> contain certain new features or that it will be costly to them if 3.0 does
>>>> not contain those features is not dispositive. If there are public API
>>>> changes that should occur in a timely fashion and there is also a list of
>>>> new features that some users or contributors want to see in 3.0 but that
>>>> look likely to not be ready in a timely fashion, then the PMC should fully
>>>> consider releasing 3.0 without all those new features. There is no reason
>>>> that they can't come in with 3.1.0.
>>>>
>>>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>
