This thread established some good general principles, illustrated by a
few examples. It didn't draw specific conclusions about what to add back,
which is why it wasn't at all controversial. What those principles mean in
specific cases is where there may be disagreement, and that harder
question hasn't been addressed.

The reverts I have seen so far seemed like the obvious ones, but yes,
there are several more going on now, some pretty broad. I am not even sure
what all of them are; in addition to the ones below, there is
https://github.com/apache/spark/pull/27839. Would it be too much overhead
to post to this thread any changes that one believes are endorsed by these
principles, and perhaps by a stricter interpretation of them now? It's
important enough that we should gather any data points or input, and now.
(We're obviously not going to debate each one.) A draft PR, or several,
actually sounds like a good vehicle for that -- as long as people know
about them!

Also, is there any usage data available to share? Many arguments turn on
'commonly used', but can we know that more concretely?

Otherwise I think we'll back into implementing personal interpretations of
general principles, which is arguably the issue in the first place, even
when everyone believes in good faith in the same principles.



On Fri, Mar 6, 2020 at 1:08 PM Dongjoon Hyun <dongjoon.h...@gmail.com>
wrote:

> Hi, All.
>
> Recently, reverting PRs seems to be spreading like the *well-known*
> virus.
> Can we finalize this discussion first, before making unofficial personal
> decisions?
> Technically, this thread was not a vote and our website doesn't have a
> clear policy yet.
>
> https://github.com/apache/spark/pull/27821
> [SPARK-25908][SQL][FOLLOW-UP] Add Back Multiple Removed APIs
>     ==> This technically reverts most of SPARK-25908.
>
> https://github.com/apache/spark/pull/27835
> Revert "[SPARK-25457][SQL] IntegralDivide returns data type of the
> operands"
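>
> For context, a sketch of the behavior at stake here, as I read the PR
> title (the types shown are my understanding, not verified output):
>
>     spark.sql("SELECT 3 div 2").schema
>     // before SPARK-25457: result always typed as LongType (bigint)
>     // with SPARK-25457:   typed like the operands, here IntegerType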
>
> https://github.com/apache/spark/pull/27834
> Revert [SPARK-24640][SQL] Return `NULL` from `size(NULL)` by default
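>
> Again just a sketch of my understanding of the change being reverted:
>
>     spark.sql("SELECT size(cast(NULL AS array<int>))").show()
>     // pre-SPARK-24640 default: -1 (kept for Hive compatibility)
>     // with SPARK-24640:        NULL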
>
> Bests,
> Dongjoon.
>
> On Thu, Mar 5, 2020 at 9:08 PM Dongjoon Hyun <dongjoon.h...@gmail.com>
> wrote:
>
>> Hi, All.
>>
>> There is an ongoing PR from Xiao referencing this email.
>>
>> https://github.com/apache/spark/pull/27821
>>
>> Bests,
>> Dongjoon.
>>
>> On Fri, Feb 28, 2020 at 11:20 AM Sean Owen <sro...@gmail.com> wrote:
>>
>>> On Fri, Feb 28, 2020 at 12:03 PM Holden Karau <hol...@pigscanfly.ca>
>>> wrote:
>>> >>     1. Could you estimate how many revert commits are required in
>>> >> `branch-3.0` for the new rubric?
>>>
>>> Fair question about what actual change this implies for 3.0. So far it
>>> seems like some targeted, quite reasonable reverts. I don't think
>>> anyone's suggesting reverting loads of changes.
>>>
>>>
>>> >>     2. Are you going to restore all the removed test cases for the
>>> >> deprecated ones?
>>> > This is a good point; making sure we keep the tests as well is
>>> > important (worse than removing a deprecated API is shipping it broken).
>>>
>>> (I'd say, yes of course! That seems consistent with what is happening
>>> now.)
>>>
>>>
>>> >>     3. Will this delay the Apache Spark 3.0.0 release?
>>> >>         (I believe it was previously scheduled for June, before
>>> >> Spark Summit 2020)
>>> >
>>> > I think if we need to delay to make a better release, that is OK,
>>> > especially given that our current preview releases are available to
>>> > gather community feedback.
>>>
>>> Of course these things block 3.0 -- all the more reason to keep it
>>> specific and targeted -- but nothing so far seems inconsistent with
>>> finishing in a month or two.
>>>
>>>
>>> >> Although there was a discussion already, I want to make sure about
>>> >> the following tough parts.
>>> >>     4. We are not going to add the Scala 2.11 API back, right?
>>> > I hope not.
>>> >>
>>> >>     5. We are not going to support Python 2.x in Apache Spark 3.1+,
>>> >> right?
>>> > I think doing that would be bad; Python 2 has already reached end of
>>> > life elsewhere.
>>>
>>> Yeah, this is an important subtext -- the valuable principles here
>>> could be interpreted in many different ways depending on how much you
>>> weigh the value of keeping APIs for compatibility against the value of
>>> simplifying Spark and pushing users to newer APIs more forcibly. These
>>> are all judgment calls, based on necessarily limited data about the
>>> universe of users. We can only go on rare direct user feedback, on
>>> occasional feedback from vendors as proxies for a subset of users, and
>>> the general good-faith judgment of committers who have lived Spark for
>>> years.
>>>
>>> My specific interpretation is that the standard is (correctly)
>>> tightening going forward, and retroactively a bit for 3.0. But I do
>>> not think anyone is advocating for the logical extreme of, for
>>> example, maintaining Scala 2.11 compatibility indefinitely. I think
>>> that falls out readily from the rubric here: maintaining 2.11
>>> compatibility is really quite painful if you ever support 2.13 too,
>>> for example.
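>>>
>>> To illustrate the kind of pain I mean -- an example from memory of
>>> the 2.13 collections changes, not from Spark's actual code:
>>>
>>>     // Scala 2.11/2.12: converts via CanBuildFrom, which 2.13 removed
>>>     Iterator(1, 2, 3).to[Vector]
>>>     // the Scala 2.13 spelling, which 2.11/2.12 do not have
>>>     Iterator(1, 2, 3).to(Vector)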
>>>
>>
