+1

On Mon, Mar 9, 2020 at 4:55 PM Reynold Xin <r...@databricks.com> wrote:
> +1
>
> On Mon, Mar 09, 2020 at 3:53 PM, John Zhuge <jzh...@apache.org> wrote:
>
>> +1 (non-binding)
>>
>> On Mon, Mar 9, 2020 at 1:32 PM Michael Heuer <heue...@gmail.com> wrote:
>>
>>> +1 (non-binding)
>>>
>>> I am disappointed, however, that this only mentions the API and not
>>> dependencies and transitive dependencies.
>>>
>>> As Spark does not provide separation between its runtime classpath and
>>> the classpath used by applications, I believe Spark's dependencies and
>>> transitive dependencies should be considered part of the API for this
>>> policy. Breaking dependency upgrades and incompatible dependency
>>> versions are the source of much frustration.
>>>
>>> michael
>>>
>>> On Mar 9, 2020, at 2:16 PM, Takuya UESHIN <ues...@happy-camper.st> wrote:
>>>
>>> +1 (binding)
>>>
>>> On Mon, Mar 9, 2020 at 11:49 AM Xingbo Jiang <jiangxb1...@gmail.com> wrote:
>>>
>>>> +1 (non-binding)
>>>>
>>>> Cheers,
>>>>
>>>> Xingbo
>>>>
>>>> On Mon, Mar 9, 2020 at 9:35 AM Xiao Li <lix...@databricks.com> wrote:
>>>>
>>>>> +1 (binding)
>>>>>
>>>>> Xiao
>>>>>
>>>>> On Mon, Mar 9, 2020 at 8:33 AM Denny Lee <denny.g....@gmail.com> wrote:
>>>>>
>>>>>> +1 (non-binding)
>>>>>>
>>>>>> On Mon, Mar 9, 2020 at 1:59 AM Hyukjin Kwon <gurwls...@gmail.com> wrote:
>>>>>>
>>>>>>> The proposal itself seems good as a set of factors to consider.
>>>>>>> Thanks, Michael.
>>>>>>>
>>>>>>> Several of the concerns mentioned are good points, in particular:
>>>>>>>
>>>>>>> > ... assuming that this is for public stable APIs, not APIs that
>>>>>>> > are marked as unstable, evolving, etc. ...
>>>>>>>
>>>>>>> I would like to confirm this. We already have API annotations such
>>>>>>> as Experimental, Unstable, etc., and the implication of each is
>>>>>>> still effective. If it's for stable APIs, it makes sense to me as
>>>>>>> well.
>>>>>>>
>>>>>>> > ... can we expand on 'when' an API change can occur? Since we are
>>>>>>> > proposing to diverge from semver. ...
>>>>>>>
>>>>>>> I think this is a good point. If we're proposing to diverge from
>>>>>>> semver, the delta compared to semver will have to be clarified to
>>>>>>> avoid different personal interpretations of the somewhat general
>>>>>>> principles.
>>>>>>>
>>>>>>> > ... can we narrow down on the migration from Apache Spark 2.4.5
>>>>>>> > to Apache Spark 3.0+? ...
>>>>>>>
>>>>>>> Assuming these concerns will be addressed, +1 (binding).
>>>>>>>
>>>>>>> On Mon, Mar 9, 2020 at 4:53 PM, Takeshi Yamamuro
>>>>>>> <linguin....@gmail.com> wrote:
>>>>>>>
>>>>>>>> +1 (non-binding)
>>>>>>>>
>>>>>>>> Bests,
>>>>>>>> Takeshi
>>>>>>>>
>>>>>>>> On Mon, Mar 9, 2020 at 4:52 PM Gengliang Wang
>>>>>>>> <gengliang.w...@databricks.com> wrote:
>>>>>>>>
>>>>>>>>> +1 (non-binding)
>>>>>>>>>
>>>>>>>>> Gengliang
>>>>>>>>>
>>>>>>>>> On Mon, Mar 9, 2020 at 12:22 AM Matei Zaharia
>>>>>>>>> <matei.zaha...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> +1 as well.
>>>>>>>>>>
>>>>>>>>>> Matei
>>>>>>>>>>
>>>>>>>>>> On Mar 9, 2020, at 12:05 AM, Wenchen Fan <cloud0...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>> +1 (binding), assuming that this is for public stable APIs, not
>>>>>>>>>> APIs that are marked as unstable, evolving, etc.
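For readers following the annotation discussion above: the markers Hyukjin
and Wenchen mention live in org.apache.spark.annotation. Below is a minimal
sketch of how they label API maturity; the traits are invented for
illustration, and only the annotation names come from Spark:

    import org.apache.spark.annotation.{Evolving, Experimental, Stable, Unstable}

    // Hypothetical interfaces; only the annotations are Spark's. Each one
    // signals, roughly, a different compatibility promise.

    @Stable        // public and stable; the proposed rubric governs breaking it
    trait LogReader { def read(path: String): Seq[String] }

    @Evolving      // public, but may change between minor releases
    trait BatchLogReader { def readBatch(paths: Seq[String]): Seq[String] }

    @Experimental  // early-access; may change or disappear in any release
    trait StreamingLogReader { def readStream(path: String): Iterator[String] }

    @Unstable      // no compatibility guarantees at all
    trait InternalLogReader { def readRaw(path: String): Array[Byte] }

Under the proposal as discussed, only interfaces in the first category would
need the full cost/benefit rubric before being broken.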
>>>>>>>>>> On Mon, Mar 9, 2020 at 1:10 AM Ismaël Mejía <ieme...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> +1 (non-binding)
>>>>>>>>>>>
>>>>>>>>>>> Michael's section on the trade-offs of maintaining / removing
>>>>>>>>>>> an API is one of the best reads I have seen on this mailing
>>>>>>>>>>> list. Enthusiastic +1
>>>>>>>>>>>
>>>>>>>>>>> On Sat, Mar 7, 2020 at 8:28 PM Dongjoon Hyun
>>>>>>>>>>> <dongjoon.h...@gmail.com> wrote:
>>>>>>>>>>> >
>>>>>>>>>>> > This new policy has a good intention, but can we narrow down
>>>>>>>>>>> > on the migration from Apache Spark 2.4.5 to Apache Spark 3.0+?
>>>>>>>>>>> >
>>>>>>>>>>> > I saw that there already exists a reverting PR to bring back
>>>>>>>>>>> > Spark 1.4 and 1.5 APIs based on this AS-IS suggestion.
>>>>>>>>>>> >
>>>>>>>>>>> > The AS-IS policy clearly mentions the JVM/Scala-level
>>>>>>>>>>> > difficulty, and that's nice.
>>>>>>>>>>> >
>>>>>>>>>>> > However, for the other cases, it sounds like `recommending
>>>>>>>>>>> > older APIs as much as possible` due to the following:
>>>>>>>>>>> >
>>>>>>>>>>> > > How long has the API been in Spark?
>>>>>>>>>>> >
>>>>>>>>>>> > We had better be more careful when we add a new policy, and
>>>>>>>>>>> > should aim not to mislead users and 3rd-party library
>>>>>>>>>>> > developers into thinking "older is better".
>>>>>>>>>>> >
>>>>>>>>>>> > Technically, I'm wondering who will use new APIs in their
>>>>>>>>>>> > examples (in books and on StackOverflow) if they always need
>>>>>>>>>>> > to add a warning like `this only works on 2.4.0+`.
>>>>>>>>>>> >
>>>>>>>>>>> > Bests,
>>>>>>>>>>> > Dongjoon.
>>>>>>>>>>> >
>>>>>>>>>>> > On Fri, Mar 6, 2020 at 7:10 PM Mridul Muralidharan
>>>>>>>>>>> > <mri...@gmail.com> wrote:
>>>>>>>>>>> >>
>>>>>>>>>>> >> I am in broad agreement with the proposal; as any developer,
>>>>>>>>>>> >> I prefer stable, well-designed APIs :-)
>>>>>>>>>>> >>
>>>>>>>>>>> >> Can we tie the proposal to the stability guarantees given by
>>>>>>>>>>> >> Spark and reasonable expectations from users?
>>>>>>>>>>> >> In my opinion, an unstable or evolving API could change,
>>>>>>>>>>> >> while an experimental API which has been around for ages
>>>>>>>>>>> >> should be handled more conservatively.
>>>>>>>>>>> >> Which brings into question how the stability guarantees
>>>>>>>>>>> >> specified by annotations interact with the proposal.
>>>>>>>>>>> >>
>>>>>>>>>>> >> Also, can we expand on 'when' an API change can occur, since
>>>>>>>>>>> >> we are proposing to diverge from semver?
>>>>>>>>>>> >> Patch release? Minor release? Only major releases? Based on
>>>>>>>>>>> >> the 'impact' of the API? Stability guarantees?
>>>>>>>>>>> >>
>>>>>>>>>>> >> Regards,
>>>>>>>>>>> >> Mridul
>>>>>>>>>>> >>
>>>>>>>>>>> >> On Fri, Mar 6, 2020 at 7:01 PM Michael Armbrust
>>>>>>>>>>> >> <mich...@databricks.com> wrote:
>>>>>>>>>>> >> >
>>>>>>>>>>> >> > I'll start off the vote with a strong +1 (binding).
>>>>>>>>>>> >> >
>>>>>>>>>>> >> > On Fri, Mar 6, 2020 at 1:01 PM Michael Armbrust
>>>>>>>>>>> >> > <mich...@databricks.com> wrote:
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> I propose to add the following text to Spark's Semantic
>>>>>>>>>>> >> >> Versioning policy and adopt it as the rubric that should
>>>>>>>>>>> >> >> be used when deciding to break APIs (even at major
>>>>>>>>>>> >> >> versions such as 3.0).
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> I'll leave the vote open until Tuesday, March 10th at 2pm.
>>>>>>>>>>> >> >> As this is a procedural vote, the measure will pass if
>>>>>>>>>>> >> >> there are more favourable votes than unfavourable ones.
>>>>>>>>>>> >> >> PMC votes are binding, but the community is encouraged to
>>>>>>>>>>> >> >> add their voice to the discussion.
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> [ ] +1 - Spark should adopt this policy.
>>>>>>>>>>> >> >> [ ] -1 - Spark should not adopt this policy.
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> <new policy>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> Considerations When Breaking APIs
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> The Spark project strives to avoid breaking APIs or
>>>>>>>>>>> >> >> silently changing behavior, even at major versions. While
>>>>>>>>>>> >> >> this is not always possible, the balance of the following
>>>>>>>>>>> >> >> factors should be considered before choosing to break an
>>>>>>>>>>> >> >> API.
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> Cost of Breaking an API
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> Breaking an API almost always has a non-trivial cost to
>>>>>>>>>>> >> >> the users of Spark. A broken API means that Spark programs
>>>>>>>>>>> >> >> need to be rewritten before they can be upgraded. However,
>>>>>>>>>>> >> >> there are a few considerations when thinking about what
>>>>>>>>>>> >> >> the cost will be:
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> Usage - an API that is actively used in many different
>>>>>>>>>>> >> >> places is always very costly to break. While it is hard to
>>>>>>>>>>> >> >> know usage for sure, there are a number of ways that we
>>>>>>>>>>> >> >> can estimate it:
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> How long has the API been in Spark?
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> Is the API common even for basic programs?
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> How often do we see recent questions in JIRA or on the
>>>>>>>>>>> >> >> mailing lists?
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> How often does it appear on StackOverflow or in blogs?
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> Behavior after the break - How will a program that works
>>>>>>>>>>> >> >> today work after the break? The following are listed
>>>>>>>>>>> >> >> roughly in order of increasing severity (a sketch
>>>>>>>>>>> >> >> illustrating the rungs follows this list):
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> Will there be a compiler or linker error?
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> Will there be a runtime exception?
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> Will that exception happen after significant processing
>>>>>>>>>>> >> >> has been done?
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> Will we silently return different answers? (very hard to
>>>>>>>>>>> >> >> debug; we might not even notice!)
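To make that severity ladder concrete, here is a sketch using a hypothetical
(non-Spark) API; the names and the breaks are invented for illustration:

    // A hypothetical v2.0 of a library, illustrating the rungs above.
    // Imagine v1.0 shipped:
    //   def mean(xs: Seq[Double]): Double = xs.sum / xs.size
    object BreakingChangesSketch {

      // Rung 1: remove or rename the method entirely. Every caller fails
      // with a compile/link error -- painful, but caught immediately at
      // upgrade time.

      // Rung 2: keep the signature but add a precondition. Old programs
      // still compile, then throw at runtime -- possibly hours into a job.
      def mean(xs: Seq[Double]): Double = {
        require(xs.nonEmpty, "mean of empty input")  // v1.0 returned NaN here
        xs.sum / xs.size
      }

      // Rung 3 (worst): same signature, silently different answer. v1.0
      // divided by n (population variance); dividing by n - 1 instead
      // changes every result with no error to catch.
      def variance(xs: Seq[Double]): Double = {
        val m = mean(xs)
        xs.map(x => math.pow(x - m, 2)).sum / (xs.size - 1)
      }
    }

The ordering matters: the compile error surfaces the moment a user upgrades,
while the silent change in variance may never be noticed at all.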
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> Cost of Maintaining an API
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> Of course, the above does not mean that we will never
>>>>>>>>>>> >> >> break any APIs. We must also consider the cost, both to
>>>>>>>>>>> >> >> the project and to our users, of keeping the API in
>>>>>>>>>>> >> >> question.
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> Project Costs - Every API we have needs to be tested and
>>>>>>>>>>> >> >> needs to keep working as other parts of the project
>>>>>>>>>>> >> >> change. These costs are significantly exacerbated when
>>>>>>>>>>> >> >> external dependencies change (the JVM, Scala, etc.). In
>>>>>>>>>>> >> >> some cases, while not completely technically infeasible,
>>>>>>>>>>> >> >> the cost of maintaining a particular API can become too
>>>>>>>>>>> >> >> high.
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> User Costs - APIs also have a cognitive cost to users
>>>>>>>>>>> >> >> learning Spark or trying to understand Spark programs.
>>>>>>>>>>> >> >> This cost becomes even higher when the API in question has
>>>>>>>>>>> >> >> confusing or undefined semantics.
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> Alternatives to Breaking an API
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> In cases where there is a "Bad API", but where the cost of
>>>>>>>>>>> >> >> removal is also high, there are alternatives that should
>>>>>>>>>>> >> >> be considered that do not hurt existing users but do
>>>>>>>>>>> >> >> address some of the maintenance costs.
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> Avoid Bad APIs - While this is a bit obvious, it is an
>>>>>>>>>>> >> >> important point. Any time we are adding a new interface to
>>>>>>>>>>> >> >> Spark, we should consider that we might be stuck with this
>>>>>>>>>>> >> >> API forever. Think deeply about how new APIs relate to
>>>>>>>>>>> >> >> existing ones, as well as how you expect them to evolve
>>>>>>>>>>> >> >> over time.
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> Deprecation Warnings - All deprecation warnings should
>>>>>>>>>>> >> >> point to a clear alternative and should never just say
>>>>>>>>>>> >> >> that an API is deprecated (see the sketch after the policy
>>>>>>>>>>> >> >> text).
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> Updated Docs - Documentation should point to the "best"
>>>>>>>>>>> >> >> recommended way of performing a given task. In the cases
>>>>>>>>>>> >> >> where we maintain legacy documentation, we should clearly
>>>>>>>>>>> >> >> point to newer APIs and suggest to users the "right" way.
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> Community Work - Many people learn Spark by reading blogs
>>>>>>>>>>> >> >> and other sites such as StackOverflow. However, many of
>>>>>>>>>>> >> >> these resources are out of date. Update them, to reduce
>>>>>>>>>>> >> >> the cost of eventually removing deprecated APIs.
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> </new policy>
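As a footnote to the "Deprecation Warnings" item in the policy: Scala's
standard @deprecated annotation is the usual vehicle for this. A minimal
sketch, with hypothetical method names, of a warning that points to a clear
alternative:

    object CsvOps {
      // Preferred replacement: callers pass the delimiter explicitly.
      def split(line: String, delimiter: Char): Array[String] =
        line.split(delimiter)

      // Good: the message names a concrete alternative and the release
      // that deprecated the old overload, exactly as the policy asks.
      @deprecated("Use split(line, delimiter) instead", since = "3.0.0")
      def split(line: String): Array[String] = split(line, ',')

      // Bad (what the policy rules out): a message with no alternative,
      // e.g. @deprecated("This method is deprecated", since = "3.0.0")
    }

Compiling a caller of the old overload with scalac -deprecation then prints
the message, so users see the warning and the fix together.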