+1

On Mon, Mar 9, 2020 at 4:55 PM Reynold Xin <r...@databricks.com> wrote:
> +1
>
> On Mon, Mar 09, 2020 at 3:53 PM, John Zhuge <jzh...@apache.org> wrote:
>
>> +1 (non-binding)
>>
>> On Mon, Mar 9, 2020 at 1:32 PM Michael Heuer <heue...@gmail.com> wrote:
>>
>>> +1 (non-binding)
>>>
>>> I am disappointed, however, that this only mentions the API and not
>>> dependencies and transitive dependencies.
>>>
>>> As Spark does not provide separation between its runtime classpath and
>>> the classpath used by applications, I believe Spark's dependencies and
>>> transitive dependencies should be considered part of the API for this
>>> policy. Breaking dependency upgrades and incompatible dependency
>>> versions are the source of much frustration.
>>>
>>> michael
>>>
>>> On Mar 9, 2020, at 2:16 PM, Takuya UESHIN <ues...@happy-camper.st> wrote:
>>>
>>> +1 (binding)
>>>
>>> On Mon, Mar 9, 2020 at 11:49 AM Xingbo Jiang <jiangxb1...@gmail.com> wrote:
>>>
>>>> +1 (non-binding)
>>>>
>>>> Cheers,
>>>>
>>>> Xingbo
>>>>
>>>> On Mon, Mar 9, 2020 at 9:35 AM Xiao Li <lix...@databricks.com> wrote:
>>>>
>>>>> +1 (binding)
>>>>>
>>>>> Xiao
>>>>>
>>>>> On Mon, Mar 9, 2020 at 8:33 AM Denny Lee <denny.g....@gmail.com> wrote:
>>>>>
>>>>>> +1 (non-binding)
>>>>>>
>>>>>> On Mon, Mar 9, 2020 at 1:59 AM Hyukjin Kwon <gurwls...@gmail.com> wrote:
>>>>>>
>>>>>>> The proposal itself seems good as a set of factors to consider.
>>>>>>> Thanks, Michael.
>>>>>>>
>>>>>>> Several of the concerns mentioned are good points, in particular:
>>>>>>>
>>>>>>> > ... assuming that this is for public stable APIs, not APIs that
>>>>>>> > are marked as unstable, evolving, etc. ...
>>>>>>>
>>>>>>> I would like to confirm this. We already have API annotations such
>>>>>>> as Experimental, Unstable, etc., and the implication of each is
>>>>>>> still effective. If it's for stable APIs, it makes sense to me as
>>>>>>> well.
>>>>>>>
>>>>>>> > ... can we expand on 'when' an API change can occur? Since we are
>>>>>>> > proposing to diverge from semver. ...
>>>>>>>
>>>>>>> I think this is a good point. If we're proposing to diverge from
>>>>>>> semver, the delta compared to semver will have to be clarified to
>>>>>>> avoid different personal interpretations of the somewhat general
>>>>>>> principles.
>>>>>>>
>>>>>>> > ... can we narrow down on the migration from Apache Spark 2.4.5
>>>>>>> > to Apache Spark 3.0+? ...
>>>>>>>
>>>>>>> Assuming these concerns will be addressed, +1 (binding).
>>>>>>>
>>>>>>> On Mon, Mar 9, 2020 at 4:53 PM, Takeshi Yamamuro
>>>>>>> <linguin....@gmail.com> wrote:
>>>>>>>
>>>>>>>> +1 (non-binding)
>>>>>>>>
>>>>>>>> Bests,
>>>>>>>> Takeshi
>>>>>>>>
>>>>>>>> On Mon, Mar 9, 2020 at 4:52 PM Gengliang Wang
>>>>>>>> <gengliang.w...@databricks.com> wrote:
>>>>>>>>
>>>>>>>>> +1 (non-binding)
>>>>>>>>>
>>>>>>>>> Gengliang
>>>>>>>>>
>>>>>>>>> On Mon, Mar 9, 2020 at 12:22 AM Matei Zaharia
>>>>>>>>> <matei.zaha...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> +1 as well.
>>>>>>>>>>
>>>>>>>>>> Matei
>>>>>>>>>>
>>>>>>>>>> On Mar 9, 2020, at 12:05 AM, Wenchen Fan <cloud0...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>> +1 (binding), assuming that this is for public stable APIs, not
>>>>>>>>>> APIs that are marked as unstable, evolving, etc.
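For readers following the annotation discussion above: the markers Hyukjin
and Wenchen mention live in org.apache.spark.annotation. Below is a minimal
sketch of how they label API maturity; the traits are invented for
illustration, and only the annotation names come from Spark:

    import org.apache.spark.annotation.{Evolving, Experimental, Stable, Unstable}

    // Hypothetical interfaces; only the annotations are Spark's. Each one
    // signals, roughly, a different compatibility promise.

    @Stable        // public and stable; the proposed rubric governs breaking it
    trait LogReader { def read(path: String): Seq[String] }

    @Evolving      // public, but may change between minor releases
    trait BatchLogReader { def readBatch(paths: Seq[String]): Seq[String] }

    @Experimental  // early-access; may change or disappear in any release
    trait StreamingLogReader { def readStream(path: String): Iterator[String] }

    @Unstable      // no compatibility guarantees at all
    trait InternalLogReader { def readRaw(path: String): Array[Byte] }

Under the proposal as discussed, only interfaces in the first category would
need the full cost/benefit rubric before being broken.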
>>>>>>>>>> On Mon, Mar 9, 2020 at 1:10 AM Ismaël Mejía <ieme...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> +1 (non-binding)
>>>>>>>>>>>
>>>>>>>>>>> Michael's section on the trade-offs of maintaining / removing
>>>>>>>>>>> an API is one of the best reads I have seen on this mailing
>>>>>>>>>>> list. Enthusiastic +1
>>>>>>>>>>>
>>>>>>>>>>> On Sat, Mar 7, 2020 at 8:28 PM Dongjoon Hyun
>>>>>>>>>>> <dongjoon.h...@gmail.com> wrote:
>>>>>>>>>>> >
>>>>>>>>>>> > This new policy has a good intention, but can we narrow down
>>>>>>>>>>> > on the migration from Apache Spark 2.4.5 to Apache Spark 3.0+?
>>>>>>>>>>> >
>>>>>>>>>>> > I saw that there already exists a reverting PR to bring back
>>>>>>>>>>> > Spark 1.4 and 1.5 APIs based on this AS-IS suggestion.
>>>>>>>>>>> >
>>>>>>>>>>> > The AS-IS policy clearly mentions the JVM/Scala-level
>>>>>>>>>>> > difficulty, and that's nice.
>>>>>>>>>>> >
>>>>>>>>>>> > However, for the other cases, it sounds like `recommending
>>>>>>>>>>> > older APIs as much as possible` due to the following:
>>>>>>>>>>> >
>>>>>>>>>>> > > How long has the API been in Spark?
>>>>>>>>>>> >
>>>>>>>>>>> > We had better be more careful when we add a new policy, and
>>>>>>>>>>> > should aim not to mislead users and 3rd-party library
>>>>>>>>>>> > developers into thinking "older is better".
>>>>>>>>>>> >
>>>>>>>>>>> > Technically, I'm wondering who will use new APIs in their
>>>>>>>>>>> > examples (in books and on StackOverflow) if they always need
>>>>>>>>>>> > to add a warning like `this only works on 2.4.0+`.
>>>>>>>>>>> >
>>>>>>>>>>> > Bests,
>>>>>>>>>>> > Dongjoon.
>>>>>>>>>>> >
>>>>>>>>>>> > On Fri, Mar 6, 2020 at 7:10 PM Mridul Muralidharan
>>>>>>>>>>> > <mri...@gmail.com> wrote:
>>>>>>>>>>> >>
>>>>>>>>>>> >> I am in broad agreement with the proposal; as any developer,
>>>>>>>>>>> >> I prefer stable, well-designed APIs :-)
>>>>>>>>>>> >>
>>>>>>>>>>> >> Can we tie the proposal to the stability guarantees given by
>>>>>>>>>>> >> Spark and reasonable expectations from users?
>>>>>>>>>>> >> In my opinion, an unstable or evolving API could change,
>>>>>>>>>>> >> while an experimental API which has been around for ages
>>>>>>>>>>> >> should be handled more conservatively.
>>>>>>>>>>> >> Which brings into question how the stability guarantees
>>>>>>>>>>> >> specified by annotations interact with the proposal.
>>>>>>>>>>> >>
>>>>>>>>>>> >> Also, can we expand on 'when' an API change can occur, since
>>>>>>>>>>> >> we are proposing to diverge from semver?
>>>>>>>>>>> >> Patch release? Minor release? Only major releases? Based on
>>>>>>>>>>> >> the 'impact' of the API? Stability guarantees?
>>>>>>>>>>> >>
>>>>>>>>>>> >> Regards,
>>>>>>>>>>> >> Mridul
>>>>>>>>>>> >>
>>>>>>>>>>> >> On Fri, Mar 6, 2020 at 7:01 PM Michael Armbrust
>>>>>>>>>>> >> <mich...@databricks.com> wrote:
>>>>>>>>>>> >> >
>>>>>>>>>>> >> > I'll start off the vote with a strong +1 (binding).
>>>>>>>>>>> >> >
>>>>>>>>>>> >> > On Fri, Mar 6, 2020 at 1:01 PM Michael Armbrust
>>>>>>>>>>> >> > <mich...@databricks.com> wrote:
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> I propose to add the following text to Spark's Semantic
>>>>>>>>>>> >> >> Versioning policy and adopt it as the rubric that should
>>>>>>>>>>> >> >> be used when deciding to break APIs (even at major
>>>>>>>>>>> >> >> versions such as 3.0).
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> I'll leave the vote open until Tuesday, March 10th at 2pm.
>>>>>>>>>>> >> >> As this is a procedural vote, the measure will pass if
>>>>>>>>>>> >> >> there are more favourable votes than unfavourable ones.
>>>>>>>>>>> >> >> PMC votes are binding, but the community is encouraged to
>>>>>>>>>>> >> >> add their voice to the discussion.
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> [ ] +1 - Spark should adopt this policy.
>>>>>>>>>>> >> >> [ ] -1 - Spark should not adopt this policy.
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> <new policy>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> Considerations When Breaking APIs
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> The Spark project strives to avoid breaking APIs or
>>>>>>>>>>> >> >> silently changing behavior, even at major versions. While
>>>>>>>>>>> >> >> this is not always possible, the balance of the following
>>>>>>>>>>> >> >> factors should be considered before choosing to break an
>>>>>>>>>>> >> >> API.
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> Cost of Breaking an API
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> Breaking an API almost always has a non-trivial cost to
>>>>>>>>>>> >> >> the users of Spark. A broken API means that Spark programs
>>>>>>>>>>> >> >> need to be rewritten before they can be upgraded. However,
>>>>>>>>>>> >> >> there are a few considerations when thinking about what
>>>>>>>>>>> >> >> the cost will be:
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> Usage - an API that is actively used in many different
>>>>>>>>>>> >> >> places is always very costly to break. While it is hard to
>>>>>>>>>>> >> >> know usage for sure, there are a number of ways that we
>>>>>>>>>>> >> >> can estimate it:
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> How long has the API been in Spark?
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> Is the API common even for basic programs?
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> How often do we see recent questions in JIRA or on the
>>>>>>>>>>> >> >> mailing lists?
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> How often does it appear on StackOverflow or in blogs?
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> Behavior after the break - How will a program that works
>>>>>>>>>>> >> >> today work after the break? The following are listed
>>>>>>>>>>> >> >> roughly in order of increasing severity (a sketch
>>>>>>>>>>> >> >> illustrating the rungs follows this list):
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> Will there be a compiler or linker error?
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> Will there be a runtime exception?
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> Will that exception happen after significant processing
>>>>>>>>>>> >> >> has been done?
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> Will we silently return different answers? (very hard to
>>>>>>>>>>> >> >> debug; we might not even notice!)
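To make that severity ladder concrete, here is a sketch using a hypothetical
(non-Spark) API; the names and the breaks are invented for illustration:

    // A hypothetical v2.0 of a library, illustrating the rungs above.
    // Imagine v1.0 shipped:
    //   def mean(xs: Seq[Double]): Double = xs.sum / xs.size
    object BreakingChangesSketch {

      // Rung 1: remove or rename the method entirely. Every caller fails
      // with a compile/link error -- painful, but caught immediately at
      // upgrade time.

      // Rung 2: keep the signature but add a precondition. Old programs
      // still compile, then throw at runtime -- possibly hours into a job.
      def mean(xs: Seq[Double]): Double = {
        require(xs.nonEmpty, "mean of empty input")  // v1.0 returned NaN here
        xs.sum / xs.size
      }

      // Rung 3 (worst): same signature, silently different answer. v1.0
      // divided by n (population variance); dividing by n - 1 instead
      // changes every result with no error to catch.
      def variance(xs: Seq[Double]): Double = {
        val m = mean(xs)
        xs.map(x => math.pow(x - m, 2)).sum / (xs.size - 1)
      }
    }

The ordering matters: the compile error surfaces the moment a user upgrades,
while the silent change in variance may never be noticed at all.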
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> Cost of Maintaining an API
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> Of course, the above does not mean that we will never
>>>>>>>>>>> >> >> break any APIs. We must also consider the cost, both to
>>>>>>>>>>> >> >> the project and to our users, of keeping the API in
>>>>>>>>>>> >> >> question.
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> Project Costs - Every API we have needs to be tested and
>>>>>>>>>>> >> >> needs to keep working as other parts of the project
>>>>>>>>>>> >> >> change. These costs are significantly exacerbated when
>>>>>>>>>>> >> >> external dependencies change (the JVM, Scala, etc.). In
>>>>>>>>>>> >> >> some cases, while not completely technically infeasible,
>>>>>>>>>>> >> >> the cost of maintaining a particular API can become too
>>>>>>>>>>> >> >> high.
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> User Costs - APIs also have a cognitive cost to users
>>>>>>>>>>> >> >> learning Spark or trying to understand Spark programs.
>>>>>>>>>>> >> >> This cost becomes even higher when the API in question has
>>>>>>>>>>> >> >> confusing or undefined semantics.
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> Alternatives to Breaking an API
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> In cases where there is a "Bad API", but where the cost of
>>>>>>>>>>> >> >> removal is also high, there are alternatives that should
>>>>>>>>>>> >> >> be considered that do not hurt existing users but do
>>>>>>>>>>> >> >> address some of the maintenance costs.
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> Avoid Bad APIs - While this is a bit obvious, it is an
>>>>>>>>>>> >> >> important point. Any time we are adding a new interface to
>>>>>>>>>>> >> >> Spark, we should consider that we might be stuck with this
>>>>>>>>>>> >> >> API forever. Think deeply about how new APIs relate to
>>>>>>>>>>> >> >> existing ones, as well as how you expect them to evolve
>>>>>>>>>>> >> >> over time.
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> Deprecation Warnings - All deprecation warnings should
>>>>>>>>>>> >> >> point to a clear alternative and should never just say
>>>>>>>>>>> >> >> that an API is deprecated (see the sketch after the policy
>>>>>>>>>>> >> >> text).
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> Updated Docs - Documentation should point to the "best"
>>>>>>>>>>> >> >> recommended way of performing a given task. In the cases
>>>>>>>>>>> >> >> where we maintain legacy documentation, we should clearly
>>>>>>>>>>> >> >> point to newer APIs and suggest to users the "right" way.
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> Community Work - Many people learn Spark by reading blogs
>>>>>>>>>>> >> >> and other sites such as StackOverflow. However, many of
>>>>>>>>>>> >> >> these resources are out of date. Update them, to reduce
>>>>>>>>>>> >> >> the cost of eventually removing deprecated APIs.
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> </new policy>
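As a footnote to the "Deprecation Warnings" item in the policy: Scala's
standard @deprecated annotation is the usual vehicle for this. A minimal
sketch, with hypothetical method names, of a warning that points to a clear
alternative:

    object CsvOps {
      // Preferred replacement: callers pass the delimiter explicitly.
      def split(line: String, delimiter: Char): Array[String] =
        line.split(delimiter)

      // Good: the message names a concrete alternative and the release
      // that deprecated the old overload, exactly as the policy asks.
      @deprecated("Use split(line, delimiter) instead", since = "3.0.0")
      def split(line: String): Array[String] = split(line, ',')

      // Bad (what the policy rules out): a message with no alternative,
      // e.g. @deprecated("This method is deprecated", since = "3.0.0")
    }

Compiling a caller of the old overload with scalac -deprecation then prints
the message, so users see the warning and the fix together.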