Sure. I understand the background of the following requests. So it's a
good time to decide on the criteria before starting the discussion.

    1. "to provide a reasonable migration path we’d want the replacement of
the deprecated API to also exist in 2.4"
    2. "We need to discuss the APIs case by case"

For now, it's unclear what counts as "unnecessarily painful", which APIs
count as "widely used", or how small the costs must be for "the maintenance
costs are small" to hold.

I'm wondering whether the goal of Apache Spark 3.0.0 is to be 100% backward
compatible with Apache Spark 2.4.5, like Apache Kafka.
Are we going to revert all the changes? If there were clear criteria, we
wouldn't have needed to spend such a long period of 3.0.0 on the clean-up.

BTW, to be clear, we are talking about compatibility between 2.4.5 and
3.0.0 in this thread.

Bests,
Dongjoon.


On Wed, Feb 19, 2020 at 2:20 PM Xiao Li <lix...@databricks.com> wrote:

> As we did in https://github.com/apache/spark/pull/23131, we added back unionAll.
>
> We might need to double-check whether we removed any widely used APIs in
> this release before the RC. If the maintenance costs are small, keeping some
> deprecated APIs looks reasonable to me. This can help the adoption of Spark
> 3.0. We need to discuss the APIs case by case.
>
> Xiao
>
> On Wed, Feb 19, 2020 at 2:14 PM Holden Karau <hol...@pigscanfly.ca> wrote:
>
>> So my understanding would be that, to provide a reasonable migration path,
>> we’d want the replacement of the deprecated API to also exist in 2.4; this
>> way, libraries and programs can dual-target during the migration process.
>>
>> Now that isn’t always going to be doable, but it’s certainly worth looking
>> at the situations where we aren’t providing a smooth migration path and
>> making sure it’s the best thing to do.
>>
>> On Wed, Feb 19, 2020 at 2:10 PM Dongjoon Hyun <dongjoon.h...@gmail.com>
>> wrote:
>>
>>> Hi, Karen.
>>>
>>> Are you saying that Spark 3 has to have all deprecated 2.x APIs?
>>> Could you tell us your criteria for `unnecessarily` vs.
>>> `necessarily`?
>>>
>>> > the migration process from Spark 2 to Spark 3 unnecessarily painful.
>>>
>>> Bests,
>>> Dongjoon.
>>>
>>>
>>> On Tue, Feb 18, 2020 at 4:55 PM Karen Feng <karen.f...@databricks.com>
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I am concerned that the API-breaking changes in SPARK-25908 (as well as
>>>> SPARK-16775, and potentially others) will make the migration process from
>>>> Spark 2 to Spark 3 unnecessarily painful. For example, the removal of
>>>> SQLContext.getOrCreate will break a large number of libraries currently
>>>> built on Spark 2.
>>>>
>>>> Even if library developers do not use deprecated APIs, API changes
>>>> between 2.x and 3.x will result in inconsistencies that require hacking
>>>> around. For a fairly small and new (2.4.3+) genomics library, I had to
>>>> create a number of shims (https://github.com/projectglow/glow/pull/155)
>>>> for the source and test code due to API changes in SPARK-25393,
>>>> SPARK-27328, and SPARK-28744.
>>>>
>>>> It would be best practice to avoid breaking existing APIs to ease
>>>> library development. To avoid dealing with similar deprecated API issues
>>>> down the road, we should practice more prudence when considering new API
>>>> proposals.
>>>>
>>>> I'd love to see more discussion on this.
>>>>
>>>>
>>>>
>>>> --
>>>> Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>>
>>>> --
>> Twitter: https://twitter.com/holdenkarau
>> Books (Learning Spark, High Performance Spark, etc.):
>> https://amzn.to/2MaRAG9
>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>
>
>
