Apache Spark 2.0 was released in July 2016. Assuming the project has been
doing its best to follow semantic versioning, we have already waited "more
than three years" for these breaking changes. Any necessary breaking change
the community fails to address now will remain technical debt for another
3+ years.

Since the PRs removing deprecated APIs were the ones pointed out first, I'm
not sure what the objection is. As far as I remember, these PRs target APIs
that were deprecated a couple of minor versions ago. If so, what's the
problem?

If the deprecation messages don't clearly point to alternatives, that is a
major problem the community should be concerned about and try to fix, but
it is a separate problem. The community doesn't deprecate an API just for
fun. Every deprecation has a reason, and not removing the API doesn't make
sense unless the reason for deprecating it was a mistake.

If the community really would like to build some (soft) rules/policies on
deprecation, I can only imagine 2 items -

1. define a "minimum releases to live" period (either per deprecated API or
globally)
2. never skip describing the reason for the deprecation, and try our best to
point to an alternative that works the same way or similarly - if the
alternative doesn't behave exactly the same, also describe the difference
(optionally, maybe)

I cannot imagine any other problems with deprecation at all.
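
As an illustration, a deprecation that follows both rules might look like
the sketch below (oldCount and newCount are hypothetical APIs, and the
versions are only examples, not real Spark APIs):

    object Example {
      // Rule 1: deprecated since 2.4.0; with a "minimum releases to live" of,
      // say, two minor releases, it survives the rest of 2.x and can be
      // removed no earlier than 3.0.0.
      // Rule 2: the message names the alternative and describes the difference.
      @deprecated("Use `newCount` instead; it returns Long rather than Int " +
        "but is otherwise identical.", "2.4.0")
      def oldCount(): Int = newCount().toInt

      def newCount(): Long = 42L  // stand-in implementation
    }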

On Thu, Feb 20, 2020 at 7:36 AM Dongjoon Hyun <dongjoon.h...@gmail.com>
wrote:

> Sure. I understand the background of the following requests. So, it's a
> good time to decide on the criteria in order to start the discussion.
>
>     1. "to provide a reasonable migration path we’d want the replacement
> of the deprecated API to also exist in 2.4"
>     2. "We need to discuss the APIs case by case"
>
> For now, it's unclear what `necessarily painful` means, what counts as
> "widely used APIs", or how small is small enough for "the maintenance costs
> are small".
>
> I'm wondering if the goal of Apache Spark 3.0.0 is to be 100% backward
> compatible with Apache Spark 2.4.5, like Apache Kafka?
> Are we going to revert all the changes? If there were clear criteria, we
> wouldn't have needed to spend such a long period of the 3.0.0 cycle on
> clean-up.
>
> BTW, to be clear, we are talking about 2.4.5 and 3.0.0 compatibility in
> this thread.
>
> Bests,
> Dongjoon.
>
>
> On Wed, Feb 19, 2020 at 2:20 PM Xiao Li <lix...@databricks.com> wrote:
>
>> As in https://github.com/apache/spark/pull/23131, we added back unionAll.
>>
>> We might need to double-check whether we removed some widely used APIs in
>> this release before the RC. If the maintenance costs are small, keeping some
>> deprecated APIs looks reasonable to me. This can help the adoption of Spark
>> 3.0. We need to discuss the APIs case by case.
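
The unionAll case also illustrates why the maintenance cost of keeping such
an API can be small: the add-back amounts to keeping the old name as a thin
forwarder to the new one. Roughly (a sketch of the alias pattern, not
necessarily the exact change in the PR above):

    // Inside Dataset[T]: the old name simply forwards to the new API.
    def unionAll(other: Dataset[T]): Dataset[T] = union(other)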
>>
>> Xiao
>>
>> On Wed, Feb 19, 2020 at 2:14 PM Holden Karau <hol...@pigscanfly.ca>
>> wrote:
>>
>>> So my understanding would be that, to provide a reasonable migration path,
>>> we’d want the replacement of the deprecated API to also exist in 2.4; this
>>> way, libraries and programs can dual-target during the migration process.
>>>
>>> Now that isn’t always going to be doable, but it’s certainly worth looking
>>> at the situations where we aren’t providing a smooth migration path and
>>> making sure removal is the best thing to do.
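
Concretely, the dual-target property holds when the replacement already
exists in 2.4, as with union/unionAll. A minimal sketch (DualTarget and
combine are hypothetical library code, not Spark APIs):

    import org.apache.spark.sql.DataFrame

    object DualTarget {
      // `union` exists in both Spark 2.4 and 3.0, so a library can move off
      // the deprecated `unionAll` name and still compile against both lines.
      def combine(df1: DataFrame, df2: DataFrame): DataFrame =
        df1.union(df2)
    }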
>>>
>>> On Wed, Feb 19, 2020 at 2:10 PM Dongjoon Hyun <dongjoon.h...@gmail.com>
>>> wrote:
>>>
>>>> Hi, Karen.
>>>>
>>>> Are you saying that Spark 3 has to keep all the deprecated 2.x APIs?
>>>> Could you tell us what your criteria are for `unnecessarily` or
>>>> `necessarily`?
>>>>
>>>> > the migration process from Spark 2 to Spark 3 unnecessarily painful.
>>>>
>>>> Bests,
>>>> Dongjoon.
>>>>
>>>>
>>>> On Tue, Feb 18, 2020 at 4:55 PM Karen Feng <karen.f...@databricks.com>
>>>> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I am concerned that the API-breaking changes in SPARK-25908 (as well as
>>>>> SPARK-16775, and potentially others) will make the migration process from
>>>>> Spark 2 to Spark 3 unnecessarily painful. For example, the removal of
>>>>> SQLContext.getOrCreate will break a large number of libraries currently
>>>>> built on Spark 2.
>>>>>
>>>>> Even if library developers do not use deprecated APIs, API changes
>>>>> between 2.x and 3.x will result in inconsistencies that require hacking
>>>>> around. For a fairly small and new (2.4.3+) genomics library, I had to
>>>>> create a number of shims (https://github.com/projectglow/glow/pull/155)
>>>>> for the source and test code due to API changes in SPARK-25393,
>>>>> SPARK-27328, and SPARK-28744.
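
For the SQLContext.getOrCreate case specifically, one way a library can
dual-target is a small shim that routes through SparkSession.builder, which
exists in both 2.4 and 3.0 (a sketch; Compat and getOrCreateSqlContext are
hypothetical names, not taken from the linked PR):

    import org.apache.spark.SparkContext
    import org.apache.spark.sql.{SQLContext, SparkSession}

    object Compat {
      // SQLContext.getOrCreate(sc) was removed in 3.0; routing through
      // SparkSession.builder compiles against both 2.4 and 3.0.
      def getOrCreateSqlContext(sc: SparkContext): SQLContext =
        SparkSession.builder().config(sc.getConf).getOrCreate().sqlContext
    }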
>>>>>
>>>>> It would be best practice to avoid breaking existing APIs to ease
>>>>> library development. To avoid dealing with similar deprecated API issues
>>>>> down the road, we should exercise more prudence when considering new API
>>>>> proposals.
>>>>>
>>>>> I'd love to see more discussion on this.
>>>>>
>>>>>
>>>>>
>>> --
>>> Twitter: https://twitter.com/holdenkarau
>>> Books (Learning Spark, High Performance Spark, etc.):
>>> https://amzn.to/2MaRAG9
>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>
>>
>>
>>
>
