I don’t think we should remove any API even in a major release without 
deprecating it first...


________________________________
From: Mark Hamstra <m...@clearstorydata.com>
Sent: Sunday, September 16, 2018 12:26 PM
To: Erik Erlandson
Cc: u...@spark.apache.org; dev
Subject: Re: Should python-2 be supported in Spark 3.0?

We could also deprecate Py2 already in the 2.4.0 release.

On Sat, Sep 15, 2018 at 11:46 AM Erik Erlandson 
<eerla...@redhat.com> wrote:
In case this didn't make it onto this thread:

There is a 3rd option, which is to deprecate Py2 in Spark 3.0 and remove it 
entirely in a later 3.x release.
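
For what it's worth, the deprecation half of that option could be as simple as a 
version check at import time. A minimal sketch (the function name 
`warn_if_python2` is illustrative, not PySpark's actual code):

```python
import sys
import warnings


def warn_if_python2():
    """Emit a deprecation warning when running under Python 2.

    Hypothetical sketch of what a deprecate-then-remove policy could
    look like on the PySpark side; returns True if a warning was issued.
    """
    if sys.version_info[0] < 3:
        warnings.warn(
            "Python 2 support is deprecated and will be removed "
            "in a future release. Please migrate to Python 3.",
            DeprecationWarning,
        )
        return True
    return False
```

Calling something like this from the top-level package init would give Py2 users 
a full deprecation cycle of warnings before removal.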

On Sat, Sep 15, 2018 at 11:09 AM, Erik Erlandson 
<eerla...@redhat.com> wrote:
On a separate dev@spark thread, I raised the question of whether to continue 
supporting Python 2 in Apache Spark going forward into Spark 3.0.

Python 2 is going EOL <https://github.com/python/devguide/pull/344> at the end 
of 2019. The upcoming release of Spark 3.0 is an opportunity to make breaking 
changes to Spark's APIs, and so it is a good time to reconsider Python 2 
support in PySpark.

Key advantages to dropping Python 2 are:

  *   Maintaining PySpark becomes significantly easier.
  *   We avoid having to support Python 2 until Spark 4.0, which would likely 
mean supporting Python 2 for some time after it goes EOL.

(Note that supporting Python 2 after EOL would mean, among other things, that 
PySpark would be supporting a version of Python that is no longer receiving 
security patches.)

The main disadvantage is that PySpark users with legacy Python 2 code would 
have to migrate their code to Python 3 to take advantage of Spark 3.0.

This decision obviously has large implications for the Apache Spark community, 
and we want to solicit community feedback.
