What is the disadvantage of deprecating now in 2.4.0? I mean, it doesn't change the code at all; it's just a notification that we will eventually cease supporting Py2. Wouldn't users prefer to get that notification sooner rather than later?

On Mon, Sep 17, 2018 at 12:58 PM Matei Zaharia <matei.zaha...@gmail.com> wrote:

> I’d like to understand the maintenance burden of Python 2 before deprecating it. Since it is not EOL yet, it might make sense to only deprecate it once it’s EOL (which is still over a year from now). Supporting Python 2+3 seems less burdensome than supporting, say, multiple Scala versions in the same codebase, so what are we losing out?
>
> The other thing is that even though Python core devs might not support 2.x later, it’s quite possible that various Linux distros will if moving from 2 to 3 remains painful. In that case, we may want Apache Spark to continue releasing for it despite the Python core devs not supporting it.
>
> Basically, I’d suggest to deprecate this in Spark 3.0 and then remove it later in 3.x instead of deprecating it in 2.4. I’d also consider looking at what other data science tools are doing before fully removing it: for example, if Pandas and TensorFlow no longer support Python 2 past some point, that might be a good point to remove it.
>
> Matei
>
> > On Sep 17, 2018, at 11:01 AM, Mark Hamstra <m...@clearstorydata.com> wrote:
> >
> > If we're going to do that, then we need to do it right now, since 2.4.0 is already in release candidates.
> >
> > On Mon, Sep 17, 2018 at 10:57 AM Erik Erlandson <eerla...@redhat.com> wrote:
> > I like Mark’s concept for deprecating Py2 starting with 2.4: It may seem like a ways off but even now there may be some spark versions supporting Py2 past the point where Py2 is no longer receiving security patches
> >
> > On Sun, Sep 16, 2018 at 12:26 PM Mark Hamstra <m...@clearstorydata.com> wrote:
> > We could also deprecate Py2 already in the 2.4.0 release.
> >
> > On Sat, Sep 15, 2018 at 11:46 AM Erik Erlandson <eerla...@redhat.com> wrote:
> > In case this didn't make it onto this thread:
> >
> > There is a 3rd option, which is to deprecate Py2 for Spark-3.0, and remove it entirely on a later 3.x release.
> >
> > On Sat, Sep 15, 2018 at 11:09 AM, Erik Erlandson <eerla...@redhat.com> wrote:
> > On a separate dev@spark thread, I raised a question of whether or not to support python 2 in Apache Spark, going forward into Spark 3.0.
> >
> > Python-2 is going EOL at the end of 2019. The upcoming release of Spark 3.0 is an opportunity to make breaking changes to Spark's APIs, and so it is a good time to consider support for Python-2 on PySpark.
> >
> > Key advantages to dropping Python 2 are:
> > • Support for PySpark becomes significantly easier.
> > • Avoid having to support Python 2 until Spark 4.0, which is likely to imply supporting Python 2 for some time after it goes EOL.
> > (Note that supporting python 2 after EOL means, among other things, that PySpark would be supporting a version of python that was no longer receiving security patches)
> >
> > The main disadvantage is that PySpark users who have legacy python-2 code would have to migrate their code to python 3 to take advantage of Spark 3.0
> >
> > This decision obviously has large implications for the Apache Spark community and we want to solicit community feedback.