I think that makes sense. The main benefit of deprecating *prior* to 3.0 would be informational - making the community aware of the upcoming transition earlier. But there are other ways to start informing the community between now and 3.0, besides formal deprecation.
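For concreteness, a formal deprecation would likely amount to little more than a runtime warning. A minimal sketch of what that could look like, assuming we emitted it when pyspark is imported under a Python 2 interpreter (illustrative only, not actual Spark code):

    # Illustrative sketch only, not actual Spark code: warn when PySpark is
    # imported under a Python 2 interpreter.
    import sys
    import warnings

    if sys.version_info[0] == 2:
        warnings.warn(
            "Python 2 support is deprecated and will be removed in a future "
            "Spark release.",
            DeprecationWarning)

One caveat: DeprecationWarning is often suppressed by default, so a UserWarning or an explicit log line might be more visible to end users.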
I have some residual curiosity about what it might mean for a release like 2.4 to still be within its support lifetime after Py2 goes EOL. I asked Apache Legal <https://issues.apache.org/jira/browse/LEGAL-407> to comment. It is possible there are no issues with this at all.

On Mon, Sep 17, 2018 at 4:26 PM, Reynold Xin <r...@databricks.com> wrote:

> i'd like to second that.
>
> if we want to communicate the timeline, we can add a note to the release notes saying py2 will be deprecated in 3.0 and removed in a 3.x release.
>
> --
> excuse the brevity and lower case due to wrist injury
>
> On Mon, Sep 17, 2018 at 4:24 PM Matei Zaharia <matei.zaha...@gmail.com> wrote:
>
>> That's a good point: I'd say there's just a risk of creating a perception issue. First, some users might feel that this means they have to migrate now, before Python itself drops support; they might also be surprised that we did this in a minor release (e.g., might we drop Python 2 altogether in a Spark 2.5 if that later comes out?). Second, contributors might feel that this means new features no longer have to work with Python 2, which would be confusing. Maybe it's OK on both fronts, but it just seems scarier for users to do this now if we do plan to have Spark 3.0 out in the next 6 months anyway.
>>
>> Matei
>>
>> > On Sep 17, 2018, at 1:04 PM, Mark Hamstra <m...@clearstorydata.com> wrote:
>> >
>> > What is the disadvantage to deprecating now, in 2.4.0? I mean, it doesn't change the code at all; it's just a notification that we will eventually cease supporting Py2. Wouldn't users prefer to get that notification sooner rather than later?
>> >
>> > On Mon, Sep 17, 2018 at 12:58 PM Matei Zaharia <matei.zaha...@gmail.com> wrote:
>> > I'd like to understand the maintenance burden of Python 2 before deprecating it. Since it is not EOL yet, it might make sense to only deprecate it once it's EOL (which is still over a year from now). Supporting Python 2+3 seems less burdensome than supporting, say, multiple Scala versions in the same codebase, so what are we losing out on?
>> >
>> > The other thing is that even though the Python core devs might not support 2.x later, it's quite possible that various Linux distros will if moving from 2 to 3 remains painful. In that case, we may want Apache Spark to continue releasing for it despite the Python core devs not supporting it.
>> >
>> > Basically, I'd suggest deprecating this in Spark 3.0 and then removing it later in 3.x instead of deprecating it in 2.4. I'd also consider looking at what other data science tools are doing before fully removing it: for example, if Pandas and TensorFlow no longer support Python 2 past some point, that might be a good point to remove it.
>> >
>> > Matei
>> >
>> > > On Sep 17, 2018, at 11:01 AM, Mark Hamstra <m...@clearstorydata.com> wrote:
>> > >
>> > > If we're going to do that, then we need to do it right now, since 2.4.0 is already in release candidates.
>> > >
>> > > On Mon, Sep 17, 2018 at 10:57 AM Erik Erlandson <eerla...@redhat.com> wrote:
>> > > I like Mark's concept of deprecating Py2 starting with 2.4: it may seem like a ways off, but even now there may be some Spark versions supporting Py2 past the point where Py2 is no longer receiving security patches.
>> > >
>> > > On Sun, Sep 16, 2018 at 12:26 PM Mark Hamstra <m...@clearstorydata.com> wrote:
>> > > We could also deprecate Py2 already in the 2.4.0 release.
>> > >
>> > > On Sat, Sep 15, 2018 at 11:46 AM Erik Erlandson <eerla...@redhat.com> wrote:
>> > > In case this didn't make it onto this thread:
>> > >
>> > > There is a third option, which is to deprecate Py2 for Spark 3.0 and remove it entirely in a later 3.x release.
>> > >
>> > > On Sat, Sep 15, 2018 at 11:09 AM, Erik Erlandson <eerla...@redhat.com> wrote:
>> > > On a separate dev@spark thread, I raised the question of whether or not to support Python 2 in Apache Spark going forward into Spark 3.0.
>> > >
>> > > Python 2 is going EOL at the end of 2019. The upcoming release of Spark 3.0 is an opportunity to make breaking changes to Spark's APIs, so it is a good time to reconsider Python 2 support in PySpark.
>> > >
>> > > Key advantages to dropping Python 2 are:
>> > > • Supporting PySpark becomes significantly easier.
>> > > • We avoid having to support Python 2 until Spark 4.0, which would likely mean supporting Python 2 for some time after it goes EOL.
>> > > (Note that supporting Python 2 after EOL means, among other things, that PySpark would be supporting a version of Python that is no longer receiving security patches.)
>> > >
>> > > The main disadvantage is that PySpark users who have legacy Python 2 code would have to migrate their code to Python 3 to take advantage of Spark 3.0.
>> > >
>> > > This decision obviously has large implications for the Apache Spark community, and we want to solicit community feedback.