To give a better understanding of where we are with respect to Python 3, I'd like to give a quick overview of the recent work in the Beam community to support Python 3, and to summarize the current status of this effort.

Current status:

1. Beam 2.11.0 was the first release that offered Python 3 support, specifically Python 3.5 support. Due to several limitations that have been fixed since 2.11.0, Beam 2.13.0 (or a newer version) is recommended for Python 3 pipelines.
2. Pipelines running on Portable Flink / Spark runners may have to use Beam 2.14.0 once it becomes available.
3. Python 3.5 or a newer version of the interpreter is required to install Beam and run Python 3 pipelines.

Known remaining limitations of the current Python 3 offering:

1. Several syntactic constructs introduced in Python 3 (keyword-only arguments, dataclasses) are not yet supported. See: BEAM-5878, BEAM-7284.
2. Pickling errors occasionally prevent usage of the --save_main_session flag, but changes to the pipeline code may help to overcome this limitation. See: BEAM-6158, BEAM-7540.
3. Beam has limited type inference capabilities in Python 3.6+, and type checking of Beam typehints is not always enforced. See: BEAM-2713, BEAM-7377.

The cause of limitations 1-2 largely lies in 'dill', the Beam dependency that supports pickling. In the immediate future we will be working on evaluating replacements and/or fixes to address this. We are also working on improved typehints support in Python 3, see: BEAM-2713.

The efforts to make the Beam codebase Python 3-compatible started back in 2017. Most of this work is visible in BEAM-1251 [1] and in the Kanban Board [2].

2017:
- BEAM-1251 is opened, and the first efforts to make the Beam codebase Python 3-compatible followed shortly.

Q3-Q4 2018:
- Active work on "futurizing" the Beam codebase piece by piece while preventing performance regressions in the existing Python 2 offering.
- Building test infrastructure to incorporate Python 3 test scenarios.

Apache Beam 2.11.0 (Q1 2019):
- "Futurization" of the Beam Python codebase completed.
- Apache Beam 2.11.0 is released with Python 3 support, with limitations.
- Continuous pre-commit and post-commit test suites added for Python 3.5.
- Gaps in Python 3 support in Datastore IO, Avro IO, BigQuery IO identified and scoped.
- Continuous testing mostly limited to Python 3.5.

Apache Beam 2.12.0 (Q2 2019):
- Pre-commit and post-commit test coverage expanded to Python 3.5, 3.6, 3.7.
- Direct and Dataflow runners added support for Python 3.6 - 3.7.

Apache Beam 2.13.0 (Q2 2019):
- Avro IO support enabled on Python 3.
- Datastore IO support enabled on Python 3.
- BigQuery IO support for BYTES datatype enabled on Python 3.

Apache Beam 2.14.0 (to be released in Q3 2019):
- Python 3 bug fixes for BigQuery IO and Portable Runner.
- Every Python SDK commit exercises Direct, Dataflow, and Portable Flink runners on Python 3 in various test suites.
- Beam 2.14.0 will declare Python 3.5, 3.6, 3.7 support in PyPI.

Next steps:
- Address known limitations and user feedback.
- Increase Python 3 test coverage in portable runner.
- Assist Beam users in Python 2 -> Python 3 migration.
- Deprecate Python 2 support in Beam and clean up the codebase.

I'd like to thank all Beam contributors who have been helping to push this effort so far.

[1] https://issues.apache.org/jira/browse/BEAM-1251
[2] https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=245&view=detail

On Tue, Jun 18, 2019 at 12:03 AM Valentyn Tymofieiev <valen...@google.com> wrote:

> I like the update Ismaël referenced [1], I think we should prepare a similar update for Beam users. I would propose the following:
> - Designate last LTS release that we will have in 2019 to be the last LTS release with Python 2 support.
> - Add a Beam-specific deprecation warning on Python 2 starting from the last LTS release, or last 2 releases of Beam in 2019, whichever happens earlier.
> - Remove Python 2 support starting from the first release in 2020.
>
> The cost of maintaining Python 2.7 support is higher than 0.
> Some issues that come to mind:
> - Maintaining Py2.7 / Py 3+ compatibility of Beam codebase makes it difficult to use Python 3 syntax in Beam which may be necessary to support and test syntactic constructs introduced in Python 3.
> - Running additional test suites increases the load on test infrastructure and increases flakiness.
>
> [1] https://spark.apache.org/news/plan-for-dropping-python-2-support.html
>
> On Tue, Jun 11, 2019 at 7:57 AM Robert Bradshaw <rober...@google.com> wrote:
>
>> Sounds good.
>>
>> On Fri, Jun 7, 2019 at 8:28 PM Ahmet Altay <al...@google.com> wrote:
>>
>>> I agree with you. A more recent LTS release with python 2 support will be good. Cost of maintaining python 2 support is also fairly low (maybe zero actually besides keeping some pre-existing compatibility code).
>>>
>>> I believe we are referring to two separate things with support:
>>> - Supporting existing releases for patches - I agree that we need to give users a long enough window to upgrade. Great if it happens with an LTS release. Even if it does not, I think it will be fair to offer patches on the last python 2 supporting release during some part of 2020 if that becomes necessary.
>>> - Making new releases with python 2 support - Each new Beam release with python 2 support will implicitly extend the lifetime of beam's python 2 support. I do not think we need to extend this beyond 2019. 2 releases (~ 3 months) after solid python 3 support will very likely put the last python 2 supporting release in the last quarter of 2019 already.
>>>
>>> On Fri, Jun 7, 2019 at 2:15 AM Robert Bradshaw <rober...@google.com> wrote:
>>>
>>>> I don't think the second release with robust/recommended Python 3 support should be the last release with Python 2 support--that is simply not enough time for people to migrate. (Look at how long it took us...)
>>>> It does make a lot of sense to at least have one LTS release with support for both.
>>>>
>>>> Regarding timeline, I think we could safely say we expect to support Python 2 through 2019, likely for some of 2020 (possibly only via an LTS release), and (very) unlikely beyond 2020.
>>>>
>>>> On Wed, Jun 5, 2019 at 6:34 PM Ahmet Altay <al...@google.com> wrote:
>>>> >
>>>> > I agree with the sentiment on this thread. Our priority needs to be offering good python 3 support that we can comfortably recommend users to switch to. Progress on that so far has been promising and I do anticipate that we will get there in the near future.
>>>> >
>>>> > My proposal would be, once we reach that state, we can mark the first subsequent Beam release as the last Beam release that supports Python 2. (Alternatively: in line with the previous experimental/deprecated discussion we can make 2 more releases with python 2 support rather than just 1 more.) With the current state, we would not give users plenty of time to upgrade to python 3. So in addition, I would suggest we consider an upgrade relief by offering something like 6-month support on the last python 2 compatible release. We might do that in the context of an LTS release.
>>>> >
>>>> > I do not believe we have a timeline we can share with users at this point. However if we go with this suggestion, we will probably support python 2 approximately until mid-2020.
>>>> >
>>>> > Ahmet
>>>> >
>>>> > On Wed, Jun 5, 2019 at 4:53 AM Tanay Tummalapalli <ttanay...@gmail.com> wrote:
>>>> >>
>>>> >> We can support Python 2 for some time in 2020, but, we should target a date no later than 2020 to drop support.
>>>> >> If we do plan to drop support for Python 2 in 2020, we should sign the Python 3 statement[1], declaring that we will "drop support for Python 2.7 no later than 2020".
>>>> >>
>>>> >> In addition to the statement, keeping a target release and date (if possible) or timeline to drop support would also help users to decide when they need to work on migrating to Python 3.
>>>> >>
>>>> >> Regards,
>>>> >> - TT
>>>> >>
>>>> >> [1] https://python3statement.org/
>>>> >>
>>>> >> On Wed, Jun 5, 2019 at 4:37 PM Robert Bradshaw <rober...@google.com> wrote:
>>>> >>>
>>>> >>> Until Python 3 support for Beam is officially out of beta and recommended, I don't think we can tell people to stop using Python 2. Given that 2020 is just over 6 months away, that seems a short transition time, so I would guess we'll have to continue supporting Python 2 sometime into 2020.
>>>> >>>
>>>> >>> A quick survey of users would be valuable here. But first priority is making Python 3 rock solid so we can unconditionally recommend it over Python 2.
>>>> >>>
>>>> >>> On Wed, Jun 5, 2019 at 12:27 PM Ismaël Mejía <ieme...@gmail.com> wrote:
>>>> >>> >
>>>> >>> > Python 2 won't be maintained after 2020 [1]. I was wondering what will be our (Beam) plan for this. Other projects [2] have started to alert users that support will be removed so maybe we should decide our policy for this too.
>>>> >>> >
>>>> >>> > [1] https://pythonclock.org/
>>>> >>> > [2] https://spark.apache.org/news/plan-for-dropping-python-2-support.html