To give a better picture of where we are with respect to Python 3, I'd like
to give a quick overview of the recent work in the Beam community to support
Python 3 and to summarize the current status of this effort.

Current status:

   1. Beam 2.11.0 was the first release that offered Python 3 support,
      specifically Python 3.5 support. Due to several limitations that have
      been fixed since 2.11.0, Beam 2.13.0 or newer is recommended for
      Python 3 pipelines.
   2. Pipelines running on Portable Flink / Spark runners may have to use
      Beam 2.14.0 once it becomes available.
   3. A Python 3.5 or newer interpreter is required to install Beam and run
      Python 3 pipelines.
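
The version requirements above could be checked before launching a pipeline.
Below is a minimal sketch, not part of Beam: the helper names
(parse_version, check_environment) are hypothetical, and the thresholds are
simply the ones listed in items 1 and 3. It assumes a plain "X.Y.Z" version
string and does not handle dev or release-candidate suffixes.

```python
import sys

# Thresholds taken from the status list above (assumed, not read from Beam).
MIN_PYTHON = (3, 5)
RECOMMENDED_BEAM = (2, 13, 0)


def parse_version(text):
    """Turn a plain 'X.Y.Z' string into a tuple of ints.

    Hypothetical helper: suffixes such as '.dev' or 'rc1' are not handled.
    """
    return tuple(int(part) for part in text.split("."))


def check_environment(python_version, beam_version):
    """Return a list of human-readable problems; empty if nothing was found."""
    problems = []
    if tuple(python_version[:2]) < MIN_PYTHON:
        problems.append("Python 3.5 or newer is required")
    if parse_version(beam_version) < RECOMMENDED_BEAM:
        problems.append("Beam 2.13.0 or newer is recommended")
    return problems


if __name__ == "__main__":
    # The Beam version is passed by hand here; in a real environment it
    # could be read from apache_beam.__version__ instead.
    print(check_environment(sys.version_info, "2.13.0"))
```

The check returns messages instead of raising, so a caller can decide whether
an older Beam release is a warning or a hard failure.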


Known remaining limitations of the current Python 3 offering:

   1. Several constructs introduced in Python 3 (keyword-only arguments,
      dataclasses) are not yet supported. See: BEAM-5878, BEAM-7284.
   2. Pickling errors occasionally prevent usage of the --save_main_session
      flag, but changes to the pipeline code may help to overcome this
      limitation. See: BEAM-6158, BEAM-7540.
   3. Beam's type inference support is limited on Python 3.6+, and type
      checking of Beam typehints is not always enforced. See: BEAM-2713,
      BEAM-7377.
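
For readers less familiar with the constructs named in item 1, here is a
plain-Python illustration. It uses no Beam APIs, and the names (WordCount,
format_count) are made up for this example. Per the issues referenced above,
code of this shape inside pipeline functions may currently run into problems.

```python
from dataclasses import dataclass  # available from Python 3.7


@dataclass
class WordCount:
    """Hypothetical record type; the decorator generates __init__ and __eq__."""
    word: str
    count: int


def format_count(record, *, separator=": "):
    """Format a record; `separator` is keyword-only because it follows `*`."""
    return record.word + separator + str(record.count)


if __name__ == "__main__":
    print(format_count(WordCount("beam", 3)))
    print(format_count(WordCount("beam", 3), separator="="))
```

Passing the separator positionally, as in format_count(record, "="), raises
a TypeError, which is the behavior keyword-only arguments are meant to give.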


The cause of limitations 1-2 lies largely in 'dill', the Beam dependency that
supports pickling. In the immediate future we will be evaluating replacements
and/or fixes to address this. We are also working on improved typehints
support in Python 3, see: BEAM-2713.

The efforts to make the Beam codebase Python 3-compatible started back in
2017. Most of this work is visible in BEAM-1251 [1] and in the Kanban board
[2].


2017:

   - BEAM-1251 was opened, and the first efforts to make the Beam codebase
     Python 3-compatible followed shortly.


Q3-Q4 2018:

   - Active work on "futurizing" the Beam codebase piece by piece while
     preventing performance regressions in the existing Python 2 offering.
   - Building test infrastructure to incorporate Python 3 test scenarios.


Apache Beam 2.11.0 (Q1 2019):

   - "Futurization" of the Beam Python codebase completed.
   - Apache Beam 2.11.0 is released with Python 3 support, with limitations.
   - Continuous pre-commit and post-commit test suites added for Python 3.5.
   - Gaps in Python 3 support in Datastore IO, Avro IO, Bigquery IO
     identified and scoped.
   - Continuous testing mostly limited to Python 3.5.


Apache Beam 2.12.0 (Q2 2019):

   - Pre-commit and post-commit test coverage expanded to Python 3.5, 3.6,
     3.7.
   - Direct and Dataflow runners added support for Python 3.6 - 3.7.


Apache Beam 2.13.0 (Q2 2019):

   - Avro IO support enabled on Python 3.
   - Datastore IO support enabled on Python 3.
   - Bigquery IO support for BYTES datatype enabled on Python 3.


Apache Beam 2.14.0 (to be released in Q3 2019):

   - Python 3 bug fixes for Bigquery IO and Portable Runner.
   - Every Python SDK commit exercises Direct, Dataflow, and Portable Flink
     runners on Python 3 in various test suites.
   - Beam 2.14.0 will declare Python 3.5, 3.6, 3.7 support on PyPI.


Next steps:

   - Address known limitations and user feedback.
   - Increase Python 3 test coverage in the portable runner.
   - Assist Beam users in the Python 2 -> Python 3 migration.
   - Deprecate Python 2 support in Beam and clean up the codebase.


I'd like to thank all Beam contributors who have helped push this effort
forward so far.


[1] https://issues.apache.org/jira/browse/BEAM-1251

[2] https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=245&view=detail

On Tue, Jun 18, 2019 at 12:03 AM Valentyn Tymofieiev <valen...@google.com>
wrote:

> I like the update Ismaël referenced [1], I think we should prepare a
> similar update for Beam users. I would propose the following:
> - Designate last LTS release that we will have in 2019 to be the last LTS
> release with Python 2 support.
> - Add a Beam-specific deprecation warning on Python 2 starting from the
> last LTS release, or last 2 releases of Beam in 2019, whichever happens
> earlier.
> - Remove Python 2 support starting from the first release in 2020.
>
> The cost of maintaining Python 2.7 support is higher than 0. Some issues
> that come to mind:
> - Maintaining Py2.7 / Py 3+ compatibility of Beam codebase makes it
> difficult to use Python 3 syntax in Beam which may be necessary to support
> and test syntactic constructs introduced in Python 3.
> - Running additional test suites increases the load on test infrastructure
> and increases flakiness.
>
> [1] https://spark.apache.org/news/plan-for-dropping-python-2-support.html
>
> On Tue, Jun 11, 2019 at 7:57 AM Robert Bradshaw <rober...@google.com>
> wrote:
>
>> Sounds good.
>>
>> On Fri, Jun 7, 2019 at 8:28 PM Ahmet Altay <al...@google.com> wrote:
>>
>>> I agree with you. A more recent LTS release with python 2 support will
>>> be good. Cost of maintaining python 2 support is also fairly low (maybe
>>> zero actually besides keeping some pre-existing compatibility code).
>>>
>>> I believe we are referring to two separate things with support:
>>> - Supporting existing releases for patches - I agree that we need to
>>> give users a long enough window to upgrade. Great if it happens with an LTS
>>> release. Even if it does not, I think it will be fair to offer patches on
>>> the last python 2 supporting release during some part of 2020 if that
>>> becomes necessary.
>>> - Making new releases with python 2 support - Each new Beam release with
>>> python 2 support will implicitly extend the lifetime of beam's python 2
>>> support. I do not think we need to extend this to beyond 2019. 2 releases
>>> (~ 3 months) after solid python 3 support will very likely put the last
>>> python 2 supporting release to last quarter of 2019 already.
>>>
>>> On Fri, Jun 7, 2019 at 2:15 AM Robert Bradshaw <rober...@google.com>
>>> wrote:
>>>
>>>> I don't think the second release with robust/recommended Python 3
>>>> support should be the last release with Python 2 support--that is
>>>> simply not enough time for people to migrate. (Look at how long it
>>>> took us...) It does make a lot of sense to at least have one LTS
>>>> release with support for both.
>>>>
>>>> Regarding timeline, I think we could safely say we expect to support
>>>> Python 2 through 2019, likely for some of 2020 (possibly only via an
>>>> LTS release), and (very) unlikely beyond 2020.
>>>>
>>>> On Wed, Jun 5, 2019 at 6:34 PM Ahmet Altay <al...@google.com> wrote:
>>>> >
>>>> > I agree with the sentiment on this thread. Our priority needs to be
>>>> offering good python 3 support that we can comfortably recommend users to
>>>> switch. Progress on that so far has been promising and I do anticipate that
>>>> we will reach there in the near future.
>>>> >
>>>> > My proposal would be, once we reach to that state, we can mark the
>>>> first subsequent Beam release as the last Beam release that supports Python
>>>> 2. (Alternatively: in line with the previous experimental/deprecated
>>>> discussion we can make 2 more release with python 2 support rather than
>>>> just 1 more.) With the current state, we would not give users plenty of
>>>> time to upgrade to python 3. So in addition, I would suggest we consider
>>>> an upgrade relief by offering something like a 6-month support on the last
>>>> python 2 compatible release. We might do that in the context of an LTS
>>>> release.
>>>> >
>>>> > I do not believe we have a timeline we can share with users at this
>>>> point. However if we go with this suggestion, we will probably support
>>>> python 2 approximately until mid-2020.
>>>> >
>>>> > Ahmet
>>>> >
>>>> > On Wed, Jun 5, 2019 at 4:53 AM Tanay Tummalapalli <
>>>> ttanay...@gmail.com> wrote:
>>>> >>
>>>> >> We can support Python 2 for some time in 2020, but, we should target
>>>> a date no later than 2020 to drop support.
>>>> >> If we do plan to drop support for Python 2 in 2020, we should sign
>>>> the Python 3 statement[1], declaring that we will "drop support for Python
>>>> 2.7 no later than 2020".
>>>> >>
>>>> >> In addition to the statement, keeping a target release and date(if
>>>> possible) or timeline to drop support would also help users to decide when
>>>> they need to work on migrating to Python 3.
>>>> >>
>>>> >> Regards,
>>>> >> - TT
>>>> >>
>>>> >> [1] https://python3statement.org/
>>>> >>
>>>> >> On Wed, Jun 5, 2019 at 4:37 PM Robert Bradshaw <rober...@google.com>
>>>> wrote:
>>>> >>>
>>>> >>> Until Python 3 support for Beam is officially out of beta and
>>>> >>> recommended, I don't think we can tell people to stop using Python
>>>> 2.
>>>> >>> Given that 2020 is just over 6 months away, that seems a short
>>>> >>> transition time, so I would guess we'll have to continue supporting
>>>> >>> Python 2 sometime into 2020.
>>>> >>>
>>>> >>> A quick survey of users would be valuable here. But first priority
>>>> is
>>>> >>> making Python 3 rock solid so we can unconditionally recommend it
>>>> over
>>>> >>> Python 2.
>>>> >>>
>>>> >>> On Wed, Jun 5, 2019 at 12:27 PM Ismaël Mejía <ieme...@gmail.com>
>>>> wrote:
>>>> >>> >
>>>> >>> > Python 2 won't be maintained after 2020 [1]. I was wondering what
>>>> will
>>>> >>> > be our (Beam) plan for this. Other projects [2] have started to
>>>> alert
>>>> >>> > users that support will be removed so maybe we should decide our
>>>> policy
>>>> >>> > for this too.
>>>> >>> >
>>>> >>> > [1] https://pythonclock.org/
>>>> >>> > [2]
>>>> https://spark.apache.org/news/plan-for-dropping-python-2-support.html
>>>>
>>>
