Re: [DISCUSS] How many Python 3.x minor versions should Beam Python SDK aim to support concurrently?

Yoshiki Obata Mon, 11 May 2020 10:05:19 -0700

Hello again,

The test infrastructure update is ongoing; next, we should determine
which Python versions are high-priority.


According to the PyPI download stats[1], the share of downloads for Python
3.5 is almost always greater than that of 3.6 or 3.7.
This has not changed since Robert told us that Python 3.x
accounts for nearly 40% of downloads[2].

On the other hand, according to Docker Hub[3], the most-downloaded
apachebeam/python3.x_sdk image is the one for Python 3.7, as Kyle
pointed out[4].

Considering these stats, I think high-priority versions are 3.5 and 3.7.

Is this assumption appropriate?
I would like to hear your thoughts about this.
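
To make the priority split concrete, here is a minimal Python sketch of how
the PreCommit task list could be derived from a high-priority setting. The
property keys (py3.versions.all, py3.versions.high_priority) are hypothetical
and do not exist in Beam; the task name patterns are the ones from my earlier
plan, and the project path of the "-minimum" task is left out because it has
not been decided. A real implementation would presumably be written in Gradle.

```python
# Hypothetical sketch: the property keys below are invented for illustration
# and are not existing Beam build settings.
PROPERTIES = """
# All Python 3 minor versions with test suites.
py3.versions.all=3.5,3.6,3.7,3.8
# Versions that get the full PreCommit suites.
py3.versions.high_priority=3.5,3.7
"""


def parse_properties(text):
    """Parse 'key=value' lines, skipping blank lines and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def split_versions(props):
    """Return (high_priority, low_priority) version lists."""
    all_versions = [v.strip() for v in props["py3.versions.all"].split(",")]
    high = [v.strip() for v in props["py3.versions.high_priority"].split(",")]
    unknown = [v for v in high if v not in all_versions]
    if unknown:
        raise ValueError("Unknown high-priority versions: {}".format(unknown))
    low = [v for v in all_versions if v not in high]
    return high, low


def precommit_tasks(high, low):
    """Build PreCommit task names following the patterns from the plan."""
    tasks = []
    for version in high:
        tag = version.replace(".", "")  # "3.7" -> "37"
        tasks.append("tox:py{0}:preCommitPy{0}".format(tag))
        tasks.append("dataflow:py{0}:preCommitIT".format(tag))
        tasks.append("dataflow:py{0}:preCommitIT_V2".format(tag))
    for version in low:
        tag = version.replace(".", "")
        # Low-priority versions only run the minimal (typehints) suite.
        tasks.append("preCommitPy{0}-minimum".format(tag))
    return tasks


if __name__ == "__main__":
    high, low = split_versions(parse_properties(PROPERTIES))
    for task in precommit_tasks(high, low):
        print(task)
```

With this shape, switching a version's priority would only mean editing the
high_priority line; no individual suite definition has to change.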

[1] https://pypistats.org/packages/apache-beam
[2] https://lists.apache.org/thread.html/r208c0d11639e790453a17249e511dbfe00a09f91bef8fcd361b4b74a%40%3Cdev.beam.apache.org%3E
[3] https://hub.docker.com/search?q=apachebeam%2Fpython&type=image
[4] https://lists.apache.org/thread.html/r9ca9ad316dae3d60a3bf298eedbe4aeecab2b2664454cc352648abc9%40%3Cdev.beam.apache.org%3E

On Wed, May 6, 2020 at 12:48, Yoshiki Obata <[email protected]> wrote:
>
> > Not sure how run_pylint.sh is related here - we should run linter on the 
> > entire codebase.
> ah, I mistyped... I meant run_pytest.sh
>
> > I am familiar with beam_PostCommit_PythonXX suites. Is there something 
> > specific about these suites that you wanted to know?
> > Test suite runtime will depend on the number of tests in the suite,
> > how many tests we run in parallel, how long they take to run. To
> > understand the load on test infrastructure we can monitor Beam test
> > health metrics [1]. In particular, if time in queue[2] is high, it is
> > a sign that there are not enough Jenkins slots available to start the
> > test suite earlier.
> Sorry for the ambiguous question. I wanted to know how to see the load on
> the test infrastructure.
> The Grafana links you shared serve my purpose. Thank you.
>
> On Wed, May 6, 2020 at 2:35, Valentyn Tymofieiev <[email protected]> wrote:
> >
> > On Mon, May 4, 2020 at 7:06 PM Yoshiki Obata <[email protected]> 
> > wrote:
> >>
> >> Thank you for the comment, Valentyn.
> >>
> >> > 1) We can seed the smoke test suite with typehints tests, and add more 
> >> > tests later if there is a need. We can identify them by the file path or 
> >> > by special attributes in test files. Identifying them using filepath 
> >> > seems simpler and independent of test runner.
> >>
> >> Yes, making run_pylint.sh accept target test file paths as arguments
> >> would be a good approach, if possible.
> >
> >
> > Not sure how run_pylint.sh is related here - we should run linter on the 
> > entire codebase.
> >
> >>
> >> > 3)  We should reduce the code duplication across  
> >> > beam/sdks/python/test-suites/$runner/py3*. I think we could move the 
> >> > suite definition into a common file like 
> >> > beam/sdks/python/test-suites/$runner/build.gradle perhaps, and populate 
> >> > individual suites 
> >> > (beam/sdks/python/test-suites/$runner/py38/build.gradle) including the 
> >> > common file and/or logic from PythonNature [1].
> >>
> >> Exactly. I'll check it out.
> >>
> >> > 4) We have some tests that run only under specific Python 3 versions. 
> >> > For example, the FlinkValidatesRunner test runs using Python 3.5 [2], 
> >> > HDFS Python 3 tests run only with Python 3.7 [3], and cross-language 
> >> > Py3 tests for Spark run under Python 3.5 [4]. There may be more 
> >> > test suites that selectively use particular versions.
> >> > We need to correct such suites, so that we do not tie them  to a 
> >> > specific Python version. I see several options here: such tests should 
> >> > run either for all high-priority versions, or run only under the lowest 
> >> > version among the high-priority versions.  We don't have to fix them all 
> >> > at the same time. In general, we should try to make it as easy as 
> >> > possible to configure, whether a suite runs across all  versions, all 
> >> > high-priority versions, or just one version.
> >>
> >> The high-priority/low-priority configuration would be useful for 
> >> this.
> >> Which versions should be tested may also depend on 5).
> >>
> >> > 5) If postcommit suites (that need to run against all versions) still 
> >> > constitute too much load on the infrastructure, we may need to 
> >> > investigate how to run these suites less frequently.
> >>
> >> That's certainly true: beam_PostCommit_PythonXX and
> >> beam_PostCommit_Python_Chicago_Taxi_(Dataflow|Flink) take about 1
> >> hour.
> >> Does anyone know more about this?
> >
> >
> > I am familiar with beam_PostCommit_PythonXX suites. Is there something 
> > specific about these suites that you wanted to know?
> > Test suite runtime will depend on the number of  tests in the suite, how 
> > many tests we run in parallel, how long they take to run. To understand the 
> > load on test infrastructure we can monitor Beam test health metrics [1]. In 
> > particular, if time in queue[2] is high, it is a sign that there are not 
> > enough Jenkins slots available to start the test suite earlier.
> >
> > [1] http://104.154.241.245/d/D81lW0pmk/post-commit-test-reliability
> > [2] http://104.154.241.245/d/_TNndF2iz/pre-commit-test-latency?orgId=1&from=1588094891600&to=1588699691600&panelId=6&fullscreen
> >
> >
> >>
> >> On Sat, May 2, 2020 at 5:18, Valentyn Tymofieiev <[email protected]> wrote:
> >> >
> >> > Hi Yoshiki,
> >> >
> >> > Thanks a lot for your help with Python 3 support so far and most 
> >> > recently, with your work on Python 3.8.
> >> >
> >> > Overall the proposal sounds good to me. I see several aspects here that 
> >> > we need to address:
> >> >
> >> > 1) We can seed the smoke test suite with typehints tests, and add more 
> >> > tests later if there is a need. We can identify them by the file path or 
> >> > by special attributes in test files. Identifying them using filepath 
> >> > seems simpler and independent of test runner.
> >> >
> >> > 2) Defining high priority/low priority versions in gradle.properties 
> >> > sounds good to me.
> >> >
> >> > 3)  We should reduce the code duplication across  
> >> > beam/sdks/python/test-suites/$runner/py3*. I think we could move the 
> >> > suite definition into a common file like 
> >> > beam/sdks/python/test-suites/$runner/build.gradle perhaps, and populate 
> >> > individual suites 
> >> > (beam/sdks/python/test-suites/$runner/py38/build.gradle) including the 
> >> > common file and/or logic from PythonNature [1].
> >> >
> >> > 4) We have some tests that run only under specific Python 3 versions. 
> >> > For example, the FlinkValidatesRunner test runs using Python 3.5 [2], 
> >> > HDFS Python 3 tests run only with Python 3.7 [3], and cross-language 
> >> > Py3 tests for Spark run under Python 3.5 [4]. There may be more 
> >> > test suites that selectively use particular versions.
> >> >
> >> > We need to correct such suites, so that we do not tie them  to a 
> >> > specific Python version. I see several options here: such tests should 
> >> > run either for all high-priority versions, or run only under the lowest 
> >> > version among the high-priority versions.  We don't have to fix them all 
> >> > at the same time. In general, we should try to make it as easy as 
> >> > possible to configure, whether a suite runs across all  versions, all 
> >> > high-priority versions, or just one version.
> >> >
> >> > 5) If postcommit suites (that need to run against all versions) still 
> >> > constitute too much load on the infrastructure, we may need to 
> >> > investigate how to run these suites less frequently.
> >> >
> >> > [1] https://github.com/apache/beam/blob/b78c7ed4836e44177a149155581cfa8188e8f748/sdks/python/test-suites/portable/py37/build.gradle#L19-L20
> >> > [2] https://github.com/apache/beam/blob/93181e792f648122d3b4a5080d683f21c6338132/.test-infra/jenkins/job_PostCommit_Python35_ValidatesRunner_Flink.groovy#L34
> >> > [3] https://github.com/apache/beam/blob/93181e792f648122d3b4a5080d683f21c6338132/sdks/python/test-suites/direct/py37/build.gradle#L58
> >> > [4] https://github.com/apache/beam/blob/93181e792f648122d3b4a5080d683f21c6338132/.test-infra/jenkins/job_PostCommit_CrossLanguageValidatesRunner_Spark.groovy#L44
> >> >
> >> > On Fri, May 1, 2020 at 8:42 AM Yoshiki Obata <[email protected]> 
> >> > wrote:
> >> >>
> >> >> Hello everyone.
> >> >>
> >> >> I'm working on Python 3.8 support[1], and now is the time to prepare
> >> >> the test infrastructure.
> >> >> Following the discussion, I've considered how to prioritize tests.
> >> >> My plan is below. I'd like to get your thoughts on it.
> >> >>
> >> >> - For all low-pri Python versions, apache_beam.typehints.*_test runs
> >> >> in the PreCommit test.
> >> >>   A new Gradle task should be defined, named like
> >> >> "preCommitPy3*-minimum".
> >> >>   If there are tests other than typehints that are essential for all
> >> >> versions, please point them out.
> >> >>
> >> >> - For high-pri Python versions, the same tests that run in the current
> >> >> PreCommit test run for extensive testing: "tox:py3*:preCommitPy3*",
> >> >> "dataflow:py3*:preCommitIT" and "dataflow:py3*:preCommitIT_V2".
> >> >>
> >> >> - The full PreCommit tests for low-pri versions are moved to the
> >> >> corresponding PostCommit tests.
> >> >>
> >> >> - High-pri and low-pri versions are defined in gradle.properties, and
> >> >> PreCommit/PostCommit task dependencies are built dynamically according
> >> >> to them.
> >> >>   This would make it easy to switch the priorities of Python versions.
> >> >>
> >> >> [1] https://issues.apache.org/jira/browse/BEAM-8494
> >> >>
> >> >> On Sat, Apr 4, 2020 at 7:51, Robert Bradshaw <[email protected]> wrote:
> >> >> >
> >> >> > https://pypistats.org/packages/apache-beam is an interesting data 
> >> >> > point.
> >> >> >
> >> >> > The good news: Python 3.x more than doubled to nearly 40% of 
> >> >> > downloads last month. Interestingly, it looks like a good chunk of 
> >> >> > this increase was 3.5 (which is now the most popular 3.x version by 
> >> >> > this metric...)
> >> >> >
> >> >> > I agree with using Python EOL dates as a baseline, with the 
> >> >> > possibility of case-by-case adjustments. Refactoring our tests to 
> >> >> > support 3.8 without increasing the load should be our focus now.
> >> >> >
> >> >> >
> >> >> > On Fri, Apr 3, 2020 at 3:41 PM Valentyn Tymofieiev 
> >> >> > <[email protected]> wrote:
> >> >> >>
> >> >> >> Some good news on  Python 3.x support: thanks to +David Song and 
> >> >> >> +Yifan Zou we now have Python 3.8 on Jenkins, and can start working 
> >> >> >> on adding Python 3.8 support to Beam (BEAM-8494).
> >> >> >>
> >> >> >>> One interesting variable that has not been mentioned is what 
> >> >> >>> versions of python 3
> >> >> >>> are available to users via their distribution channels (the linux
> >> >> >>> distributions they use to develop/run the pipelines).
> >> >> >>
> >> >> >>
> >> >> >> Good point. Looking at Ubuntu 16.04, which comes with Python 3.5.2, 
> >> >> >> we can see that  the end-of-life for 16.04 is in 2024, 
> >> >> >> end-of-support is April 2021 [1]. Both of these dates are beyond the 
> >> >> >> announced Python 3.5 EOL in September 2020 [2]. I think it would be 
> >> >> >> difficult for Beam to keep Py3.5 support until these EOL dates, and 
> >> >> >> users of systems that stock old versions of Python have viable 
> >> >> >> workarounds:
> >> >> >> - install a newer version of Python interpreter via pyenv[3], from 
> >> >> >> sources, or from alternative repositories.
> >> >> >> - use a docker container that comes with a newer version of 
> >> >> >> interpreter.
> >> >> >> - use older versions of Beam.
> >> >> >>
> >> >> >> We didn't receive feedback from user@ on how long 3.x versions on 
> >> >> >> the lower/higher end of the range should stay supported.  I would 
> >> >> >> suggest for now that we plan to support all Python 3.x versions that 
> >> >> >> were released and did not reach EOL. We can discuss exceptions to 
> >> >> >> this rule on a case-by-case basis, evaluating any maintenance burden 
> >> >> >> to continue support, or stop early.
> >> >> >>
> >> >> >> We should now focus on adjusting our Python test infrastructure to 
> >> >> >> make it easy to split 3.5, 3.6, 3.7, 3.8  suites into high-priority 
> >> >> >> and low-priority suites according to the Python version. Ideally, we 
> >> >> >> should make it easy to change which versions are high/low priority 
> >> >> >> without having to change all the individual test suites, and without 
> >> >> >> losing test coverage signal.
> >> >> >>
> >> >> >> [1] https://wiki.ubuntu.com/Releases
> >> >> >> [2] https://devguide.python.org/#status-of-python-branches
> >> >> >> [3] https://github.com/pyenv/pyenv/blob/master/README.md
> >> >> >>
> >> >> >> On Fri, Feb 28, 2020 at 1:25 AM Ismaël Mejía <[email protected]> 
> >> >> >> wrote:
> >> >> >>>
> >> >> >>> One interesting variable that has not been mentioned is what 
> >> >> >>> versions of python
> >> >> >>> 3 are available to users via their distribution channels (the linux
> >> >> >>> distributions they use to develop/run the pipelines).
> >> >> >>>
> >> >> >>> - RHEL 8 users have python 3.6 available
> >> >> >>> - RHEL 7 users have python 3.6 available
> >> >> >>> - Debian 10/Ubuntu 18.04 users have python 3.7/3.6 available
> >> >> >>> - Debian 9/Ubuntu 16.04 users have python 3.5 available
> >> >> >>>
> >> >> >>>
> >> >> >>> We should consider this when we evaluate future support removals.
> >> >> >>>
> >> >> >>> Given that the distros that support python 3.5 are ~4y old and 
> >> >> >>> since python 3.5
> >> >> >>> is also losing LTS support soon, it is probably OK to not support it in 
> >> >> >>> Beam
> >> >> >>> anymore, as Robert suggests.
> >> >> >>>
> >> >> >>>
> >> >> >>> On Thu, Feb 27, 2020 at 3:57 AM Valentyn Tymofieiev 
> >> >> >>> <[email protected]> wrote:
> >> >> >>>>
> >> >> >>>> Thanks everyone for sharing your perspectives so far. It sounds 
> >> >> >>>> like we can mitigate the cost of test infrastructure by having:
> >> >> >>>> - a selection of (fast) tests that we will want to run against all 
> >> >> >>>> Python versions we support.
> >> >> >>>> - high priority Python versions, which we will test extensively.
> >> >> >>>> - infrequent postcommit tests that exercise low-priority versions.
> >> >> >>>> We will need test infrastructure improvements to have the 
> >> >> >>>> flexibility of designating versions as high-pri/low-pri and 
> >> >> >>>> minimizing the effort required to adopt a new version.
> >> >> >>>>
> >> >> >>>> There is still a question of how long we want to support old Py3.x 
> >> >> >>>> versions. As mentioned above, I think we should not support them 
> >> >> >>>> beyond EOL (5 years after a release). I wonder if that is still 
> >> >> >>>> too long. The cost of supporting a version may include:
> >> >> >>>>  - Developing against older Python version
> >> >> >>>>  - Release overhead (building & storing containers, wheels, doing 
> >> >> >>>> release validation)
> >> >> >>>>  - Complexity / development cost to support the quirks of the 
> >> >> >>>> minor versions.
> >> >> >>>>
> >> >> >>>> We can decide to drop support, after, say, 4 years, or after usage 
> >> >> >>>> drops below a threshold, or decide on a case-by-case basis. 
> >> >> >>>> Thoughts? Also asked for feedback on user@ [1]
> >> >> >>>>
> >> >> >>>> [1] https://lists.apache.org/thread.html/r630a3b55aa8e75c68c8252ea6f824c3ab231ad56e18d916dfb84d9e8%40%3Cuser.beam.apache.org%3E
> >> >> >>>>
> >> >> >>>> On Wed, Feb 26, 2020 at 5:27 PM Robert Bradshaw 
> >> >> >>>> <[email protected]> wrote:
> >> >> >>>>>
> >> >> >>>>> On Wed, Feb 26, 2020 at 5:21 PM Valentyn Tymofieiev 
> >> >> >>>>> <[email protected]> wrote:
> >> >> >>>>> >
> >> >> >>>>> > > +1 to consulting users.
> >> >> >>>>> > I will message user@ as well and point to this thread.
> >> >> >>>>> >
> >> >> >>>>> > > I would propose getting in warnings about 3.5 EoL well ahead 
> >> >> >>>>> > > of time.
> >> >> >>>>> > I think we should document on our website, and  in the code 
> >> >> >>>>> > (warnings) that users should not expect SDKs to be supported in 
> >> >> >>>>> > Beam beyond the EOL. If we want to have flexibility to drop 
> >> >> >>>>> > support earlier than EOL, we need to be more careful with 
> >> >> >>>>> > messaging because users might otherwise expect that support 
> >> >> >>>>> > will last until EOL, if we mention EOL date.
> >> >> >>>>>
> >> >> >>>>> +1
> >> >> >>>>>
> >> >> >>>>> > I am hoping that we can establish a consensus for when we will 
> >> >> >>>>> > be dropping support for a version, so that we don't have to 
> >> >> >>>>> > discuss it on a case by case basis in the future.
> >> >> >>>>> >
> >> >> >>>>> > > I think it would make sense to add support for 3.8 right 
> >> >> >>>>> > > away (or at least get a good sense of what work needs to be 
> >> >> >>>>> > > done and what our dependency situation is like)
> >> >> >>>>> > https://issues.apache.org/jira/browse/BEAM-8494 is a starting 
> >> >> >>>>> > point. When I tried 3.8 a while ago, some dependencies failed 
> >> >> >>>>> > to install; I checked again just now, and the SDK is 
> >> >> >>>>> > "installable" after minor changes. Some tests don't pass. 
> >> >> >>>>> > BEAM-8494 does not have an owner atm, and if anyone is 
> >> >> >>>>> > interested I'm happy to give further pointers and help get 
> >> >> >>>>> > started.
> >> >> >>>>> >
> >> >> >>>>> > > For the 3.x series, I think we will get the most signal out 
> >> >> >>>>> > > of the lowest and highest version, and can get by with smoke 
> >> >> >>>>> > > tests +
> >> >> >>>>> > > infrequent post-commits for the ones between.
> >> >> >>>>> >
> >> >> >>>>> > > I agree with having low-frequency tests for low-priority 
> >> >> >>>>> > > versions. Low-priority versions could be determined according 
> >> >> >>>>> > > to least usage.
> >> >> >>>>> >
> >> >> >>>>> > These are good ideas. Do you think we will want the 
> >> >> >>>>> > ability to run some (inexpensive) tests for all versions 
> >> >> >>>>> > frequently (on presubmits), or is this extra complexity that 
> >> >> >>>>> > can be avoided? I am thinking about type inference, for example. 
> >> >> >>>>> > Afaik inference logic is very sensitive to the version. Would 
> >> >> >>>>> > it be acceptable to catch errors there in infrequent 
> >> >> >>>>> > postcommits, or would an early signal be preferred?
> >> >> >>>>>
> >> >> >>>>> This is a good example--the type inference tests are sensitive to
> >> >> >>>>> version (due to using internal details and relying on the
> >> >> >>>>> still-evolving typing module) but also run in ~15 seconds. I think
> >> >> >>>>> these should be in precommits. We just don't need to run every 
> >> >> >>>>> test
> >> >> >>>>> for every version.
> >> >> >>>>>
> >> >> >>>>> > On Wed, Feb 26, 2020 at 5:17 PM Kyle Weaver 
> >> >> >>>>> > <[email protected]> wrote:
> >> >> >>>>> >>
> >> >> >>>>> >> Oh, I didn't see Robert's earlier email:
> >> >> >>>>> >>
> >> >> >>>>> >> > Currently 3.5 downloads sit at 3.7%, or about
> >> >> >>>>> >> > 20% of all Python 3 downloads.
> >> >> >>>>> >>
> >> >> >>>>> >> Where did these numbers come from?
> >> >> >>>>> >>
> >> >> >>>>> >> On Wed, Feb 26, 2020 at 5:15 PM Kyle Weaver 
> >> >> >>>>> >> <[email protected]> wrote:
> >> >> >>>>> >>>
> >> >> >>>>> >>> > I agree with having low-frequency tests for low-priority 
> >> >> >>>>> >>> > versions.
> >> >> >>>>> >>> > Low-priority versions could be determined according to 
> >> >> >>>>> >>> > least usage.
> >> >> >>>>> >>>
> >> >> >>>>> >>> +1. While the difference may not be as great between, say, 
> >> >> >>>>> >>> 3.6 and 3.7, I think that if we had to choose, it would be 
> >> >> >>>>> >>> more useful to test the versions folks are actually using the 
> >> >> >>>>> >>> most. 3.5 only has about a third of the Docker pulls of 3.6 
> >> >> >>>>> >>> or 3.7 [1]. Does anyone have other usage statistics we can 
> >> >> >>>>> >>> consult?
> >> >> >>>>> >>>
> >> >> >>>>> >>> [1] https://hub.docker.com/search?q=apachebeam%2Fpython&type=image
> >> >> >>>>> >>>
> >> >> >>>>> >>> On Wed, Feb 26, 2020 at 5:00 PM Ruoyun Huang 
> >> >> >>>>> >>> <[email protected]> wrote:
> >> >> >>>>> >>>>
> >> >> >>>>> >>>> I feel 4+ versions take too long to run anything.
> >> >> >>>>> >>>>
> >> >> >>>>> >>>> I would vote for lowest + highest, 2 versions.
> >> >> >>>>> >>>>
> >> >> >>>>> >>>> On Wed, Feb 26, 2020 at 4:52 PM Udi Meiri <[email protected]> 
> >> >> >>>>> >>>> wrote:
> >> >> >>>>> >>>>>
> >> >> >>>>> >>>>> I agree with having low-frequency tests for low-priority 
> >> >> >>>>> >>>>> versions.
> >> >> >>>>> >>>>> Low-priority versions could be determined according to 
> >> >> >>>>> >>>>> least usage.
> >> >> >>>>> >>>>>
> >> >> >>>>> >>>>>
> >> >> >>>>> >>>>>
> >> >> >>>>> >>>>> On Wed, Feb 26, 2020 at 4:06 PM Robert Bradshaw 
> >> >> >>>>> >>>>> <[email protected]> wrote:
> >> >> >>>>> >>>>>>
> >> >> >>>>> >>>>>> On Wed, Feb 26, 2020 at 3:29 PM Kenneth Knowles 
> >> >> >>>>> >>>>>> <[email protected]> wrote:
> >> >> >>>>> >>>>>> >
> >> >> >>>>> >>>>>> > Are these divergent enough that they all need to consume 
> >> >> >>>>> >>>>>> > testing resources? For example can lower priority 
> >> >> >>>>> >>>>>> > versions be daily runs or some such?
> >> >> >>>>> >>>>>>
> >> >> >>>>> >>>>>> For the 3.x series, I think we will get the most signal 
> >> >> >>>>> >>>>>> out of the
> >> >> >>>>> >>>>>> lowest and highest version, and can get by with smoke 
> >> >> >>>>> >>>>>> tests +
> >> >> >>>>> >>>>>> infrequent post-commits for the ones between.
> >> >> >>>>> >>>>>>
> >> >> >>>>> >>>>>> > Kenn
> >> >> >>>>> >>>>>> >
> >> >> >>>>> >>>>>> > On Wed, Feb 26, 2020 at 3:25 PM Robert Bradshaw 
> >> >> >>>>> >>>>>> > <[email protected]> wrote:
> >> >> >>>>> >>>>>> >>
> >> >> >>>>> >>>>>> >> +1 to consulting users. Currently 3.5 downloads sit at 
> >> >> >>>>> >>>>>> >> 3.7%, or about
> >> >> >>>>> >>>>>> >> 20% of all Python 3 downloads.
> >> >> >>>>> >>>>>> >>
> >> >> >>>>> >>>>>> >> I would propose getting in warnings about 3.5 EoL well 
> >> >> >>>>> >>>>>> >> ahead of time,
> >> >> >>>>> >>>>>> >> at the very least as part of the 2.7 warning.
> >> >> >>>>> >>>>>> >>
> >> >> >>>>> >>>>>> >> Fortunately, supporting multiple 3.x versions is 
> >> >> >>>>> >>>>>> >> significantly easier
> >> >> >>>>> >>>>>> >> than spanning 2.7 and 3.x. I would rather not impose an 
> >> >> >>>>> >>>>>> >> ordering on
> >> >> >>>>> >>>>>> >> dropping 3.5 and adding 3.8 but consider their merits 
> >> >> >>>>> >>>>>> >> independently.
> >> >> >>>>> >>>>>> >>
> >> >> >>>>> >>>>>> >>
> >> >> >>>>> >>>>>> >> On Wed, Feb 26, 2020 at 3:16 PM Kyle Weaver 
> >> >> >>>>> >>>>>> >> <[email protected]> wrote:
> >> >> >>>>> >>>>>> >> >
> >> >> >>>>> >>>>>> >> > 5 versions is too many IMO. We've had issues with 
> >> >> >>>>> >>>>>> >> > Python precommit resource usage in the past, and 
> >> >> >>>>> >>>>>> >> > adding another version would surely exacerbate those 
> >> >> >>>>> >>>>>> >> > issues. And we have also already had to leave out 
> >> >> >>>>> >>>>>> >> > certain features on 3.5 [1]. Therefore, I am in favor 
> >> >> >>>>> >>>>>> >> > of dropping 3.5 before adding 3.8. After dropping 
> >> >> >>>>> >>>>>> >> > Python 2 and adding 3.8, that will leave us with the 
> >> >> >>>>> >>>>>> >> > latest three minor versions (3.6, 3.7, 3.8), which I 
> >> >> >>>>> >>>>>> >> > think is closer to the "sweet spot." Though I would 
> >> >> >>>>> >>>>>> >> > be interested in hearing if there are any users who 
> >> >> >>>>> >>>>>> >> > would prefer we continue supporting 3.5.
> >> >> >>>>> >>>>>> >> >
> >> >> >>>>> >>>>>> >> > [1] https://github.com/apache/beam/blob/8658b95545352e51f35959f38334f3c7df8b48eb/sdks/python/apache_beam/runners/portability/flink_runner.py#L55
> >> >> >>>>> >>>>>> >> >
> >> >> >>>>> >>>>>> >> > On Wed, Feb 26, 2020 at 3:00 PM Valentyn Tymofieiev 
> >> >> >>>>> >>>>>> >> > <[email protected]> wrote:
> >> >> >>>>> >>>>>> >> >>
> >> >> >>>>> >>>>>> >> >> I would like to start a discussion about identifying 
> >> >> >>>>> >>>>>> >> >> a guideline for answering questions like:
> >> >> >>>>> >>>>>> >> >>
> >> >> >>>>> >>>>>> >> >> 1. When will Beam support a new Python version (say, 
> >> >> >>>>> >>>>>> >> >> Python 3.8)?
> >> >> >>>>> >>>>>> >> >> 2. When will Beam drop support for an old Python 
> >> >> >>>>> >>>>>> >> >> version (say, Python 3.5)?
> >> >> >>>>> >>>>>> >> >> 3. How many Python versions should we aim to support 
> >> >> >>>>> >>>>>> >> >> concurrently (investigate issues, have continuous 
> >> >> >>>>> >>>>>> >> >> integration tests)?
> >> >> >>>>> >>>>>> >> >> 4. What comes first: adding support for a new 
> >> >> >>>>> >>>>>> >> >> version (3.8) or deprecating older one (3.5)? This 
> >> >> >>>>> >>>>>> >> >> may affect the max load our test infrastructure 
> >> >> >>>>> >>>>>> >> >> needs to sustain.
> >> >> >>>>> >>>>>> >> >>
> >> >> >>>>> >>>>>> >> >> We are already getting requests for supporting 
> >> >> >>>>> >>>>>> >> >> Python 3.8 and there were some good reasons[1] to 
> >> >> >>>>> >>>>>> >> >> drop support for Python 3.5 (at least, early 
> >> >> >>>>> >>>>>> >> >> versions of 3.5). Answering these questions would 
> >> >> >>>>> >>>>>> >> >> help set expectations in Beam user community, Beam 
> >> >> >>>>> >>>>>> >> >> dev community, and  may help us establish resource 
> >> >> >>>>> >>>>>> >> >> requirements for test infrastructure and plan 
> >> >> >>>>> >>>>>> >> >> efforts.
> >> >> >>>>> >>>>>> >> >>
> >> >> >>>>> >>>>>> >> >> PEP-0602 [2] establishes a yearly release cycle for 
> >> >> >>>>> >>>>>> >> >> Python versions starting from 3.9. Each release is a 
> >> >> >>>>> >>>>>> >> >> long-term support release and is supported for 5 
> >> >> >>>>> >>>>>> >> >> years: first 1.5 years allow for general bug fix 
> >> >> >>>>> >>>>>> >> >> support, remaining 3.5 years have security fix 
> >> >> >>>>> >>>>>> >> >> support.
> >> >> >>>>> >>>>>> >> >>
> >> >> >>>>> >>>>>> >> >> At every point, there may be up to 5 Python minor 
> >> >> >>>>> >>>>>> >> >> versions that did not yet reach EOL, see "Release 
> >> >> >>>>> >>>>>> >> >> overlap with 12 month diagram" [3]. We can try to 
> >> >> >>>>> >>>>>> >> >> support all of them, but that may come at a cost of 
> >> >> >>>>> >>>>>> >> >> velocity: we will have more tests to maintain, and 
> >> >> >>>>> >>>>>> >> >> we will have to develop Beam against a lower version 
> >> >> >>>>> >>>>>> >> >> for a longer period. Supporting fewer versions will 
> >> >> >>>>> >>>>>> >> >> have implications for user experience. It also may 
> >> >> >>>>> >>>>>> >> >> be difficult to ensure support of the most recent 
> >> >> >>>>> >>>>>> >> >> version early, since our  dependencies (e.g. 
> >> >> >>>>> >>>>>> >> >> picklers) may not be supporting them yet.
> >> >> >>>>> >>>>>> >> >>
> >> >> >>>>> >>>>>> >> >> Currently we support 4 Python versions (2.7, 3.5, 
> >> >> >>>>> >>>>>> >> >> 3.6, 3.7).
> >> >> >>>>> >>>>>> >> >>
> >> >> >>>>> >>>>>> >> >> Is 4 versions a sweet spot? Too much? Too little? 
> >> >> >>>>> >>>>>> >> >> What do you think?
> >> >> >>>>> >>>>>> >> >>
> >> >> >>>>> >>>>>> >> >> [1] https://github.com/apache/beam/pull/10821#issuecomment-590167711
> >> >> >>>>> >>>>>> >> >> [2] https://www.python.org/dev/peps/pep-0602/
> >> >> >>>>> >>>>>> >> >> [3] https://www.python.org/dev/peps/pep-0602/#id17
