Thank you for the comments, Valentyn.

> 1) We can seed the smoke test suite with typehints tests, and add more tests 
> later if there is a need. We can identify them by the file path or by special 
> attributes in test files. Identifying them using filepath seems simpler and 
> independent of test runner.

Yes, making run_pylint.sh accept target test file paths as arguments
would be a good way to do this, if possible.
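
As a rough sketch (the task name, working directory, and use of pytest
here are my assumptions, not existing Beam code), a Gradle task could
forward target paths so the smoke suite is seeded with just the
typehints tests:

    // Hypothetical smoke-test task that selects tests by file path,
    // independent of the test runner's discovery attributes.
    task preCommitPyMinimum(type: Exec) {
      workingDir "${project.rootDir}/sdks/python"
      commandLine 'pytest', 'apache_beam/typehints/'
    }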

> 3) We should reduce the code duplication across
> beam/sdks/python/test-suites/$runner/py3*. I think we could move the suite
> definition into a common file like
> beam/sdks/python/test-suites/$runner/build.gradle perhaps, and populate
> individual suites (beam/sdks/python/test-suites/$runner/py38/build.gradle)
> by including the common file and/or logic from PythonNature [1].

Exactly. I'll check it out.
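
A minimal sketch of what this could look like (the common file name and
helper are hypothetical, not existing Beam code): the common file defines
a parameterized suite, and each py3* build.gradle just applies it.

    // sdks/python/test-suites/$runner/common.gradle (hypothetical)
    ext.createPreCommitSuite = { String pyVersion ->
      tasks.register("preCommitPy${pyVersion}") {
        // Wire this runner's suite to the matching tox project.
        dependsOn ":sdks:python:test-suites:tox:py${pyVersion}:preCommitPy${pyVersion}"
      }
    }

    // sdks/python/test-suites/$runner/py38/build.gradle
    apply from: '../common.gradle'
    createPreCommitSuite('38')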

> 4) We have some tests that we run only under specific Python 3 versions, for
> example: the FlinkValidatesRunner test runs using Python 3.5 [2], HDFS
> Python 3 tests run only with Python 3.7 [3], and cross-language Py3 tests
> for Spark run under Python 3.5 [4]; there may be more test suites that
> selectively use particular versions.
> We need to correct such suites so that we do not tie them to a specific
> Python version. I see several options here: such tests should run either for
> all high-priority versions, or only under the lowest version among the
> high-priority versions. We don't have to fix them all at the same time. In
> general, we should try to make it as easy as possible to configure whether a
> suite runs across all versions, all high-priority versions, or just one
> version.

The high-priority/low-priority configuration would be useful for this,
and which versions get tested may also be related to 5).
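
As a sketch of the dynamic wiring (the property and task names are
assumptions): gradle.properties could carry the version lists, and the
aggregate task could read them.

    // gradle.properties (hypothetical entries):
    //   highPriorityPythonVersions=36,38
    //   lowPriorityPythonVersions=35,37
    def highPri = (findProperty('highPriorityPythonVersions') ?: '')
        .split(',').findAll()

    tasks.register('pythonPreCommit') {
      // Full precommit coverage only for high-priority versions;
      // low-priority versions would get only the "-minimum" smoke suite.
      highPri.each { v ->
        dependsOn ":sdks:python:test-suites:tox:py${v}:preCommitPy${v}"
      }
    }

Switching a version between high and low priority would then be a
one-line change in gradle.properties.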

> 5) If postcommit suites (that need to run against all versions) still 
> constitute too much load on the infrastructure, we may need to investigate 
> how to run these suites less frequently.

That's certainly true; beam_PostCommit_PythonXX and
beam_PostCommit_Python_Chicago_Taxi_(Dataflow|Flink) take about one
hour. Does anyone know how we could schedule these suites to run less
frequently?
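
In case it helps: in the Jenkins Job DSL, a timed trigger could replace
the per-commit trigger (the job name and schedule below are only
placeholders):

    job('beam_PostCommit_Python_LowPri') {
      triggers {
        // Run on a daily schedule instead of after every commit;
        // 'H' spreads the start time to balance executor load.
        cron('H 6 * * *')
      }
    }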

On Sat, May 2, 2020 at 5:18 Valentyn Tymofieiev <valen...@google.com> wrote:
>
> Hi Yoshiki,
>
> Thanks a lot for your help with Python 3 support so far and, most
> recently, with your work on Python 3.8.
>
> Overall the proposal sounds good to me. I see several aspects here that we 
> need to address:
>
> 1) We can seed the smoke test suite with typehints tests, and add more tests 
> later if there is a need. We can identify them by the file path or by special 
> attributes in test files. Identifying them using filepath seems simpler and 
> independent of test runner.
>
> 2) Defining high priority/low priority versions in gradle.properties sounds 
> good to me.
>
> 3) We should reduce the code duplication across
> beam/sdks/python/test-suites/$runner/py3*. I think we could move the suite
> definition into a common file like
> beam/sdks/python/test-suites/$runner/build.gradle perhaps, and populate
> individual suites (beam/sdks/python/test-suites/$runner/py38/build.gradle)
> by including the common file and/or logic from PythonNature [1].
>
> 4) We have some tests that we run only under specific Python 3 versions, for
> example: the FlinkValidatesRunner test runs using Python 3.5 [2], HDFS
> Python 3 tests run only with Python 3.7 [3], and cross-language Py3 tests
> for Spark run under Python 3.5 [4]; there may be more test suites that
> selectively use particular versions.
>
> We need to correct such suites so that we do not tie them to a specific
> Python version. I see several options here: such tests should run either for
> all high-priority versions, or only under the lowest version among the
> high-priority versions. We don't have to fix them all at the same time. In
> general, we should try to make it as easy as possible to configure whether a
> suite runs across all versions, all high-priority versions, or just one
> version.
>
> 5) If postcommit suites (that need to run against all versions) still 
> constitute too much load on the infrastructure, we may need to investigate 
> how to run these suites less frequently.
>
> [1] 
> https://github.com/apache/beam/blob/b78c7ed4836e44177a149155581cfa8188e8f748/sdks/python/test-suites/portable/py37/build.gradle#L19-L20
> [2] 
> https://github.com/apache/beam/blob/93181e792f648122d3b4a5080d683f21c6338132/.test-infra/jenkins/job_PostCommit_Python35_ValidatesRunner_Flink.groovy#L34
> [3] 
> https://github.com/apache/beam/blob/93181e792f648122d3b4a5080d683f21c6338132/sdks/python/test-suites/direct/py37/build.gradle#L58
> [4] 
> https://github.com/apache/beam/blob/93181e792f648122d3b4a5080d683f21c6338132/.test-infra/jenkins/job_PostCommit_CrossLanguageValidatesRunner_Spark.groovy#L44
>
> On Fri, May 1, 2020 at 8:42 AM Yoshiki Obata <yoshiki.ob...@gmail.com> wrote:
>>
>> Hello everyone.
>>
>> I'm working on Python 3.8 support [1] and it is now time to prepare the
>> test infrastructure.
>> Following the discussion below, I've considered how to prioritize tests.
>> My plan is as follows; I'd like to get your thoughts on it.
>>
>> - With all low-pri Python versions, the apache_beam.typehints.*_test
>> modules run in the PreCommit tests.
>>   A new gradle task should be defined, like "preCommitPy3*-minimum".
>>   If there are essential tests for all versions other than typehints,
>> please point them out.
>>
>> - With high-pri Python versions, the same tests as in the current
>> PreCommit suite run, for extensive testing: "tox:py3*:preCommitPy3*",
>> "dataflow:py3*:preCommitIT" and "dataflow:py3*:preCommitIT_V2".
>>
>> - Low-pri versions' whole PreCommit tests are moved to each version's
>> PostCommit suite.
>>
>> - High-pri and low-pri versions are defined in gradle.properties and
>> PreCommit/PostCommit task dependencies are built dynamically according
>> to them.
>>   This would make it easy to switch the priorities of Python versions.
>>
>> [1] https://issues.apache.org/jira/browse/BEAM-8494
>>
>> On Sat, Apr 4, 2020 at 7:51 Robert Bradshaw <rober...@google.com> wrote:
>> >
>> > https://pypistats.org/packages/apache-beam is an interesting data point.
>> >
>> > The good news: Python 3.x more than doubled to nearly 40% of downloads 
>> > last month. Interestingly, it looks like a good chunk of this increase was 
>> > 3.5 (which is now the most popular 3.x version by this metric...)
>> >
>> > I agree with using Python EOL dates as a baseline, with the possibility of 
>> > case-by-case adjustments. Refactoring our tests to support 3.8 without 
>> > increasing the load should be our focus now.
>> >
>> >
>> > On Fri, Apr 3, 2020 at 3:41 PM Valentyn Tymofieiev <valen...@google.com> 
>> > wrote:
>> >>
>> >> Some good news on  Python 3.x support: thanks to +David Song and +Yifan 
>> >> Zou we now have Python 3.8 on Jenkins, and can start working on adding 
>> >> Python 3.8 support to Beam (BEAM-8494).
>> >>
>> >>> One interesting variable that has not been mentioned is what versions 
>> >>> of python 3
>> >>> are available to users via their distribution channels (the linux
>> >>> distributions they use to develop/run the pipelines).
>> >>
>> >>
>> >> Good point. Looking at Ubuntu 16.04, which comes with Python 3.5.2, we 
>> >> can see that the end-of-life for 16.04 is in 2024, end-of-support is 
>> >> April 2021 [1]. Both of these dates are beyond the announced Python 3.5 
>> >> EOL in September 2020 [2]. I think it would be difficult for Beam to keep 
>> >> Py3.5 support until these EOL dates, and users of systems that stock old 
>> >> versions of Python have viable workarounds:
>> >> - install a newer version of Python interpreter via pyenv[3], from 
>> >> sources, or from alternative repositories.
>> >> - use a docker container that comes with a newer version of interpreter.
>> >> - use older versions of Beam.
>> >>
>> >> We didn't receive feedback from user@ on how long 3.x versions on the
>> >> lower/higher end of the range should stay supported. I would suggest for
>> >> now that we plan to support all Python 3.x versions that have been
>> >> released and have not reached EOL. We can discuss exceptions to this
>> >> rule on a case-by-case basis, evaluating any maintenance burden of
>> >> continued support, and possibly stop early.
>> >>
>> >> We should now focus on adjusting our Python test infrastructure to make 
>> >> it easy to split 3.5, 3.6, 3.7, 3.8 suites into high-priority and 
>> >> low-priority suites according to the Python version. Ideally, we should 
>> >> make it easy to change which versions are high/low priority without 
>> >> having to change all the individual test suites, and without losing test 
>> >> coverage signal.
>> >>
>> >> [1] https://wiki.ubuntu.com/Releases
>> >> [2] https://devguide.python.org/#status-of-python-branches
>> >> [3] https://github.com/pyenv/pyenv/blob/master/README.md
>> >>
>> >> On Fri, Feb 28, 2020 at 1:25 AM Ismaël Mejía <ieme...@gmail.com> wrote:
>> >>>
>> >>> One interesting variable that has not been mentioned is what versions 
>> >>> of python
>> >>> 3 are available to users via their distribution channels (the linux
>> >>> distributions they use to develop/run the pipelines).
>> >>>
>> >>> - RHEL 8 users have python 3.6 available
>> >>> - RHEL 7 users have python 3.6 available
>> >>> - Debian 10/Ubuntu 18.04 users have python 3.7/3.6 available
>> >>> - Debian 9/Ubuntu 16.04 users have python 3.5 available
>> >>>
>> >>>
>> >>> We should consider this when we evaluate future support removals.
>> >>>
>> >>> Given that the distros that support python 3.5 are ~4y old, and since
>> >>> python 3.5 is also losing LTS support soon, it is probably ok to not
>> >>> support it in Beam anymore, as Robert suggests.
>> >>>
>> >>>
>> >>> On Thu, Feb 27, 2020 at 3:57 AM Valentyn Tymofieiev 
>> >>> <valen...@google.com> wrote:
>> >>>>
>> >>>> Thanks everyone for sharing your perspectives so far. It sounds like we 
>> >>>> can mitigate the cost of test infrastructure by having:
>> >>>> - a selection of (fast) tests that we will want to run against all 
>> >>>> Python versions we support.
>> >>>> - high priority Python versions, which we will test extensively.
>> >>>> - infrequent postcommit tests that exercise low-priority versions.
>> >>>> We will need test infrastructure improvements to have the flexibility
>> >>>> of designating versions as high-pri/low-pri and to minimize the effort
>> >>>> required to adopt a new version.
>> >>>>
>> >>>> There is still a question of how long we want to support old Py3.x 
>> >>>> versions. As mentioned above, I think we should not support them beyond 
>> >>>> EOL (5 years after a release). I wonder if that is still too long. The 
>> >>>> cost of supporting a version may include:
>> >>>>  - Developing against older Python version
>> >>>>  - Release overhead (building & storing containers, wheels, doing 
>> >>>> release validation)
>> >>>>  - Complexity / development cost to support the quirks of the minor 
>> >>>> versions.
>> >>>>
>> >>>> We can decide to drop support after, say, 4 years, or after usage
>> >>>> drops below a threshold, or decide on a case-by-case basis. Thoughts?
>> >>>> I also asked for feedback on user@ [1].
>> >>>>
>> >>>> [1] 
>> >>>> https://lists.apache.org/thread.html/r630a3b55aa8e75c68c8252ea6f824c3ab231ad56e18d916dfb84d9e8%40%3Cuser.beam.apache.org%3E
>> >>>>
>> >>>> On Wed, Feb 26, 2020 at 5:27 PM Robert Bradshaw <rober...@google.com> 
>> >>>> wrote:
>> >>>>>
>> >>>>> On Wed, Feb 26, 2020 at 5:21 PM Valentyn Tymofieiev 
>> >>>>> <valen...@google.com> wrote:
>> >>>>> >
>> >>>>> > > +1 to consulting users.
>> >>>>> > I will message user@ as well and point to this thread.
>> >>>>> >
>> >>>>> > > I would propose getting in warnings about 3.5 EoL well ahead of 
>> >>>>> > > time.
>> >>>>> > I think we should document on our website, and in the code
>> >>>>> > (warnings), that users should not expect SDKs to be supported in
>> >>>>> > Beam beyond the EOL. If we want the flexibility to drop support
>> >>>>> > earlier than EOL, we need to be more careful with messaging, because
>> >>>>> > users might otherwise expect that support will last until EOL if we
>> >>>>> > mention the EOL date.
>> >>>>>
>> >>>>> +1
>> >>>>>
>> >>>>> > I am hoping that we can establish a consensus for when we will be 
>> >>>>> > dropping support for a version, so that we don't have to discuss it 
>> >>>>> > on a case by case basis in the future.
>> >>>>> >
>> >>>>> > > I think it would makes sense to add support for 3.8 right away (or 
>> >>>>> > > at least get a good sense of what work needs to be done and what 
>> >>>>> > > our dependency situation is like)
>> >>>>> > https://issues.apache.org/jira/browse/BEAM-8494 is a starting point.
>> >>>>> > I tried 3.8 a while ago and some dependencies were not able to
>> >>>>> > install; I checked again just now and the SDK is "installable" after
>> >>>>> > minor changes, though some tests don't pass. BEAM-8494 does not have
>> >>>>> > an owner atm, and if anyone is interested I'm happy to give further
>> >>>>> > pointers and help get started.
>> >>>>> >
>> >>>>> > > For the 3.x series, I think we will get the most signal out of the 
>> >>>>> > > lowest and highest version, and can get by with smoke tests +
>> >>>>> > infrequent post-commits for the ones between.
>> >>>>> >
>> >>>>> > > I agree with having low-frequency tests for low-priority versions. 
>> >>>>> > > Low-priority versions could be determined according to least usage.
>> >>>>> >
>> >>>>> > These are good ideas. Do you think we will want the ability
>> >>>>> > to run some (inexpensive) tests for all versions frequently (on
>> >>>>> > presubmits), or is this extra complexity that can be avoided? I am
>> >>>>> > thinking about type inference, for example. Afaik the inference
>> >>>>> > logic is very sensitive to the version. Would it be acceptable to
>> >>>>> > catch errors there in infrequent postcommits, or would an early
>> >>>>> > signal be preferred?
>> >>>>>
>> >>>>> This is a good example--the type inference tests are sensitive to
>> >>>>> version (due to using internal details and relying on the
>> >>>>> still-evolving typing module) but also run in ~15 seconds. I think
>> >>>>> these should be in precommits. We just don't need to run every test
>> >>>>> for every version.
>> >>>>>
>> >>>>> > On Wed, Feb 26, 2020 at 5:17 PM Kyle Weaver <kcwea...@google.com> 
>> >>>>> > wrote:
>> >>>>> >>
>> >>>>> >> Oh, I didn't see Robert's earlier email:
>> >>>>> >>
>> >>>>> >> > Currently 3.5 downloads sit at 3.7%, or about
>> >>>>> >> > 20% of all Python 3 downloads.
>> >>>>> >>
>> >>>>> >> Where did these numbers come from?
>> >>>>> >>
>> >>>>> >> On Wed, Feb 26, 2020 at 5:15 PM Kyle Weaver <kcwea...@google.com> 
>> >>>>> >> wrote:
>> >>>>> >>>
>> >>>>> >>> > I agree with having low-frequency tests for low-priority 
>> >>>>> >>> > versions.
>> >>>>> >>> > Low-priority versions could be determined according to least 
>> >>>>> >>> > usage.
>> >>>>> >>>
>> >>>>> >>> +1. While the difference may not be as great between, say, 3.6 and 
>> >>>>> >>> 3.7, I think that if we had to choose, it would be more useful to 
>> >>>>> >>> test the versions folks are actually using the most. 3.5 only has 
>> >>>>> >>> about a third of the Docker pulls of 3.6 or 3.7 [1]. Does anyone 
>> >>>>> >>> have other usage statistics we can consult?
>> >>>>> >>>
>> >>>>> >>> [1] https://hub.docker.com/search?q=apachebeam%2Fpython&type=image
>> >>>>> >>>
>> >>>>> >>> On Wed, Feb 26, 2020 at 5:00 PM Ruoyun Huang <ruo...@google.com> 
>> >>>>> >>> wrote:
>> >>>>> >>>>
>> >>>>> >>>> I feel 4+ versions take too long to run anything.
>> >>>>> >>>>
>> >>>>> >>>> would vote for lowest + highest, 2 versions.
>> >>>>> >>>>
>> >>>>> >>>> On Wed, Feb 26, 2020 at 4:52 PM Udi Meiri <eh...@google.com> 
>> >>>>> >>>> wrote:
>> >>>>> >>>>>
>> >>>>> >>>>> I agree with having low-frequency tests for low-priority 
>> >>>>> >>>>> versions.
>> >>>>> >>>>> Low-priority versions could be determined according to least 
>> >>>>> >>>>> usage.
>> >>>>> >>>>>
>> >>>>> >>>>>
>> >>>>> >>>>>
>> >>>>> >>>>> On Wed, Feb 26, 2020 at 4:06 PM Robert Bradshaw 
>> >>>>> >>>>> <rober...@google.com> wrote:
>> >>>>> >>>>>>
>> >>>>> >>>>>> On Wed, Feb 26, 2020 at 3:29 PM Kenneth Knowles 
>> >>>>> >>>>>> <k...@apache.org> wrote:
>> >>>>> >>>>>> >
>> >>>>> >>>>>> > Are these divergent enough that they all need to consume 
>> >>>>> >>>>>> > testing resources? For example can lower priority versions be 
>> >>>>> >>>>>> > daily runs or some such?
>> >>>>> >>>>>>
>> >>>>> >>>>>> For the 3.x series, I think we will get the most signal out of 
>> >>>>> >>>>>> the
>> >>>>> >>>>>> lowest and highest version, and can get by with smoke tests +
>> >>>>> >>>>>> infrequent post-commits for the ones between.
>> >>>>> >>>>>>
>> >>>>> >>>>>> > Kenn
>> >>>>> >>>>>> >
>> >>>>> >>>>>> > On Wed, Feb 26, 2020 at 3:25 PM Robert Bradshaw 
>> >>>>> >>>>>> > <rober...@google.com> wrote:
>> >>>>> >>>>>> >>
>> >>>>> >>>>>> >> +1 to consulting users. Currently 3.5 downloads sit at 3.7%, 
>> >>>>> >>>>>> >> or about
>> >>>>> >>>>>> >> 20% of all Python 3 downloads.
>> >>>>> >>>>>> >>
>> >>>>> >>>>>> >> I would propose getting in warnings about 3.5 EoL well ahead 
>> >>>>> >>>>>> >> of time,
>> >>>>> >>>>>> >> at the very least as part of the 2.7 warning.
>> >>>>> >>>>>> >>
>> >>>>> >>>>>> >> Fortunately, supporting multiple 3.x versions is 
>> >>>>> >>>>>> >> significantly easier
>> >>>>> >>>>>> >> than spanning 2.7 and 3.x. I would rather not impose an 
>> >>>>> >>>>>> >> ordering on
>> >>>>> >>>>>> >> dropping 3.5 and adding 3.8 but consider their merits 
>> >>>>> >>>>>> >> independently.
>> >>>>> >>>>>> >>
>> >>>>> >>>>>> >>
>> >>>>> >>>>>> >> On Wed, Feb 26, 2020 at 3:16 PM Kyle Weaver 
>> >>>>> >>>>>> >> <kcwea...@google.com> wrote:
>> >>>>> >>>>>> >> >
>> >>>>> >>>>>> >> > 5 versions is too many IMO. We've had issues with Python 
>> >>>>> >>>>>> >> > precommit resource usage in the past, and adding another 
>> >>>>> >>>>>> >> > version would surely exacerbate those issues. And we have 
>> >>>>> >>>>>> >> > also already had to leave out certain features on 3.5 [1]. 
>> >>>>> >>>>>> >> > Therefore, I am in favor of dropping 3.5 before adding 
>> >>>>> >>>>>> >> > 3.8. After dropping Python 2 and adding 3.8, that will 
>> >>>>> >>>>>> >> > leave us with the latest three minor versions (3.6, 3.7, 
>> >>>>> >>>>>> >> > 3.8), which I think is closer to the "sweet spot." Though 
>> >>>>> >>>>>> >> > I would be interested in hearing if there are any users 
>> >>>>> >>>>>> >> > who would prefer we continue supporting 3.5.
>> >>>>> >>>>>> >> >
>> >>>>> >>>>>> >> > [1] 
>> >>>>> >>>>>> >> > https://github.com/apache/beam/blob/8658b95545352e51f35959f38334f3c7df8b48eb/sdks/python/apache_beam/runners/portability/flink_runner.py#L55
>> >>>>> >>>>>> >> >
>> >>>>> >>>>>> >> > On Wed, Feb 26, 2020 at 3:00 PM Valentyn Tymofieiev 
>> >>>>> >>>>>> >> > <valen...@google.com> wrote:
>> >>>>> >>>>>> >> >>
>> >>>>> >>>>>> >> >> I would like to start a discussion about identifying a 
>> >>>>> >>>>>> >> >> guideline for answering questions like:
>> >>>>> >>>>>> >> >>
>> >>>>> >>>>>> >> >> 1. When will Beam support a new Python version (say, 
>> >>>>> >>>>>> >> >> Python 3.8)?
>> >>>>> >>>>>> >> >> 2. When will Beam drop support for an old Python version 
>> >>>>> >>>>>> >> >> (say, Python 3.5)?
>> >>>>> >>>>>> >> >> 3. How many Python versions should we aim to support 
>> >>>>> >>>>>> >> >> concurrently (investigate issues, have continuous 
>> >>>>> >>>>>> >> >> integration tests)?
>> >>>>> >>>>>> >> >> 4. What comes first: adding support for a new version 
>> >>>>> >>>>>> >> >> (3.8) or deprecating older one (3.5)? This may affect the 
>> >>>>> >>>>>> >> >> max load our test infrastructure needs to sustain.
>> >>>>> >>>>>> >> >>
>> >>>>>> >> >> We are already getting requests for supporting Python 3.8,
>> >>>>>> >> >> and there were some good reasons [1] to drop support for
>> >>>>>> >> >> Python 3.5 (at least, early versions of 3.5). Answering
>> >>>>>> >> >> these questions would help set expectations in the Beam user
>> >>>>>> >> >> and dev communities, and may help us establish
>> >>>>>> >> >> resource requirements for test infrastructure and plan
>> >>>>>> >> >> efforts.
>> >>>>> >>>>>> >> >>
>> >>>>> >>>>>> >> >> PEP-0602 [2] establishes a yearly release cycle for 
>> >>>>> >>>>>> >> >> Python versions starting from 3.9. Each release is a 
>> >>>>> >>>>>> >> >> long-term support release and is supported for 5 years: 
>> >>>>>> >> >> the first 1.5 years allow for general bug fix support, and the
>> >>>>>> >> >> remaining 3.5 years have security fix support.
>> >>>>> >>>>>> >> >>
>> >>>>>> >> >> At every point, there may be up to 5 Python minor
>> >>>>>> >> >> versions that have not yet reached EOL; see "Release overlap
>> >>>>>> >> >> with 12 month diagram" [3]. We can try to support all of
>> >>>>>> >> >> them, but that may come at a cost to velocity: we will
>> >>>>>> >> >> have more tests to maintain, and we will have to develop
>> >>>>>> >> >> Beam against a lower version for a longer period.
>> >>>>>> >> >> Supporting fewer versions will have implications for user
>> >>>>>> >> >> experience. It also may be difficult to ensure support of
>> >>>>>> >> >> the most recent version early, since our dependencies
>> >>>>>> >> >> (e.g. picklers) may not support them yet.
>> >>>>> >>>>>> >> >>
>> >>>>> >>>>>> >> >> Currently we support 4 Python versions (2.7, 3.5, 3.6, 
>> >>>>> >>>>>> >> >> 3.7).
>> >>>>> >>>>>> >> >>
>> >>>>>> >> >> Is 4 versions a sweet spot? Too many? Too few? What do
>> >>>>>> >> >> you think?
>> >>>>> >>>>>> >> >>
>> >>>>> >>>>>> >> >> [1] 
>> >>>>> >>>>>> >> >> https://github.com/apache/beam/pull/10821#issuecomment-590167711
>> >>>>> >>>>>> >> >> [2] https://www.python.org/dev/peps/pep-0602/
>> >>>>> >>>>>> >> >> [3] https://www.python.org/dev/peps/pep-0602/#id17
