On Mon, Aug 26, 2024 at 11:57 AM Robert Bradshaw <rober...@google.com> wrote:
> On Mon, Aug 26, 2024 at 11:22 AM Valentyn Tymofieiev via dev
> <dev@beam.apache.org> wrote:
> >
> > Interesting findings. When researching Dataflow Python usage with
> > internal telemetry, I see that Python 3.11 has slightly more usage than
> > Python 3.8. When I exclude Dev SDKs (this might also exclude some
> > Google-internal users who use bleeding-edge SDKs), Python 3.8 reaches to
> > the top. If I exclude Google Dynamic "FLEX" templates, the following
> > become top 3:
> >
> > Apache Beam Python 3.9 SDK    24.40%
> > Apache Beam Python 3.7 SDK    23.34%
> > Apache Beam Python 3.8 SDK    21.63%
>
> Interesting. I'm assuming this is across all Beam versions, right?

Yes, across all Beam Python versions.

> > This might be explained by the fact that the default "Python3" flex
> > template image referenced in the docs (at
> > https://cloud.google.com/dataflow/docs/guides/templates/using-flex-templates#python_3)
> > is Python 3.8.
>
> We should definitely fix that.
>
> > > On the other hand, I do like the idea of letting the Python EoL cycle
> > > drive our own supported versions.
> >
> > +1. As much as I don't like force upgrades, it won't be sustainable long
> > term to keep versions indefinitely. I don't anticipate any blockers for
> > switching Python 3.8 to Python 3.9.
> >
> > > For many workflows like our unit test suites this is not a large
> > > change; the Python version matrix simply omits 3.8 and runs on the
> > > remaining python versions as expected. This is more complicated for a
> > > number of workflows that currently only run on 3.8 or both 3.8 and
> > > 3.12, as GitHub will not run the updated actions in the main repository
> > > until the PR updating them is submitted.
> >
> > Yes, that's a known inconvenience. I believe this can be worked around
> > by pushing the changes to a branch on main repo, and then manually
> > triggering a GHA workflow from that branch, if you want to be really
> > careful. I think we have this documented somewhere, but I couldn't
> > quickly find it.
> > @Danny McCormick might have a link.
> >
> > Merging and iterating sounds good to me if we can quickly roll back/fix
> > forward changes to not make PRs blocked due to tests not passing.
>
> This risk is accepting changes that are incompatible with Python 3.8.
> Once we drop it (even in the dev repo) we should drop it for good.
>
> > We also set the default Python version in
> > https://github.com/apache/beam/blob/9c0a9503ebd59778d488dcfff7fb9417a808152b/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy#L2960
> > that might affect some workflows.
> >
> > > To Robert Bradshaw's point, I wouldn't necessarily be opposed to
> > > pushing out this process to 2.61.0.
> >
> > As long as we don't add a new version before remove an existing one,
> > probably no significant difference for us.
>
> Sounds like a reasonable plan then.
>
> > Our dependencies (like numpy, pandas, etc) are definitely dropping
> > Python 3.8 support, usually ahead of us. Some Google Cloud Python Client
> > libraries are planning to drop Python 3.8 support after EOL as well.
> >
> > On Mon, Aug 26, 2024 at 11:17 AM Jack McCluskey via dev
> > <dev@beam.apache.org> wrote:
> >>
> >> To Robert Bradshaw's point, I wouldn't necessarily be opposed to
> >> pushing out this process to 2.61.0. That does give more time to validate
> >> some of the actions changes and let us warn users about the drop in 3.8
> >> support in a release. Admittedly a major motivator for moving off of 3.8
> >> at EoL is so I can do some overhauling of the type hinting code, as 3.8
> >> is the last version where PEP-585 type hints are not supported by
> >> default (some context for this is available on my Current State of Beam
> >> Python Type Hinting doc from last November.) But that isn't necessarily
> >> urgent work as far as users are concerned.
> >>
> >> There's an argument for trying to keep our documentation and tutorials
> >> pointing at relatively recent versions of Beam, but that's probably best
> >> left as a best-effort type thing for now.
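For readers unfamiliar with the PEP 585 point in Jack's message above, here is a minimal sketch of the difference. It is not taken from Beam's type hinting code; the function names are hypothetical and exist only for illustration. It assumes Python 3.9 or newer: on Python 3.8 the `list[int]` annotation fails with a TypeError when the function is defined, unless `from __future__ import annotations` is used.

```python
# Illustrative only: the function names below are hypothetical, not Beam APIs.
# Requires Python 3.9+; on 3.8 `list[int]` raises TypeError at definition time.
from typing import List, get_args, get_origin, get_type_hints


def total_legacy(values: List[int]) -> int:
    """Annotated with typing.List, the spelling that also works on 3.8."""
    return sum(values)


def total_builtin(values: list[int]) -> int:
    """Annotated with the built-in generic that PEP 585 enables on 3.9+."""
    return sum(values)


def describe(hint):
    """Return (origin, args) so the two spellings can be compared."""
    return get_origin(hint), get_args(hint)


# Resolve the annotations the way a type-hint-aware framework might.
legacy_hint = get_type_hints(total_legacy)["values"]
builtin_hint = get_type_hints(total_builtin)["values"]

if __name__ == "__main__":
    print(describe(legacy_hint))
    print(describe(builtin_hint))
```

Both spellings resolve to the same origin and arguments, which is why code that inspects hints can treat them alike once 3.8 no longer has to be supported.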
> >>
> >> On Mon, Aug 26, 2024 at 1:41 PM Robert Burke <lostl...@apache.org>
> >> wrote:
> >>>
> >>> A minor point but often when onboarding, folks will try things
> >>> verbatim from the website and documentation:
> >>>
> >>> https://github.com/search?q=repo%3Aapache%2Fbeam+python3.7+lang%3AMarkdown+&type=code
> >>>
> >>> Granted, the most popular combo there was not present in this search,
> >>> so it's probably not terribly significant, compared to the reason
> >>> Robert is guessing.
> >>>
> >>> Dunno what we can do about that without going all out in specifying
> >>> templated versions to use in our various docs. (That has the different
> >>> problem of ensuring everything being described actually works as typed
> >>> out, and we are not set up to efficiently validate that for every
> >>> release.)
> >>>
> >>> On 2024/08/26 17:30:23 Robert Bradshaw via dev wrote:
> >>> > So, 3.8 remains the most popular python version per pypi:
> >>> > https://pypistats.org/packages/apache-beam
> >>> >
> >>> > Breaking down by Beam version over the last 90 days we get
> >>> >
> >>> > https://docs.google.com/spreadsheets/d/1-PPxZHs17aXvXgdl439tF7IqIs0XUxtDbDxGYcBg92I
> >>> >
> >>> > Which shows that this remains true even for the latest Beam releases.
> >>> > (Interestingly, one of the most popular combinations is the Python
> >>> > 3.7 + Beam 2.48. I wonder if people are holding off upgrading Beam
> >>> > due to Python 3.7 being dropped...)
> >>> >
> >>> > Of course, the relationship between pypi downloads and actual
> >>> > customer usage is not 1:1, but is likely directional at least.
> >>> >
> >>> > On the other hand, I do like the idea of letting the Python EoL cycle
> >>> > drive our own supported versions.
> >>> > Given that 3.8 EoL is in October and our release is (hopefully) also
> >>> > in October, what if instead we planned on making 2.60 (tentatively)
> >>> > the last officially supported 3.8 release instead of the release in
> >>> > which we drop 3.8 and then see what the stats say once Python is
> >>> > officially EoL. Yes, we could just drop it if that's the consensus,
> >>> > but given these usage numbers I don't think the case is so clear cut.
> >>> >
> >>> > We could also look at what our dependencies are doing. And if
> >>> > supporting 3.8 becomes difficult (e.g. is it being removed from
> >>> > github actions?) that's another reason to do so.
> >>> >
> >>> > [image: Skärmavbild 2024-08-26 kl. 10.08.09 fm.png]
> >>> >
> >>> > On Mon, Aug 26, 2024 at 9:42 AM Robert Burke <rob...@frantil.com>
> >>> > wrote:
> >>> >
> >>> > > I'd take care only relying on the most recent release (as much as
> >>> > > it supports the consensus point). The most recent beam version is
> >>> > > inherently going to have smaller and less consistent numbers, vs
> >>> > > N-1 or N-2, since only the most keen or in need updates
> >>> > > immediately.
> >>> > >
> >>> > > On Mon, Aug 26, 2024, 9:27 AM Danny McCormick via dev
> >>> > > <dev@beam.apache.org> wrote:
> >>> > >
> >>> > >> Was about to respond, Rebo you beat me to it! I agree DockerHub is
> >>> > >> the right thing to look at since Pypi reporting isn't awesome, I
> >>> > >> think we should only look at the most recent versions though,
> >>> > >> since 3.8 will work for old versions forever.
> >>> > >>
> >>> > >> For 2.58.0 last month (partial month results), I see:
> >>> > >>
> >>> > >> "Repo","Unique IPs","Pull by tag","Pull by digest","Version check"
> >>> > >> "beam_python312_sdk",151,70,0,410
> >>> > >> "beam_python311_sdk",151,64,0,360
> >>> > >> "beam_python310_sdk",40,97,0,13
> >>> > >> "beam_python3.9_sdk",18,388,0,14
> >>> > >> "beam_python3.8_sdk",36,97,0,2
> >>> > >>
> >>> > >> So it was <10% of pulls (including our automation as Rebo pointed
> >>> > >> out)
> >>> > >>
> >>> > >> I'll join Jack, Kenn, and Rebo and agree dropping support is the
> >>> > >> right thing here. The plan SGTM as well.
> >>> > >>
> >>> > >> Thanks,
> >>> > >> Danny
> >>> > >>
> >>> > >> On Mon, Aug 26, 2024 at 5:21 PM Robert Burke <rob...@frantil.com>
> >>> > >> wrote:
> >>> > >>
> >>> > >>> As an approximation we can use the docker container pulls at
> >>> > >>> least.
> >>> > >>>
> >>> > >>> Py version : Pulls last week
> >>> > >>>
> >>> > >>> 3.8: 7476
> >>> > >>> 3.9: 1,259
> >>> > >>> 3.10: 6169
> >>> > >>> 3.11: 2999
> >>> > >>> 3.12: 241
> >>> > >>>
> >>> > >>> 3.7: 395
> >>> > >>> 3.6: 241
> >>> > >>> 3.4: 156
> >>> > >>> 2.7: 188
> >>> > >>>
> >>> > >>> But note that any of our automation for 3.8 that pulls containers
> >>> > >>> would impact these result too.
> >>> > >>>
> >>> > >>> I will note that Beam dropping 3.8 support shouldn't be a problem
> >>> > >>> given the general end of support of 3.8.
> >>> > >>>
> >>> > >>> Users can always upgrade their python version separately from the
> >>> > >>> Beam version, and then update the Beam version. Ultimately, the
> >>> > >>> cost of the latest and greatest version, is staying up to date.
> >>> > >>>
> >>> > >>> On Mon, Aug 26, 2024, 8:24 AM Kenneth Knowles <k...@apache.org>
> >>> > >>> wrote:
> >>> > >>>
> >>> > >>>> SGTM
> >>> > >>>>
> >>> > >>>> Incidentally I poked around on pypi for a minute but didn't find
> >>> > >>>> even basic download analytics. Do we have data about usage of
> >>> > >>>> Python versions?
> >>> > >>>> (this is not pushback - I'm all for turning things down on a
> >>> > >>>> natural pace (or faster!); I'm just even *more* for having data
> >>> > >>>> around it)
> >>> > >>>>
> >>> > >>>> Kenn
> >>> > >>>>
> >>> > >>>> On Mon, Aug 26, 2024 at 10:59 AM Jack McCluskey via dev
> >>> > >>>> <dev@beam.apache.org> wrote:
> >>> > >>>>
> >>> > >>>>> Hey everyone,
> >>> > >>>>>
> >>> > >>>>> With Python 3.8 reaching end-of-life in October, I've started
> >>> > >>>>> the work of removing support in the Beam repository. The aim is
> >>> > >>>>> to target Beam release 2.60.0 for this, since the expected
> >>> > >>>>> release cut date is on October 2nd, 2024. The start of this
> >>> > >>>>> effort is at https://github.com/apache/beam/pull/32283/,
> >>> > >>>>> updating our GitHub Actions workflows. For many workflows like
> >>> > >>>>> our unit test suites this is not a large change; the Python
> >>> > >>>>> version matrix simply omits 3.8 and runs on the remaining python
> >>> > >>>>> versions as expected. This is more complicated for a number of
> >>> > >>>>> workflows that currently only run on 3.8 or both 3.8 and 3.12,
> >>> > >>>>> as GitHub will not run the updated actions in the main
> >>> > >>>>> repository until the PR updating them is submitted. This can
> >>> > >>>>> already be seen in some workflow runs on the PR where Python 3.8
> >>> > >>>>> is no longer being installed in the runner environment, leading
> >>> > >>>>> to failures.
> >>> > >>>>>
> >>> > >>>>> The current plan is to do as much validation of the new workflow
> >>> > >>>>> files as I can before the above PR is submitted (hopefully the
> >>> > >>>>> week after Beam Summit,) then focus on getting any potential
> >>> > >>>>> workflow breakages resolved before removing the core Python 3.8
> >>> > >>>>> support from the package. There may be some instability with our
> >>> > >>>>> workflows, and I will try my best to resolve things as they pop
> >>> > >>>>> up.
> >>> > >>>>> This is the first Python version to have support dropped since
> >>> > >>>>> we migrated to GitHub Actions, so there's going to be a decent
> >>> > >>>>> amount of trial and error as we navigate this. That said, if you
> >>> > >>>>> notice problems please let me know! Either file a standalone
> >>> > >>>>> issue and tag me on it (@jrmccluskey) or leave a comment on
> >>> > >>>>> https://github.com/apache/beam/issues/31192 so I can take a
> >>> > >>>>> look.
> >>> > >>>>>
> >>> > >>>>> Thanks,
> >>> > >>>>>
> >>> > >>>>> Jack McCluskey
> >>> > >>>>>
> >>> > >>>>> --
> >>> > >>>>>
> >>> > >>>>> Jack McCluskey
> >>> > >>>>> SWE - DataPLS PLAT/ Dataflow ML
> >>> > >>>>> RDU
> >>> > >>>>> jrmcclus...@google.com
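
The "<10% of pulls" figure in Danny's message can be checked against the DockerHub table he quoted. The sketch below is only an illustration: the helper names `load_rows` and `share` are hypothetical, the CSV is copied verbatim from the thread, and the thread does not say which column (or combination of columns) the estimate was based on, so the column choice is an assumption left to the caller.

```python
import csv
import io

# Figures copied verbatim from the 2.58.0 table quoted in the thread
# (partial month results). Repo names are kept exactly as posted.
PULLS_CSV = '''"Repo","Unique IPs","Pull by tag","Pull by digest","Version check"
"beam_python312_sdk",151,70,0,410
"beam_python311_sdk",151,64,0,360
"beam_python310_sdk",40,97,0,13
"beam_python3.9_sdk",18,388,0,14
"beam_python3.8_sdk",36,97,0,2
'''


def load_rows(text):
    """Parse the quoted CSV into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def share(rows, repo, columns):
    """Fraction of the summed `columns` that belongs to `repo`."""
    def count(row):
        return sum(int(row[column]) for column in columns)

    total = sum(count(row) for row in rows)
    target = sum(count(row) for row in rows if row["Repo"] == repo)
    return target / total


if __name__ == "__main__":
    rows = load_rows(PULLS_CSV)
    for columns in (["Unique IPs"],
                    ["Pull by tag"],
                    ["Pull by tag", "Pull by digest", "Version check"]):
        fraction = share(rows, "beam_python3.8_sdk", columns)
        print(columns, f"{fraction:.1%}")
```

The share for 3.8 differs noticeably depending on which columns are counted, so it is worth stating the chosen column whenever a number like this is used to justify dropping a version.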