I agree that cases could be different:
- someone uses _PIP_ADDITIONAL_REQUIREMENTS
- someone installs packages without pinning the Airflow version
- some others might use official images without pinning a specific version of Python
In all of these cases this would lead to unintentional/unpredictable results.

Changing the policy for the default version of the official images does not
impact anyone who (already) pins the Python version and upgrades only when
99% sure that the changes do not break anything. That is also true under the
current policy, because sooner or later (once a year) the default version is
bumped.

One significant benefit of changing to the latest supported Python is getting
additional beta-testers from the users who just use the docker-compose sample
in production. In other words, I think the main audience of this change would
be:
- Someone who never reads release notes, or reads them only after trying to
  install something new in production
- Someone new to the Python world, or who has never had the issue that some
  awesome library which uses the C-API and/or requires a build from source
  distribution accidentally stops working/building on a new version of
  Python. To be honest, the same is valid for old versions of Python, because
  some packages stop building wheels for Python versions that are almost EOL.
- Ctrl+C; Ctrl+V

Personally I could not see any benefit of these changes, but if someone wants
to see this change I would just say "Why not?".

----
Best Wishes
*Andrey Anshin*


On Fri, 17 Nov 2023 at 16:52, Jarek Potiuk <[email protected]> wrote:

> And yes - agree that the environmental effect is smaller than "bare" Python
> benchmark in our case - but I think it is still there.
>
> There are a number of (valid) cases where people use airflow not only to
> purely orchestrate external services, and they are using it to run
> computationally or logic-intensive tasks and CPU usage is high for airflow
> workers, not only scheduler - and in those cases those performance
> improvements will be fairly visible.
>
> On Fri, Nov 17, 2023 at 1:45 PM Jarek Potiuk <[email protected]> wrote:
>
> > Yeah. I see the point of Andrey - indeed, we had - for quite some time -
> > Python 3.11 exclusion for HDFS providers - until it has been fixed.
> > And we already have a built-in mechanism to exclude providers from
> > certain versions of Python - it's part of the provider.yaml definition
> > and we can deal with it easily.
> >
> > And this is one of the reasons we do not have Python 3.12 support YET -
> > because we need to make sure that at least the vast majority of the
> > important providers (and all those that are part of the "regular image")
> > work with it.
> > So I'd say if we stick to those rules - stability of "latest SUPPORTED"
> > version + providers is not impacted. Of course someone might need a
> > different provider that has no "latest" support - but that will be
> > clearly documented in Provider documentation (automatically) when it
> > happens and the user might easily make a deliberate decision to use a
> > different tag (and the decision is very easy to turn into practice).
> > Simply - the impacted provider will refuse to install and the user will
> > HAVE TO make the right call at installation time. So I am not concerned
> > at all about "provider support" - this is basically solved by code.
> >
> > The dill + pendulum case is indeed a bit different - it is an edge case -
> > but for me that is an indication that yes, we need to document but also,
> > more importantly, it seems that our test suite is insufficient - this
> > error should have been detected in our unit tests (and at the very least
> > we should have a PY311 exclusion). And then yes - I concur with the idea
> > of having a "known incompatibilities" documentation - possibly even
> > somewhat verified automatically with the list of such PY* exclusions we
> > have in our test suite.
> >
> > Andrey - maybe we can discuss it in the issue you mentioned - maybe we
> > should add such a test case and I am happy to make a PR to propose the
> > "Known incompatibilities" page linked to those tests - if that would
> > remove all the obstacles for that move.
> >
> > J.
> >
> >
> > On Fri, Nov 17, 2023 at 9:48 AM Andrey Anshin <[email protected]>
> > wrote:
> >
> >> Personally, for me it is a controversial change and a tradeoff between
> >> Stability vs Performance.
> >>
> >> Since Airflow + Providers have 400+ dependencies, using the lowest
> >> version of Python provides better stability, and the reason for this is
> >> pretty simple - maintainers of packages have spent more time making them
> >> stable on older Python versions than on new ones. That is funny, because
> >> it is difficult to call 3.11 a new one: it was released more than one
> >> year ago. However, some of the core dependencies of Airflow have not
> >> been updated for a long time and have "formal" support of Python 3.11.
> >>
> >> A known issue in Airflow with Python 3.11 is dill + pendulum, see:
> >> https://github.com/apache/airflow/issues/35307 . It does not affect all
> >> users, so maybe we could resolve it by creating a "Known
> >> Incompatibilities" page in the Airflow documentation.
> >>
> >> However, using tags without an explicit Python version is a very nice
> >> way for users to shoot themselves in the foot or cast a wormhole
> >> (depending on your preference), especially if we talk about
> >> apache/airflow:latest. So in this case I would rather say it doesn't
> >> matter which Python version we use; maybe it is even better to choose
> >> the "golden mean" and use Python 3.10, but this is also controversial
> >> because it requires us to decide when we need to shift this selection.
> >> Selecting between the lowest or the highest version is more
> >> straightforward than selecting the "golden mean".
> >>
> >> The performance and environment part is also important, however we need
> >> to take into account that Airflow is an application which uses the DB
> >> backend very intensively, and this is the point where the total
> >> performance advantages of changing the Python version are dramatically
> >> reduced.
> >>
> >> As an outcome, I do not have any objection to these potential changes,
> >> because there is no real difference between the strategies of "default"
> >> Python selection: all of them have pros and cons. In my personal
> >> opinion, using the Airflow image without pinning the Python version
> >> creates more problems than advantages, and you might have a nice time
> >> debugging at the moment when the "default" version changes, and it will
> >> change in any case.
> >>
> >> ----
> >> Best Wishes
> >> *Andrey Anshin*
> >>
> >>
> >>
> >> On Fri, 17 Nov 2023 at 10:10, Wei Lee <[email protected]> wrote:
> >>
> >> > Agreed, as long as users can still use different versions through
> >> > tags, there are no drawbacks or incompatibilities with this great
> >> > idea!
> >> >
> >> > Best,
> >> > Wei
> >> >
> >> > > On Nov 17, 2023, at 1:39 PM, Aritra Basu <[email protected]>
> >> > > wrote:
> >> > >
> >> > > Agreed, moving to latest by default sounds like a fine idea. I don't
> >> > > see any drawbacks to it and seems like a good enough time as any to
> >> > > make the switch with 2.8.0.
> >> > >
> >> > > --
> >> > > Regards,
> >> > > Aritra Basu
> >> > >
> >> > > On Fri, Nov 17, 2023, 12:33 AM Vincent Beck <[email protected]>
> >> > > wrote:
> >> > >
> >> > >> I agree, by default we should use the latest python version. Like
> >> > >> any package manager, if the user does not explicitly specify a
> >> > >> version, the latest should be used. If the user wants to use a
> >> > >> lower version, he can always pin it.
> >> > >>
> >> > >> On 2023/11/16 12:06:17 Jarek Potiuk wrote:
> >> > >>> Hello everyone,
> >> > >>>
> >> > >>> Since we are close to the Airflow 2.8.0 release, I would like to
> >> > >>> propose a change in the approach for our "default" images.
> >> > >>>
> >> > >>> Currently there are a few images that are considered "default",
> >> > >>> for example:
> >> > >>>
> >> > >>> apache/airflow:latest
> >> > >>> apache/airflow:2.7.4
> >> > >>>
> >> > >>> Currently (according to our process [1] and user documentation
> >> > >>> [2]) those point to the "oldest" Python version we support
> >> > >>> (currently they point to Python 3.8).
> >> > >>>
> >> > >>> There is no particular reason why it is like that, and with
> >> > >>> Airflow 2.8.0 we have an opportunity to change it and point the
> >> > >>> default images to "latest supported" (and keep this version as
> >> > >>> default for the whole MINOR line of releases).
> >> > >>>
> >> > >>> In the case of Airflow 2.8.* - that would be "Python 3.11" being
> >> > >>> default for the whole 2.8.* line unless we manage to get Python
> >> > >>> 3.12 support in our CI before we release Airflow 2.8.0, then it
> >> > >>> would be Python 3.12.
> >> > >>>
> >> > >>> We do not have any SemVer promises about that. Users can still
> >> > >>> choose to use the "2.8.0-python3.8" tag if they want.
> >> > >>>
> >> > >>> Generally going to 2.8 should always be a deliberate action, so we
> >> > >>> have a chance to explain in the release notes that if they want to
> >> > >>> stick to the 2.8 release. So they are not "losing" anything, they
> >> > >>> can have 100% compatibility by just choosing a different image in
> >> > >>> their deployment. This **might** cause a little hassle when they
> >> > >>> migrate if they find some incompatibilities, but generally
> >> > >>> speaking it's a very straightforward and simple change - just
> >> > >>> adding "-python3.8" to your TAG - whatever deployment option you
> >> > >>> have. And our users will have to go through it anyway every time
> >> > >>> we drop the old Python version (and this change might be even more
> >> > >>> costly as they have no choice then) - so it changes very little,
> >> > >>> just shifts the time where they will have to do it.
> >> > >>>
> >> > >>> There are benefits of doing it - for both our users and, well, the
> >> > >>> environment as well (and I really mean a positive impact on the
> >> > >>> "world environment" to be honest). Maybe a little impact - but
> >> > >>> with Airflow's popularity, it might make a (small) difference.
> >> > >>> Python 3.11 is generally 30% faster than previous versions and
> >> > >>> using it by default means that 30% less CPU is being wasted. Also
> >> > >>> it will mean actual money savings for our users. Also Python 3.12
> >> > >>> comes with even more performance improvements and keeping up with
> >> > >>> those being the "default" is a pretty good idea.
> >> > >>>
> >> > >>> I cannot think of any other drawbacks of this change.
> >> > >>>
> >> > >>> WDYT?
> >> > >>>
> >> > >>> [1] Documented versioning approach:
> >> > >>> https://github.com/apache/airflow#base-os-support-for-reference-airflow-images
> >> > >>> [2] User documentation
> >> > >>> https://airflow.apache.org/docs/docker-stack/index.html
> >> > >>>
> >> > >>
> >> > >> ---------------------------------------------------------------------
> >> > >> To unsubscribe, e-mail: [email protected]
> >> > >> For additional commands, e-mail: [email protected]
> >> > >>
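
For reference, the tag pinning discussed in this thread can be sketched as below. This is only an illustration, assuming the "<version>-python<X.Y>" tag scheme mentioned in the original proposal ("2.8.0-python3.8"); the helper name pinned_image_tag is hypothetical and is not part of Airflow.

```python
from typing import Optional


def pinned_image_tag(airflow_version: str, python_version: Optional[str] = None) -> str:
    """Build an apache/airflow image reference (hypothetical helper).

    Without python_version the result is the "default" image, whose Python
    version is chosen by the project policy discussed in this thread.
    With python_version the "-python<X.Y>" suffix pins it explicitly.
    """
    image = f"apache/airflow:{airflow_version}"
    if python_version:
        image += f"-python{python_version}"
    return image


# Default image: Python version depends on the release policy.
default_image = pinned_image_tag("2.8.0")
# Pinned image: stays on Python 3.8 regardless of what the default becomes.
pinned_image = pinned_image_tag("2.8.0", "3.8")
print(default_image)
print(pinned_image)
```

Pinning the suffix in the deployment configuration is what keeps an upgrade from silently changing the interpreter when the default moves.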
