potiuk commented on issue #35549: URL: https://github.com/apache/airflow/issues/35549#issuecomment-1805343452
> Mhm, as we have ~90 providers, you need to run a test for each provider (minus the few that are installed by default), i.e. install a Python env with each provider individually ~80 times. This will take a seriously long time. I see this can only be done in a nightly build; it would be hard to squeeze this into the PR tests unless you run it in parallel with pip download caching.

About 70%-80% of our PRs concern changes in one or a few providers. We have had what I named "SelectiveChecks" in place for a few years; it produces the list of providers affected by a change, so most PRs already run only a subset of the provider tests (and a subset of the documentation build). It looks at what is coming in the PR (which files changed) and, based on that, determines which set of tests (which provider tests) should be run. It even automatically calculates the set of dependent providers (so, for example, when you modify the HTTP provider, then OpsGenie, JIRA, and a few others that use the HTTP provider as a base will be run too). Thanks to that, the vast majority of those "change in a provider" PRs complete their tests in 1-2 minutes rather than 15. It boils down to running tests with `Providers[openai]`, for example.

Look for example here: https://github.com/apache/airflow/actions/runs/6811281201/job/18521459711?pr=35547

This was a PR that changed only openai provider code. Due to the selective checks calculation, the test command it ran was:

```
breeze testing db-tests --parallel-test-types "Always Providers[openai]"
```

The complete test run for this PR took 1m21s. Similarly, the docs build took 6 minutes instead of the complete docs build taking 30 minutes, because the documentation build for that PR was:

```
breeze build-docs apache-airflow openai
```

And this is precisely where we should warn in case someone changes the openai provider and we want to test openai min dependencies.
And it is entirely OK to run another 2-minute test (repeating the `Providers[openai]` tests after downgrading its dependencies). The very same selective checks can be used to run those tests. They could be run automatically: whenever `Providers[PROVIDER]` is run, we should run the tests in two variants:

* current (latest) dependencies
* min dependencies

The benefit of doing it in the PR is that it is not the magical maintainer who has to go and fix all of those; it is the responsibility of the contributor who adds a new feature to work out what to do: whether we should bump the min requirement, or whether we should add back-compatibility.

On the other hand, PRs that run all the tests (Airflow Core changes) do not modify provider functionality, so they do not have to run any of those min/max tests. We can run them in their entirety in the `canary` build. That is what the canary build is for: one of its functions is to generate updates to constraints, but it also runs the complete set of tests for everything, to make sure everything stays green even if selective checks gave a false positive by skipping some of those tests. And there maintainers indeed might (and do) step in and either fix or revert a change that caused a failure (as we do currently).

I think it is entirely doable to run those tests for the vast majority of regular PRs with the selective checks mechanism we have today.

> What also jumps into my mind (but might be a different task) is that we also have users asking for support because they are not installing with the right constraints. For supportability, would it also make sense to check installed modules at runtime and generate a warning in the UI if requirements are not met (min/max)?

We cannot do that, I think. Constraints are not really a "recommended" set of dependencies. They are the "golden" set which we know we tested, and which we know was working at the time of release. They provide a "reproducible installation".
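A sketch of how the "min dependencies" variant could be derived: take each dependency specifier a provider declares, and pin the lower bound that `>=` states. The requirement strings below are made up for illustration; a real implementation would read them from the provider's metadata. This uses the `packaging` library:

```python
# Hypothetical sketch: turn '>=X' lower bounds in a provider's requirements
# into '==X' pins, producing the install list for a min-dependency test run.
from packaging.requirements import Requirement

def min_pins(requirements: list[str]) -> list[str]:
    """Pin each declared '>=' lower bound as '==' for a downgrade test."""
    pins = []
    for line in requirements:
        req = Requirement(line)
        for spec in req.specifier:
            if spec.operator == ">=":
                pins.append(f"{req.name}=={spec.version}")
    return pins

# Illustrative specifiers, not the openai provider's real ones.
print(min_pins(["openai>=1.0.0", "apache-airflow>=2.5.0"]))
# → ['openai==1.0.0', 'apache-airflow==2.5.0']
```

The resulting pins could then be fed to `pip install` before re-running the `Providers[openai]` test suite.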
But this is where their role ends. Users should be FREE to upgrade/downgrade dependencies within the min/max requirements. Even more: if a user chooses to install a newer Airflow and an older provider, the constraints might not work any more. For example, when we bumped the Google provider dependencies: if someone installed an older version of the provider with new Airflow 2.7 (which is entirely possible), the set of dependencies needed is as far from the current 2.7 constraints as possible. Many of the dependencies will have to be downgraded, and this will happen automatically, precisely because the older Google provider has a different set of min/max dependencies and `pip` will follow them and bring those dependencies down.

This is also why the recommended installation process is:

a) Install Airflow with extras and constraints -> a reproducible installation with the "golden set" of dependencies:

```
pip install apache-airflow[EXTRAS_HERE]==2.7.3 --constraint http:/// .... 2.7.3/ ... -3.8.txt
```

b) Downgrade/upgrade everything else you need (including downgrading/upgrading providers separately, and any other dependencies) WITHOUT CONSTRAINTS:

```
pip install apache-airflow==2.7.3 ADD_YOUR_DEPENDENCIES_INCLUDING_SPECIFIC_PROVIDER_PACKAGE_VERSION
```

Those should ALWAYS be two steps, where step b) is where constraints stop being a thing, as if they did not exist at all. That's by design. And of course it might mean that some dependencies will break something in this case. But other than paying attention to the min/max expectations in our providers and Airflow, we cannot do much about it. BUT this is precisely WHY we should care about the min requirements: whatever we put there will influence `pip`'s decision on which dependencies get upgraded/downgraded. The b) command is the sole reason why the tests and exercise I mentioned are worth doing.
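The downgrade mechanics in step b) can be shown in miniature with `packaging`: the combined min/max specifiers of everything being installed decide which versions are valid, and a pip-like resolver picks the newest valid one. The version numbers and bounds here are invented for the example, not real Airflow or Google provider metadata:

```python
# Minimal sketch of why installing an older provider forces downgrades:
# combining its bounds with Airflow's narrows the set of valid versions.
from packaging.specifiers import SpecifierSet
from packaging.version import Version

def best_version(candidates: list[str], *specifier_sets: str) -> str:
    """Pick the highest candidate satisfying every specifier set (pip-like)."""
    combined = SpecifierSet(",".join(specifier_sets))
    valid = [v for v in candidates if Version(v) in combined]
    return max(valid, key=Version)

releases = ["1.9.0", "2.2.0", "2.8.0", "3.1.0"]
# With only the (invented) newer-Airflow bounds, the newest release wins:
print(best_version(releases, ">=2.0,<4.0"))                 # → 3.1.0
# Adding an (invented) older provider's cap forces a downgrade:
print(best_version(releases, ">=2.0,<4.0", ">=1.5,<2.5"))   # → 2.2.0
```

This is exactly the behavior the comment describes: no constraints file is involved in step b), only the min/max requirements that the packages themselves declare.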
Because of the b) case, sometimes people might downgrade dependencies that should not be downgraded, because some of the installed providers implicitly expect higher versions of them. This is the whole reason why I want to implement those tests here. This is really why we are using constraints, not requirements: the whole point is that we DO NOT want to prevent users from downgrading/upgrading dependencies. And this is already enforced (in some cases just guided, but that is another story) by `pip` and the min/max requirements of the packages you install in step b): when you install a provider with a set of min/max dependencies, `pip` will try to figure out the right set of versions that falls into what Airflow 2.7.3 expects and what those new dependencies need. That's why we cannot and should not check it at runtime. The moment the a) `pip install` completes, we drop any notion of constraints, as if they never existed.
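The check that `pip` effectively performs at install time (and that a runtime constraints check would wrongly duplicate) is the min/max requirement check, which can be sketched as follows. The requirement strings, package names, and versions are invented for illustration:

```python
# Sketch of the min/max validation pip performs when resolving step b):
# given declared requirements and the versions actually installed, report
# any violations. Purely illustrative; not an Airflow runtime check.
from packaging.requirements import Requirement
from packaging.version import Version

def violations(requirements: list[str], installed: dict[str, str]) -> list[str]:
    """Return messages for installed versions outside declared min/max bounds."""
    problems = []
    for line in requirements:
        req = Requirement(line)
        version = installed.get(req.name)
        if version is None:
            problems.append(f"{req.name}: not installed")
        elif Version(version) not in req.specifier:
            problems.append(f"{req.name}: {version} violates {req.specifier}")
    return problems

# A downgrade below a declared lower bound is caught by the specifier:
print(violations(["requests>=2.27,<3"], {"requests": "2.10.0"}))
# An in-range version produces no violations:
print(violations(["requests>=2.27,<3"], {"requests": "2.28.0"}))
```

Because `pip` already enforces these bounds during installation, repeating the check against the *constraints* file at runtime would add nothing; only the min/max requirements carry meaning after step a) completes.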
