potiuk commented on issue #35549:
URL: https://github.com/apache/airflow/issues/35549#issuecomment-1805343452

   > Mhm, as we have ~90 providers, meaning you need to run a test for each 
provider (minus a few that are installed by default), you need to install a 
Python env with each provider individually ~80 times. This will take a 
seriously long time. I see this can only be done in a nightly; it would be hard 
to squeeze into the PR tests, except if you run it in parallel with pip 
download caching.
   
   About 70%-80% of our PRs concern changes in one or a few providers. We have 
had what I named "SelectiveChecks" in place for a few years; it produces the 
list of providers affected by the change, so most PRs already run a subset of 
provider tests (and a subset of documentation). It looks at what's coming in 
the PR (which files changed) and based on that determines which set of tests 
(which provider tests) should be run. It even automatically calculates the set 
of dependent providers (so for example when you modify the HTTP provider, 
OpsGenie, JIRA and a few others that use the HTTP provider as a base will be 
run too). Thanks to that, the vast majority of those "changes in providers" 
PRs complete their tests in 1-2 minutes rather than 15.
   
   It boils down to running tests with `Providers[openai]`, for example. Look 
at this run:
   
   
https://github.com/apache/airflow/actions/runs/6811281201/job/18521459711?pr=35547
   
   This was a PR that changed only the openai provider code. And due to the 
selective checks calculations, the test command that it ran was:
   
   ```
   breeze testing db-tests --parallel-test-types "Always Providers[openai]"
   ```
   
   The complete test run for this PR took 1 minute 21 seconds.
   
   Similarly, the docs build took 6 minutes instead of the 30 minutes a 
complete docs build takes, because the documentation build command for that PR 
was:
   
   ```
   breeze build-docs apache-airflow openai
   ```
   
   And this is precisely where we should warn in case someone changes the 
openai provider, where we want to test openai min dependencies. And it is 
entirely OK to run another 2-minute test for it (repeating the 
`Providers[openai]` tests after downgrading its dependencies).
   
   The very same selective checks can be used to run those tests. They could be 
run automatically: whenever `Providers[PROVIDER]` is run, we should run those 
tests in two variants:
   
   * current (latest) dependencies
   * min dependencies
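   One way to derive the "min dependencies" install set - a sketch of the idea, not the actual mechanism - is to rewrite each declared lower bound into an exact pin, install those, and then re-run the provider's tests:

```python
import re

# Hypothetical sketch: turn a provider's ">=" requirements into exact "=="
# pins, so the "min dependencies" variant of the test run installs exactly
# the lowest versions we claim to support. Requirement strings are invented.
def min_pins(requirements):
    pins = []
    for req in requirements:
        match = re.match(r"^([A-Za-z0-9_.-]+)\s*>=\s*([0-9][0-9A-Za-z.]*)", req)
        if match:
            pins.append(f"{match.group(1)}=={match.group(2)}")
        else:
            pins.append(req)  # no lower bound declared - leave it alone
    return pins

# e.g. for a hypothetical openai provider declaration:
print(min_pins(["openai>=1.0.0", "apache-airflow>=2.5.0"]))
# -> ['openai==1.0.0', 'apache-airflow==2.5.0']
```

If the `Providers[openai]` tests pass with those pins, the declared min requirements are honest; if not, the contributor either bumps the bound or adds back-compatibility.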
   
   The benefit of doing it in the PR is that it's not the magical maintainer 
who has to go and fix all those; it's the responsibility of the contributor who 
adds the new feature to work out what to do - whether we should bump the min 
requirement, or whether we should add back-compatibility.
   
   On the other hand, PRs that run all the tests (Airflow core changes) do not 
modify the providers' functionality, so they do not have to run any of those 
min/max tests. We can run them in their entirety in the `canary` build - that 
is what the canary build is for: one of its functions is to generate updates to 
constraints, but it also runs the complete set of tests for everything, to make 
sure everything stays green even if some of the selective checks gave a false 
positive by skipping some of those tests. And there, maintainers indeed might 
(and do) step in and either fix or revert the change that caused it (like we do 
currently).
   
   I think it is entirely doable to run those tests for the vast majority of 
regular PRs with the selective checks mechanism we have today.
   
   > What also jumps into my mind (but might be a different task) is that we 
also have users asking for support because they are not installing with the 
right constraints. For support-ability, would it also make sense to check 
installed modules at runtime and generate a warning in the UI if requirements 
are not met (min/max)?
   
   We cannot do that, I think. Constraints are not really a "recommended" set 
of dependencies. They are the "golden" set which we know we tested, and which 
we know was working at the time of release. They provide a "reproducible 
installation". But this is where their role ends. Users should be FREE to 
upgrade/downgrade dependencies within the min/max requirements. Even more: if a 
user chooses to install a newer Airflow and an older provider, the constraints 
might not work any more. For example, when we bumped the Google provider 
dependencies, if someone installed an older version of the provider with the 
new Airflow 2.7 (which is entirely possible), the set of dependencies needed is 
as far from the current 2.7 constraints as possible - many of the dependencies 
will have to be downgraded (and this will happen automatically, precisely 
because the older Google provider has a different set of min/max dependencies 
and `pip` will follow them and bring those dependencies down). This is also 
why the recommended installation process is:
   
   a) Install Airflow with extras and constraints -> reproducible installation 
with the "golden set" of dependencies
   
   ```
   pip install apache-airflow[EXTRAS_HERE]==2.7.3 --constraint http:///  .... 
2.7.3/ ... -3.8.txt
   ```
   
   b)  Downgrade/upgrade everything else you need (including 
downgrading/upgrading providers separately and any other dependencies) WITHOUT 
CONSTRAINTS. 
   
   ```
   pip install apache-airflow==2.7.3  
ADD_YOUR_DEPENDENCIES_INCLUDING_SPECIFIC_PROVIDER_PACKAGE_VERSION
   ```
   
   Those should ALWAYS be two steps - and step b) is where constraints stop 
being a thing. As if they did not exist at all. That's by design. And of course 
it might mean that some dependencies will break something in this case. But 
other than paying attention to the min/max expectations in our providers and 
Airflow, we cannot do much about it. BUT this is precisely WHY we should care 
about the min requirements - because whatever we put there will influence 
`pip`'s decision on which dependencies get upgraded/downgraded. The b) command 
is the sole reason it is worth doing the tests and exercise I mentioned. 
Because of the b) case, people might sometimes downgrade dependencies that 
should not be downgraded, because some of the installed providers implicitly 
expect them to be at higher versions. This is the whole reason why I want to 
implement those tests here.
   
   This is really why we are using constraints, not requirements: the whole 
point is that we DO NOT want to prevent users from downgrading/upgrading 
dependencies. And this is already enforced (in some cases just guided, but that 
is another story) by `pip` and the min/max requirements of the packages you 
install in step b) - when you install a provider with a set of min/max 
dependencies, `pip` will try to figure out the right set of versions that falls 
into what Airflow 2.7.3 expects and what those new dependencies need.
   
   That's why we cannot and should not check it at runtime. The moment the a) 
`pip install` completes, we drop any notion of constraints, and it's like they 
never existed.
   
   

