Hello everyone,

TL;DR: The recent release of Airflow 2.0.1 and the new providers has shown that there is a small problem with the current setup of extras and providers. This email outlines two problems, together with proposed solutions, and I would like your opinions and comments.

This is a case I had not foreseen, but one that can be fixed rather easily, and I wanted to reach out with a proposal. I spent some time on it today (right after finishing my 3rd on-boarding day @Snowflake :) ) and I think I have a sound proposal.

Problem statement:

*Problem 1) Duplicated/conflicting dependencies in providers and extras.*

Currently, when you install Airflow with a provider extra (for example `pip install apache-airflow[google]==2.0.1`), you might get dependency conflicts when new providers are released with conflicting dependencies. This happened with the 'google' provider in the last release (https://pypi.org/project/apache-airflow-providers-google/2.0.0/). We migrated a lot of the google provider's dependencies to the backwards-incompatible google 2.0 APIs in the last release, so the versions of a number of dependencies changed in the new provider. As an example, google-cloud-automl was previously within `>=0.4.0,<2.0.0`, and in google provider 2.0.0 it is `>=2.1.0,<3.0.0`.

Unfortunately, we still store the 'old' requirements in the [google] extra, while the new provider has the 'new' requirements. If we want a one-command installation with the latest providers, this creates a conflict. Upgrading the provider separately is not a problem, though (the conflict only applies if you use the [google] extra at installation time). Currently the extra in apache-airflow has both: 'apache-airflow-providers-google' and all the (old) direct dependencies of 'apache-airflow-providers-google'.

*Solution proposal:*

The solution is easy: we should leave only the unconstrained "apache-airflow-providers-google" as the requirement in the [google] extra. This will pull in the provider's dependencies transitively, and when we release a new provider version it will work just fine (the new transitive deps will be used).
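
To make the conflict concrete, here is a minimal sketch rather than the actual setup.py. The two requirement lists are simplified, hypothetical stand-ins that contain only the google-cloud-automl example from above, and the helper functions `parse_requirement` and `conflicts` are made up for illustration. They understand only `>=` and `<` clauses with numeric versions.

```python
import re

PROVIDER_PACKAGE = "apache-airflow-providers-google"

# Simplified, hypothetical view of the [google] extra as it is today:
# the provider package plus the provider's (old) direct dependencies.
google_extra_current = [
    PROVIDER_PACKAGE,
    "google-cloud-automl>=0.4.0,<2.0.0",
]

# Proposed layout: only the unconstrained provider package.
google_extra_proposed = [PROVIDER_PACKAGE]

# What the 2.0.0 provider itself requires (only the example from this email).
provider_2_0_0_requires = ["google-cloud-automl>=2.1.0,<3.0.0"]


def parse_requirement(req):
    """Split 'name>=a,<b' into (name, lower, upper); bounds are tuples or None.

    Illustration only: handles just '>=' and '<' clauses with numeric versions.
    """
    match = re.match(r"^([A-Za-z0-9._-]+)(.*)$", req)
    name, spec = match.group(1), match.group(2)
    lower = upper = None
    for clause in filter(None, spec.split(",")):
        version = tuple(int(part) for part in clause.lstrip("<>=").split("."))
        if clause.startswith(">="):
            lower = version
        elif clause.startswith("<"):
            upper = version
    return name, lower, upper


def conflicts(reqs_a, reqs_b):
    """Return package names whose version ranges in reqs_a and reqs_b cannot overlap."""
    bounds_b = {name: (lo, hi) for name, lo, hi in map(parse_requirement, reqs_b)}
    clashing = []
    for name, lo_a, hi_a in map(parse_requirement, reqs_a):
        if name not in bounds_b:
            continue
        lo_b, hi_b = bounds_b[name]
        lowers = [v for v in (lo_a, lo_b) if v is not None]
        uppers = [v for v in (hi_a, hi_b) if v is not None]
        # Ranges are [lower, upper): empty when the highest lower bound
        # reaches the lowest upper bound.
        if lowers and uppers and max(lowers) >= min(uppers):
            clashing.append(name)
    return clashing


print(conflicts(google_extra_current, provider_2_0_0_requires))
print(conflicts(google_extra_proposed, provider_2_0_0_requires))
```

With the current layout the extra and the new provider disagree on google-cloud-automl. With the proposed layout the extra no longer states any range of its own, so there is nothing left to clash with.
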

We could (rather quickly) implement it and release a 2.0.2 version with only those changes: removal of the provider-specific dependencies from the extras, leaving only "apache-airflow-providers-xxxxx" as the dependency for the "xxxxx" provider.

*Problem 2) Constraints for past Airflow versions might conflict with the new providers.*

We currently have constraint files (example: https://github.com/apache/airflow/blob/constraints-2.0.1/constraints-3.6.txt) to allow for repeatable installation of Airflow with any combination of providers. But when we release new providers, those constraints might conflict with them (the very same example with the google 2.0.0 provider as above). This means that new providers might not install cleanly.

The good thing is that those constraints can be easily updated. We can regenerate the constraints, taking the newly released providers into account, and simply branch them off, moving the tags for all 2.0 (and 2.1 and so on) releases. This should be rather easy to automate.

*Solution proposal:*

Every time we release a new wave of providers, we regenerate the constraints for all past released 2.* versions of Airflow, so that the new providers are taken into account and can be installed cleanly with `pip install apache-airflow[provider]==2.0.N --constraint == .... 2.0.N/python ...`

Both problems can be solved rather easily. 1) requires a 2.0.2 release of Airflow; 2) can be implemented at any time (happy to do it).

Let me know what you think.

J.
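
P.S. For illustration, here is a rough sketch of how the one-command installation from Problem 2 could be assembled. The function name `constrained_install_command` is made up, and the raw-file URL pattern is an assumption derived from the constraints-2.0.1/constraints-3.6.txt example linked above.

```python
def constrained_install_command(airflow_version, python_version, extra):
    """Build a pip command that installs Airflow with one extra under constraints.

    Assumption: the constraint files live on a 'constraints-<airflow version>'
    ref and are named 'constraints-<python version>.txt', as in the example link.
    """
    constraints_url = (
        "https://raw.githubusercontent.com/apache/airflow/"
        f"constraints-{airflow_version}/constraints-{python_version}.txt"
    )
    return [
        "pip",
        "install",
        f"apache-airflow[{extra}]=={airflow_version}",
        "--constraint",
        constraints_url,
    ]


# Show the command for Airflow 2.0.1 on Python 3.6 with the google extra.
print(" ".join(constrained_install_command("2.0.1", "3.6", "google")))
```

If the constraints for a past release are regenerated and the tag is moved, the same command keeps working unchanged and simply picks up the versions the new providers need.
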
