Hello everyone,

TL;DR: The recent release of Airflow 2.0.1 + providers has shown that
there is a small problem with the current setup of extras and providers.
This email outlines two problems with it, together with proposed
solutions, and I would like your opinions/comments.

This is a case I had not foreseen, but one that can be fixed rather easily,
and I wanted to reach out with a proposal. I spent some time on it today
(right after finishing my 3rd on-boarding day @Snowflake :) ) and I think I
have a sound proposal:

Problem statement:

*Problem 1) Duplicated/conflicting dependencies in providers and extras.*

Currently when you install airflow with a provider extra (for example `pip
install apache-airflow[google]==2.0.1`), you might get dependency
conflicts when new providers are released with conflicting dependencies.
This happens with the 'google' provider in the last release (
https://pypi.org/project/apache-airflow-providers-google/2.0.0/). We've
migrated a lot of google provider's dependencies to backwards-incompatible
google 2.0 APIs in the last release so the version of a number of
dependencies changed in the new providers.

As an example - google-cloud-automl was previously within `>=0.4.0,<2.0.0`
and in google provider 2.0.0 it is `>=2.1.0,<3.0.0`.
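
To make the conflict concrete, here is a minimal sketch (plain Python, no
pip internals) showing that the two ranges quoted above cannot be satisfied
by any single version. The helper names `parse_version` and `ranges_overlap`
are made up for this illustration, and the parsing only handles simple
dotted numeric versions.

```python
def parse_version(text):
    """Turn '2.1.0' into (2, 1, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def ranges_overlap(first, second):
    """Each range is (inclusive_lower, exclusive_upper), as in '>=X,<Y'."""
    low = max(parse_version(first[0]), parse_version(second[0]))
    high = min(parse_version(first[1]), parse_version(second[1]))
    # A shared version exists only if the tightest lower bound is below
    # the tightest upper bound.
    return low < high


# Ranges for google-cloud-automl quoted in this email.
extra_range = ("0.4.0", "2.0.0")     # still listed in the [google] extra
provider_range = ("2.1.0", "3.0.0")  # required by google provider 2.0.0

print(ranges_overlap(extra_range, provider_range))  # prints False
```

Since the ranges do not overlap, a resolver that has to honour both of them
has nothing it can pick.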

Unfortunately we still store the 'old' requirements in the [google] extra, but
the new provider has the 'new' requirements. If we want a one-command
installation with the latest providers, this creates a conflict. It is not a
problem to upgrade the provider separately though (because the conflict only
applies if you use the [google] extra at installation time). Currently the
extra in apache airflow has both 'apache-airflow-providers-google' and all
of its (old) direct dependencies.

*Solution proposal:*

The solution is easy - we should leave only the unconstrained
"apache-airflow-providers-google" as the requirement in the [google] extra. This
will add the dependencies from the provider transitively, so when we
release a new provider version, it will work just fine (the new transitive deps
will be used). We could (rather quickly) implement it and release a 2.0.2
version with only those changes - removal of provider-specific dependencies
from extras, leaving only "apache-airflow-providers-xxxxx" as the dependency
for the "xxxxx" provider.
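
A rough sketch of what the change could look like, assuming the extras are
kept in a plain dict of requirement lists. The names `current_extras` and
`provider_only_extras` are hypothetical, not the actual names in Airflow's
setup.py, and the name mapping is simplified (not every real extra maps
one-to-one to a provider package).

```python
def provider_only_extras(extras):
    """Replace each extra's requirement list with just its provider package."""
    return {name: [f"apache-airflow-providers-{name}"] for name in extras}


# Hypothetical "before" state: the provider package plus a copy of its
# old direct dependencies, which is what causes the conflict.
current_extras = {
    "google": [
        "apache-airflow-providers-google",
        "google-cloud-automl>=0.4.0,<2.0.0",
    ],
}

# "After" state: only the unconstrained provider package remains, so the
# provider's own metadata decides which dependency versions get installed.
proposed_extras = provider_only_extras(current_extras)
print(proposed_extras)
```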


*Problem 2) constraints for past airflow versions might conflict with the
new providers.*

We currently have constraint files (example
https://github.com/apache/airflow/blob/constraints-2.0.1/constraints-3.6.txt)
to allow for repeatable installation of airflow with any combination of
providers. But when we release new providers, those constraints might
conflict (the very same example with google 2.0.0 provider as above). This
means that new providers might not cleanly install.
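
As an illustration of how such a conflict could be detected, here is a small
sketch that checks the pins of a constraint file against the version range a
new provider requires. The pins in `sample_constraints` are made up for the
example and not copied from the real file, the helper names are hypothetical,
and only simple `name==version` lines with dotted numeric versions are handled.

```python
def parse_version(text):
    """Turn '2.1.0' into (2, 1, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def parse_pins(constraints_text):
    """Read 'name==version' lines into a dict, skipping blanks and comments."""
    pins = {}
    for line in constraints_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, version = line.partition("==")
        pins[name.lower()] = version
    return pins


def find_conflicts(pins, requirements):
    """requirements maps package name to (inclusive_lower, exclusive_upper)."""
    conflicts = []
    for name, (lower, upper) in requirements.items():
        pinned = pins.get(name)
        if pinned is None:
            continue  # not pinned, so the constraint file cannot conflict
        in_range = parse_version(lower) <= parse_version(pinned) < parse_version(upper)
        if not in_range:
            conflicts.append(name)
    return conflicts


# Illustrative pins only (assumed values, not the real constraints-2.0.1 file).
sample_constraints = """\
# constraints for an older airflow release
google-cloud-automl==1.0.1
requests==2.25.1
"""

# Range required by google provider 2.0.0, as quoted in this email.
provider_requirements = {"google-cloud-automl": ("2.1.0", "3.0.0")}

print(find_conflicts(parse_pins(sample_constraints), provider_requirements))
```

Any package reported here is pinned to a version that the new provider
rejects, so installing that provider with the old constraint file would fail.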

The good thing is, those constraints can be easily updated. We can easily
regenerate the constraints, taking into account the new providers released
and simply branch them off moving the tags for all 2.0 (and 2.1 and so on)
releases. This should be rather easy to automate.

*Solution proposal:*

Every time we release a new wave of providers, we regenerate the
constraints for all previously released 2.* versions of airflow, so that the new
providers are taken into account and they can install cleanly with `pip
install apache-airflow[provider]==2.0.N --constraint .... 2.0.N/python
...`

Both problems can be solved rather easily. 1) requires a 2.0.2 release of
Airflow; 2) can be implemented at any time (happy to do it).

Let me know what you think.

J,








-- 
+48 660 796 129
