Hi all,

I discovered this issue when publishing Docker images for Airflow 2.0.1 and
brought it up for discussion with Jarek, mainly because I think this
problem will occur again in the future. While it is not fatal, it would be
good to solve it and truly separate Airflow Core and providers.

The current approach of still having extras within the core that affect
constraints is far from ideal in my opinion.

*For Problem (1)* there is an easy workaround: use pip with the legacy
resolver, or pip 20.2.4 or older (which still defaults to the legacy
resolver). Either one allows installing the provider's new dependencies even
when they conflict. We also have a note in our installation guide (
https://airflow.apache.org/docs/apache-airflow/stable/installation.html#installation-tools)
that says:

> There was a recent (November 2020) change in resolver, so currently only
> 20.2.4 version is officially supported, although you might have a success
> with 20.3.3+ version (to be confirmed if all initial issues from pip 20.3.0
> release have been fixed in 20.3.3). In order to install Airflow you need to
> either downgrade pip to version 20.2.4 `pip install --upgrade pip==20.2.4`
> or, in case you use Pip 20.3, you need to add option
> `--use-deprecated legacy-resolver` to your pip install command.


But we should fix it in master with *solution (1)* so when we release
Airflow 2.0.2 (a month or so from now) we don't have such an issue and
don't need a workaround.
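
To illustrate why the new resolver rejects the one-command install, here is a minimal sketch that checks whether two `>=LOW,<HIGH` ranges share any version. It uses the google-cloud-automl ranges from Jarek's mail below. The helper names are hypothetical, and it only understands this one specifier shape; it is not how pip itself resolves dependencies.

```python
def parse_version(text):
    """Turn '2.1.0' into (2, 1, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def parse_range(spec):
    """Parse a '>=LOW,<HIGH' specifier into a (low, high) pair of tuples.

    Assumption: both bounds are present and only '>=' and '<' are used.
    Real pip specifiers are richer than this.
    """
    low = high = None
    for clause in spec.split(","):
        clause = clause.strip()
        if clause.startswith(">="):
            low = parse_version(clause[2:])
        elif clause.startswith("<"):
            high = parse_version(clause[1:])
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    if low is None or high is None:
        raise ValueError(f"expected '>=LOW,<HIGH', got: {spec!r}")
    return low, high


def ranges_overlap(spec_a, spec_b):
    """Return True if at least one version can satisfy both ranges."""
    low_a, high_a = parse_range(spec_a)
    low_b, high_b = parse_range(spec_b)
    # Half-open intervals [low, high) intersect iff max(low) < min(high).
    return max(low_a, low_b) < min(high_a, high_b)


# Range still listed directly in the [google] extra of Airflow core.
EXTRA_RANGE = ">=0.4.0,<2.0.0"
# Range declared by the google provider 2.0.0.
PROVIDER_RANGE = ">=2.1.0,<3.0.0"

if __name__ == "__main__":
    print(ranges_overlap(EXTRA_RANGE, PROVIDER_RANGE))
```

With the extra reduced to just the provider package, only the provider's own range would remain, so there would be nothing left to conflict with.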

*For Problem (2)* we should include the providers in the constraints file
(for example, add *apache-airflow-providers-google==1.0.0*), and once a
constraints file is published we should not change it. Otherwise, it
defeats the purpose of "repeatable installation", so I would be against
regenerating constraints for already released versions. We should not change
the constraints tag (e.g. constraints-2.0.0) once 2.0.0 is released with it,
and should treat it the same as the 2.0.0 tag.
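
As a rough sketch of what "include the providers in the constraints file" could look like in practice, the snippet below parses `name==version` lines and reports which provider packages are not pinned. The sample content is a made-up excerpt (the google-cloud-automl pin is purely illustrative), the function names are hypothetical, and the provider name follows the PyPI spelling `apache-airflow-providers-google`.

```python
def parse_constraints(text):
    """Collect 'name==version' pins from constraints-file text into a dict."""
    pins = {}
    for line in text.splitlines():
        # Drop comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        name, sep, version = line.partition("==")
        if sep:  # ignore anything that is not an exact pin
            pins[name.strip().lower()] = version.strip()
    return pins


def missing_pins(pins, packages):
    """Return the packages that have no exact pin, in the given order."""
    return [pkg for pkg in packages if pkg.lower() not in pins]


# Made-up excerpt for illustration only, not a real Airflow constraints file.
SAMPLE_CONSTRAINTS = """\
# hypothetical constraints excerpt
apache-airflow-providers-google==1.0.0
google-cloud-automl==1.0.1
"""

if __name__ == "__main__":
    pins = parse_constraints(SAMPLE_CONSTRAINTS)
    wanted = [
        "apache-airflow-providers-google",
        "apache-airflow-providers-amazon",
    ]
    print(missing_pins(pins, wanted))
```

If every provider is pinned like this in a frozen constraints file, a later provider release cannot change what an install with that file resolves to.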

Regards,
Kaxil

On Wed, Feb 10, 2021 at 11:35 PM Jarek Potiuk <[email protected]> wrote:

> Hello everyone,
>
> TL;DR: The recent release of Airflow 2.0.1 + providers has shown that
> there is a small problem with the current setup of extras and providers.
> This email outlines two problems with it, together with proposed
> solutions, and I seek your opinions/comments.
>
> This is a case I had not foreseen, but one that can be rather
> easily fixed and I wanted to reach out with a proposal. I spent some time
> today (right after finishing my 3rd on-boarding day @Snowflake :) ) and I
> think I have a sound proposal:
>
> Problem statement:
>
> *Problem 1) Duplicated/conflicting dependencies in providers and extras.*
>
> Currently when you install airflow with a provider extra (for example
> `pip install apache-airflow[google]==2.0.1`), you might get dependency
> conflicts when new providers are released with conflicting dependencies.
> This happened with the 'google' provider in the last release (
> https://pypi.org/project/apache-airflow-providers-google/2.0.0/). We've
> migrated a lot of the google provider's dependencies to the
> backwards-incompatible google 2.0 APIs in that release, so the versions of
> a number of dependencies changed in the new provider.
>
> As an example - google-cloud-automl was previously within `>=0.4.0,<2.0.0`
> and in google provider 2.0.0 it is `>=2.1.0,<3.0.0`.
>
> Unfortunately we still store the 'old' requirements in the [google] extra,
> but the new provider has 'new' requirements. If we want to make a
> one-command installation with the latest providers, this creates a conflict.
> It is not a problem to upgrade the provider separately though (because this
> only applies if you use the [google] extra at installation time). Currently
> the extra in apache airflow has both: 'apache-airflow-providers-google' and
> all the (old) direct dependencies of 'apache-airflow-providers-google'.
>
> *Solution proposal:*
>
> The solution is easy: we should leave only the unconstrained
> "apache-airflow-providers-google" as the requirement in the [google] extra.
> This will add the dependencies from the provider transitively, and when we
> release a new provider version, it will work just fine (the new transitive
> deps will be used). We could (rather quickly) implement it and release a
> 2.0.2 version with only those changes: removal of provider-specific
> dependencies from extras, leaving only "apache-airflow-providers-xxxxx" as
> the dependency for the "xxxxx" provider.
>
>
> *Problem 2) constraints for past airflow versions might conflict with the
> new providers.*
>
> We currently have constraint files (example
> https://github.com/apache/airflow/blob/constraints-2.0.1/constraints-3.6.txt)
> to allow for repeatable installation of airflow with any combination of
> providers. But when we release new providers, those constraints might
> conflict (the very same example with google 2.0.0 provider as above). This
> means that new providers might not cleanly install.
>
> The good thing is, those constraints can be easily updated. We can easily
> regenerate the constraints, taking into account the new providers released
> and simply branch them off moving the tags for all 2.0 (and 2.1 and so on)
> releases. This should be rather easy to automate.
>
> *Solution proposal:*
>
> Every time we release a new wave of providers, we regenerate the
> constraints for all past released 2.* versions of airflow, so that the new
> providers are taken into account and they can install cleanly with `pip
> install apache-airflow[provider]==2.0.N --constraint == .... 2.0.N/python
> ...`
>
> Both problems can be solved rather easily. 1) requires a 2.0.2 release of
> Airflow; 2) can be implemented any time (happy to do it).
>
> Let me know what you think.
>
> J,
>
> --
> +48 660 796 129
>
