Merry Xmas :)

I lean towards keeping the providers as extras (but I'd also love to
hear what others think).

My only suggestion is that we rename the provider extras to
"[providers.google]", "[providers.amazon]", "[providers.cncf.kubernetes]"
... to keep them clearly separated from the non-provider extras and to
make it more explicit which extras bring in a provider.
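To make the renaming idea a bit more concrete, here is a minimal sketch
of what such extras could look like in a setuptools-style
`extras_require` mapping. The mapping is hypothetical (it is not
Airflow's actual setup.py) and only covers the three providers named
above. One caveat worth checking before settling on dotted names: PEP
685 normalizes extra names with the same rule as project names, so
tools that implement it would see "providers.google" as
"providers-google".

```python
import re

# Hypothetical extras mapping (NOT Airflow's real setup.py): each provider
# extra pulls in exactly one provider distribution, so the name says what
# it installs.
PROVIDER_EXTRAS = {
    "providers.google": ["apache-airflow-providers-google"],
    "providers.amazon": ["apache-airflow-providers-amazon"],
    "providers.cncf.kubernetes": ["apache-airflow-providers-cncf-kubernetes"],
}


def normalize_extra(name: str) -> str:
    """Normalize an extra name as described in PEP 685 (the PEP 503 rule)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def is_provider_extra(name: str) -> bool:
    """With the proposed naming, provider extras are recognizable by prefix."""
    return normalize_extra(name).startswith("providers-")


for extra, requirements in PROVIDER_EXTRAS.items():
    print(f"{extra!r} -> {normalize_extra(extra)!r} installs {requirements}")
```

The prefix is what buys the "explicit" separation: a single check tells
provider extras apart from ones like "virtualenv".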

Extras, in general, are pretty evil (especially if they are used in
transitive dependencies). I am sure there are a few people around who
will agree with me.

However, they are really, really convenient for installing optional
components. That is the main reason why I think we should keep them.

I think the example you gave, Kaxil (`pip install -U
"apache-airflow[google]==2.2.3" -c $CONSTRAINTS_URL`), is a very good
one, and I think it shows the expected behavior (one that our users
should learn to expect). It is exactly the same behavior you get when
you run, for example, `pip install -U apache-airflow[virtualenv]`: if
we used a different "golden" virtualenv version, it would also upgrade
virtualenv (and all its deps). We also state this very clearly in the
docs, where we explain the different upgrade scenarios:
https://airflow.apache.org/docs/apache-airflow/stable/installation/installing-from-pypi.html#upgrading-airflow-with-providers

BTW, the `-U/--upgrade` flag is not effective in this case. The
`--upgrade` flag only affects the "direct" installation targets, not
their dependencies
(https://pip.pypa.io/en/stable/user_guide/#only-if-needed-recursive-upgrade),
unless `--upgrade-strategy eager` is specified. Here it is the
constraints, not the `--upgrade` flag, that drive the upgrade of the
dependencies. So `pip install "apache-airflow[google]==2.2.3" -c
$CONSTRAINTS_URL` will also upgrade all the deps of core Airflow and of
the Google provider to the versions specified in the constraints.

I only learned about this recently myself (from TP), and I have just
opened a PR to correct it in our examples:
https://github.com/apache/airflow/pull/20537/files
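To illustrate the `--upgrade` point, here is a small Python sketch that
only builds the pip command lines and does not run them. The constraints
URL follows the pattern used in the Airflow "Installation from PyPI"
docs (`constraints-<airflow version>/constraints-<python version>.txt`);
the helper names are made up for this example. With pip's default
`only-if-needed` strategy, the two commands differ only in the `-U`
flag, and it is the pinned `==2.2.3` plus the constraints file that
decide which versions end up installed.

```python
import shlex
import sys


def constraints_url(airflow_version: str, python_version: str) -> str:
    # URL pattern documented in the Airflow "Installation from PyPI" guide.
    return (
        "https://raw.githubusercontent.com/apache/airflow/"
        f"constraints-{airflow_version}/constraints-{python_version}.txt"
    )


def pip_install_args(requirements, constraints, upgrade=False, upgrade_strategy=None):
    """Build (but do not execute) a `python -m pip install` argument list."""
    args = [sys.executable, "-m", "pip", "install"]
    if upgrade:
        args.append("--upgrade")
    if upgrade_strategy is not None:
        # pip accepts "only-if-needed" (the default) or "eager".
        args += ["--upgrade-strategy", upgrade_strategy]
    args += list(requirements)
    args += ["--constraint", constraints]
    return args


url = constraints_url("2.2.3", f"{sys.version_info.major}.{sys.version_info.minor}")
with_upgrade = pip_install_args(["apache-airflow[google]==2.2.3"], url, upgrade=True)
without_upgrade = pip_install_args(["apache-airflow[google]==2.2.3"], url)
print(shlex.join(with_upgrade))
print(shlex.join(without_upgrade))
```

To actually run one of them, `subprocess.run(args, check=True)` would
do; it is left out so the sketch has no side effects.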

Why do I think this is good behavior?

1) Because this is the strategy we chose for providers. We only
introduce breaking changes when we must, and we always give users the
opportunity to downgrade selected providers if they see a problem.
It's an "optimistic" strategy, sure, but one that is very close to
reality. Even when we had breaking changes in Google or some of the
other bigger providers, things were very likely to keep working for
most users (the breaking changes were usually very localized). We also
managed to keep our providers mostly backwards compatible over the last
year without a huge maintenance burden (a heavily increased maintenance
burden is the only reason to break backwards compatibility, IMHO). Most
of the time you also get the benefit of using the latest and greatest
dependencies (which is great for security, and should matter even more
after the recent log4j drama). And the whole point of providers is that
you can still downgrade them selectively. Users have praised this in a
number of conversations I've had with them.

2) Because providers are "the same" kind of dependencies as 3rd-party
dependencies. If we upgrade virtualenv to the latest version (whether
it's breaking or not, as long as it passes the tests), why should we
not upgrade providers in the same way?

3) Most important: because this is what we already do in our reference
image, and I cannot imagine us changing it. Users of our image will get
precisely this default behavior anyway. If they use 2.1 and upgrade to
2.3, all the embedded providers (and even all those they install using
constraints) will be upgraded by default. And there is not much we can
do about it (unless we strip the providers from the image completely,
which I don't think is a good idea).

What would change if we removed the extras?

Not much. We should advise our users what to do if they want to follow
the same strategy as our images. The advice would change to
`pip install apache-airflow==2.2.3 apache-airflow-providers-google -c
$CONSTRAINTS_URL`, which has the same effect, only it is much longer to
write. And they can already do that today.
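The equivalence is mechanical enough to sketch: a provider extra is
just shorthand for adding the provider distribution as a separate,
unpinned requirement and letting the constraints file pick its version.
The extra-to-provider mapping below is a simplified, hypothetical subset
(the real list lives in Airflow's setup.py and the "Extra packages"
docs), and the regex is a deliberately minimal requirement parser, not
a full PEP 508 implementation.

```python
import re
from typing import List

# Simplified, hypothetical subset of the extra -> provider mapping.
EXTRA_TO_PROVIDER = {
    "google": "apache-airflow-providers-google",
    "amazon": "apache-airflow-providers-amazon",
    "cncf.kubernetes": "apache-airflow-providers-cncf-kubernetes",
}

# name, optional [extra1,extra2], then whatever version specifier follows.
_REQ = re.compile(r"^(?P<name>[A-Za-z0-9._-]+)(?:\[(?P<extras>[^\]]*)\])?(?P<spec>.*)$")


def expand_provider_extras(requirement: str) -> List[str]:
    """Rewrite e.g. 'apache-airflow[google]==2.2.3' into explicit requirements."""
    match = _REQ.match(requirement.strip())
    if match is None:
        raise ValueError(f"cannot parse requirement: {requirement!r}")
    extras = [e.strip() for e in (match["extras"] or "").split(",") if e.strip()]
    kept = [e for e in extras if e not in EXTRA_TO_PROVIDER]  # non-provider extras stay
    core = match["name"] + (f"[{','.join(kept)}]" if kept else "") + match["spec"]
    # Providers are left unpinned: the constraints file decides their versions.
    return [core] + [EXTRA_TO_PROVIDER[e] for e in extras if e in EXTRA_TO_PROVIDER]


print(expand_provider_extras("apache-airflow[google]==2.2.3"))
# prints ['apache-airflow==2.2.3', 'apache-airflow-providers-google']
```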

If they want to stay with the exact provider version they already have,
they just have to run `pip install -U apache-airflow==2.2.3 -c
$CONSTRAINTS_URL`; that would not change.

This is actually one of the options we already list among the
"installation and upgrade scenarios":
https://airflow.apache.org/docs/apache-airflow/stable/installation/installing-from-pypi.html#installation-and-upgrade-of-airflow-core.
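Users who want to double-check that a core-only upgrade really left
their providers alone could snapshot the installed provider versions
before and after running it. The sketch below uses the standard-library
`importlib.metadata`; the helper names are hypothetical and the pip
command itself is only indicated by a comment.

```python
from importlib.metadata import distributions
from typing import Dict, Optional, Tuple


def installed_providers() -> Dict[str, str]:
    """Map each installed apache-airflow-providers-* distribution to its version."""
    found = {}
    for dist in distributions():
        name = (dist.metadata["Name"] or "").lower().replace("_", "-")
        if name.startswith("apache-airflow-providers-"):
            found[name] = dist.version
    return found


def changed_providers(
    before: Dict[str, str], after: Dict[str, str]
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {name: (old, new)} for providers that changed, appeared or vanished."""
    return {
        name: (before.get(name), after.get(name))
        for name in sorted(set(before) | set(after))
        if before.get(name) != after.get(name)
    }


before = installed_providers()
# ... run `pip install -U apache-airflow==2.2.3 -c $CONSTRAINTS_URL` here ...
after = installed_providers()
print(changed_providers(before, after) or "no provider versions changed")
```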

But I think changing the "[google]" extra to "[providers.google]" would
make it much more obvious to users that it involves upgrading the
provider as well.

J.

On Fri, Dec 24, 2021 at 5:18 PM Kaxil Naik wrote:
>
> Hi folks,
>
> Merry Christmas 🎄🎅.
>
> Currently, Airflow allows installing providers via "extras" - 
> https://airflow.apache.org/docs/apache-airflow/stable/extra-packages-ref.html#providers-extras
>  for convenience.
>
> We initially did this to ease migrations for users from 1.10.x to 2.x. But
> I want to discuss this again, as I feel users can unintentionally upgrade the
> provider versions.
>
> Example:
>
> pip install -U "apache-airflow[google]==2.2.3" -c $CONSTRAINTS_URL
>
>
> Now, if a new major version of the provider is released between Airflow 2.2.2
> and 2.2.3 and the user runs that command, it will bump
> "apache-airflow-providers-google" from, for example, 4.0.0 to 5.0.0, along
> with all of its dependencies.
>
> This might have unintended consequences. If the user notices this, an easy
> solution is to downgrade the provider to the previously installed version by
> running:
>
> pip install -U "apache-airflow-providers-google==4.0.0"
>
>
> However, I feel we can prevent this unintended upgrade by not allowing the
> installation of providers via extras. This would also clear up any confusion
> users might have about installing providers, as we would only have a single
> way to install them, and it would truly separate providers from the core.
> Users could then upgrade each provider only when they need to, and assess the
> impact when upgrading to a new major version of a provider.
>
> On the flip side, installing providers via extras is actually really 
> convenient 😁 and I use them all the time for testing.
>
> Thoughts?
>
> Regards,
> Kaxil
