Hello here,

*TL;DR; Following recent discussions I am seeking an opinion and consensus
on the way we implement Airflow version checks in providers - ideally in a
consistent way, because currently we have a very messy approach where 5
different ways are used in a way that makes it prone to errors.*

Yesterday we had - I think a bit unnecessarily - heated discussion on a
change I proposed to apply consistency in the Airflow version compatibility
check in providers. I proposed the change after some concerns were raised
about using "common.compat" for this by Daniel (rightfully so IMHO) - and I
proposed a solution that was supposed to address all problems around the
version check I was aware about in the PR [1] (Generate constants with
airflow versions in providers).

Somewhat surprisingly it raised quite strong objections (borderline
rejection) and somewhat heated discussions - from both Ash and Daniel -
while met somewhat enthusiastic approval from Kacper and Jens, and since I
respect and value everyone's opinion, I decided (following Daniel's
suggestion) to separate the non-controversial part of my PR [2] (fixing
__init__.py override problem that already happened for the second time and
caused problems when Elad attempted to release providers).

While I hope [2] will be quickly approved as non-controversial, I am also
seeking community wisdom and decision (consensus hopefully, vote if needed)
how we approach version checks because we seem to have different opinions
there.

Here is a bit more context:

*Context*:

In our providers that currently support Airflow 2.8.0+, we often have to
have conditional code to support all the (4 currently) supported versions
of Airflow - 2.8, 2.9, 2.10, 3.0 (in the near future also 2.11). This
happens in main providers but also in the "test" code - because some of our
tests must also be compatible.

Airflow version check for the test code is pretty consistent and well
documented (with examples) [3] and generally well followed "by example" and
"docs". We use AIRFLOW_V_2_10_PLUS from "tests_common.test_utils.compat"
module.

*Problem:*

However the version ched is currently done pretty inconsistently in various
providers. Generally version check is done by this snippet:

from airflow import __version__ as AIRFLOW_VERSION
AIRFLOW_V_2_10_PLUS = Version(Version(AIRFLOW_VERSION).base_version) >=
Version("2.10.0")

But it is used pretty inconsistently and with side-effects

a) In some cases the code is imported from `common.compat` provider - which
means that there should be a dependency on common.compat from the provider
and in the future when we add new versions (2.11.0) we have to bump version
of the compat.provider as dependency.

b) in other cases the above snippet is stored in some module (for example
version_references.py in the providers and used from there in all provider
modules as

c) in other cases the snippet is inlined several times in different modules
of the providers - usually at top level of the module

d) in other cases the snipped produces differently named constant - for
example _IS_AIRFLOW_2_10_OR_HIGHER

e) in other cases the check is directly inlined:  if
Version(Version(version).base_version) >= Version("3.0.0"):

f) also in some cases the imports are done from wrong places. Many of those
top-level constants are named the same AIRFLOW_V_2_10_PLUS. Sometimes we
have the same constant defined multiple times even in the same provider
(and same name as the tests_compat one).

g) in some cases (and that was the cause for release manager issues we had
and is going to be solved by [1]) some contributors did not realise
__init__.py is generated (despite comment in the file) - and added the
constants to the __init__.py file (this is very natural place for those
constants, so I am not surprised)

Even currently (but also in the past) it happened that the variables were
imported from a wrong package - likely - I guess - because when you press
"Alt-Enter" in IDE, it will import it from *somewhere*  by default - and
since we have about 12 places where the constant is defined, it might
accidentally introduce a dependency of providers on test code or another
providers

*Proposed solution*

So - we have quite a bit of a mess here, and my PR [1] aimed to solve it
and introduce consistency. There is no DRY (but IMHO that's ok - those
snippets are small-enough to not have to be imported from common place) -
but the consistency and wrong imports are a problem even now in the main
branch and it might get worse.

I believe that we all agree that consistency is generally good. I think
(but I am interpreting the complaints of Ash and Daniel) that "enforcing"
the consistency with pre-commit, is wrong and that it is enough to rely on
reviews (which already failed us as you can see).

My idea with [2] (following our usual "automate consistency when you see a
problem) was that we could use what we already - have - generation of the
__init__.py (which we used during release process so far and [2] adds it
also to pre-commit) - and generate the constants in the way that will make
it difficult to import the wrong now. I proposed that for providers that
need version variants we could generate those constants in provider's
`__init__.py` :

PROVIDERS_AMAZON_AIRFLOW_IS_2_10_PLUS = Version("2.10.0") <=
_AIRFLOW_VERSION

There are few more things generated - see [4] for example in my PR
including explanation how it works in the generated code and [5]
instruction on provider's lifecycle documented why and how we are doing it
[5] so that the contributors can understand and learn how to do it.

This way it will be difficult to make the cross-import from different
providers or test code, we can have a consistent approach, and it is easy
to review and see mistakes.

I am totally open to different naming conventions, and  even unconditional
generation of the version constants - currently it is controlled by a
"need_version_variants" flag in provider.yaml - but we could also generate
it for all providers - always.

I am looking for feedback on it - I hoped it could be done - as usual - in
the PR, but apparently Ash and Daniel feel very strongly about it and it
seems that we need to agree in the devlist.

*What do you think? Few questions:*

1) Is this a good idea to not only propose and describe a
consistent approach but also "enforce" consistency using our existing
pre-commit framework ?

2) Is generation in the __init__.py a good idea - if not what alternative
people would see?

3) Do others share my concern as well that imports of constants named the
same way (which we already see slipped through reviews a few times) is a
problem worth addressing by having differently named constants?

Looking forward to feedback (and sorry for the long mail, but I tried to
add TL;DR and structure it in a way that is easy to digest and understand
problems and solutions proposed.

J.

[1] Generate constants with airflow versions in providers
https://github.com/apache/airflow/pull/44686
[2] Prevent __init__.py in providers from being modified
https://github.com/apache/airflow/pull/44713
[3]
https://github.com/apache/airflow/blob/main/contributing-docs/testing/unit_tests.rst#implementing-compatibility-for-provider-tests-for-older-airflow-versions

[4] Example generated proposal:
https://github.com/apache/airflow/pull/44686/files#diff-bb74b7a9bbd568ea5d575410753d4e814cbaece5f42d84d96e8f7991a6c4ab95R41
[5] Documentation in Provider's lifecycle docs
https://github.com/apache/airflow/pull/44686/files#diff-1ed6f13b432ea574158b9577dfb2a7ad8b88b0a09c45021e94b3c069b36a4408R436

Reply via email to