To not bore you all with a massive wall of text: My tl;dr: sometimes repeating 
a small block of code is the best solution! Lets not to more magic, complexity, 
and auto-code generation than we absolutely have to.

I don’t like the increasing complexity ratchet we seem to be on with pre-commit 
and generated code and more and more rules and “hand-holding”., each change is 
only fractionally more complex, but when you take it in aggregate it leads to 
the entire CI+publishing+docker build+release+test+commit processing being by a 
clear margin the most complex system I have ever seen. My comment was a 
reaction to this. 

The reason I objected so vehemently in this case: it feels to me like a 
solution looking for a problem. The linked issues were about “someone made a 
change to an autogenerated file that wasn’t picked up”, and fixing/detecting 
that is fine.

But somehow it morphed into “lets make a 600 line change, changing imports and 
constant names to prevent against a problem that is very small and easy to 
detect". In many ways this is like the old adage of “sometimes you have a 
problem and reach for regex, and now you have two problems.” I view this as 
needless complexity when we should be simplifying things! Being explicit and 
having `AIRFLOW_VERSION = Version(airflow_version); AIRFLOW_V_3_0_PLUS = 
Version(AIRFLOW_VERSION.base_version) >= Version("3.0.0”)` in a small handful 
of places is a much easier to understand and work on that a field in 
provider.yaml and then run pre-commit to generate some code.

If the goal is to prevent cross-imports that aren’t listed: let’s do that! I 
made a quick POC here 
https://github.com/apache/airflow/pull/44686#issuecomment-2522904089

> On 6 Dec 2024, at 06:29, Jarek Potiuk <ja...@potiuk.com> wrote:
> 
> Hello here,
> 
> *TL;DR; Following recent discussions I am seeking an opinion and consensus
> on the way we implement Airflow version checks in providers - ideally in a
> consistent way, because currently we have a very messy approach where 5
> different ways are used in a way that makes it prone to errors.*
> 
> Yesterday we had - I think a bit unnecessarily - heated discussion on a
> change I proposed to apply consistency in the Airflow version compatibility
> check in providers. I proposed the change after some concerns were raised
> about using "common.compat" for this by Daniel (rightfully so IMHO) - and I
> proposed a solution that was supposed to address all problems around the
> version check I was aware about in the PR [1] (Generate constants with
> airflow versions in providers).
> 
> Somewhat surprisingly it raised quite strong objections (borderline
> rejection) and somewhat heated discussions - from both Ash and Daniel -
> while met somewhat enthusiastic approval from Kacper and Jens, and since I
> respect and value everyone's opinion, I decided (following Daniel's
> suggestion) to separate the non-controversial part of my PR [2] (fixing
> __init__.py override problem that already happened for the second time and
> caused problems when Elad attempted to release providers).
> 
> While I hope [2] will be quickly approved as non-controversial, I am also
> seeking community wisdom and decision (consensus hopefully, vote if needed)
> how we approach version checks because we seem to have different opinions
> there.
> 
> Here is a bit more context:
> 
> *Context*:
> 
> In our providers that currently support Airflow 2.8.0+, we often have to
> have conditional code to support all the (4 currently) supported versions
> of Airflow - 2.8, 2.9, 2.10, 3.0 (in the near future also 2.11). This
> happens in main providers but also in the "test" code - because some of our
> tests must also be compatible.
> 
> Airflow version check for the test code is pretty consistent and well
> documented (with examples) [3] and generally well followed "by example" and
> "docs". We use AIRFLOW_V_2_10_PLUS from "tests_common.test_utils.compat"
> module.
> 
> *Problem:*
> 
> However the version ched is currently done pretty inconsistently in various
> providers. Generally version check is done by this snippet:
> 
> from airflow import __version__ as AIRFLOW_VERSION
> AIRFLOW_V_2_10_PLUS = Version(Version(AIRFLOW_VERSION).base_version) >=
> Version("2.10.0")
> 
> But it is used pretty inconsistently and with side-effects
> 
> a) In some cases the code is imported from `common.compat` provider - which
> means that there should be a dependency on common.compat from the provider
> and in the future when we add new versions (2.11.0) we have to bump version
> of the compat.provider as dependency.
> 
> b) in other cases the above snippet is stored in some module (for example
> version_references.py in the providers and used from there in all provider
> modules as
> 
> c) in other cases the snippet is inlined several times in different modules
> of the providers - usually at top level of the module
> 
> d) in other cases the snipped produces differently named constant - for
> example _IS_AIRFLOW_2_10_OR_HIGHER
> 
> e) in other cases the check is directly inlined:  if
> Version(Version(version).base_version) >= Version("3.0.0"):
> 
> f) also in some cases the imports are done from wrong places. Many of those
> top-level constants are named the same AIRFLOW_V_2_10_PLUS. Sometimes we
> have the same constant defined multiple times even in the same provider
> (and same name as the tests_compat one).
> 
> g) in some cases (and that was the cause for release manager issues we had
> and is going to be solved by [1]) some contributors did not realise
> __init__.py is generated (despite comment in the file) - and added the
> constants to the __init__.py file (this is very natural place for those
> constants, so I am not surprised)
> 
> Even currently (but also in the past) it happened that the variables were
> imported from a wrong package - likely - I guess - because when you press
> "Alt-Enter" in IDE, it will import it from *somewhere*  by default - and
> since we have about 12 places where the constant is defined, it might
> accidentally introduce a dependency of providers on test code or another
> providers
> 
> *Proposed solution*
> 
> So - we have quite a bit of a mess here, and my PR [1] aimed to solve it
> and introduce consistency. There is no DRY (but IMHO that's ok - those
> snippets are small-enough to not have to be imported from common place) -
> but the consistency and wrong imports are a problem even now in the main
> branch and it might get worse.
> 
> I believe that we all agree that consistency is generally good. I think
> (but I am interpreting the complaints of Ash and Daniel) that "enforcing"
> the consistency with pre-commit, is wrong and that it is enough to rely on
> reviews (which already failed us as you can see).
> 
> My idea with [2] (following our usual "automate consistency when you see a
> problem) was that we could use what we already - have - generation of the
> __init__.py (which we used during release process so far and [2] adds it
> also to pre-commit) - and generate the constants in the way that will make
> it difficult to import the wrong now. I proposed that for providers that
> need version variants we could generate those constants in provider's
> `__init__.py` :
> 
> PROVIDERS_AMAZON_AIRFLOW_IS_2_10_PLUS = Version("2.10.0") <=
> _AIRFLOW_VERSION
> 
> There are few more things generated - see [4] for example in my PR
> including explanation how it works in the generated code and [5]
> instruction on provider's lifecycle documented why and how we are doing it
> [5] so that the contributors can understand and learn how to do it.
> 
> This way it will be difficult to make the cross-import from different
> providers or test code, we can have a consistent approach, and it is easy
> to review and see mistakes.
> 
> I am totally open to different naming conventions, and  even unconditional
> generation of the version constants - currently it is controlled by a
> "need_version_variants" flag in provider.yaml - but we could also generate
> it for all providers - always.
> 
> I am looking for feedback on it - I hoped it could be done - as usual - in
> the PR, but apparently Ash and Daniel feel very strongly about it and it
> seems that we need to agree in the devlist.
> 
> *What do you think? Few questions:*
> 
> 1) Is this a good idea to not only propose and describe a
> consistent approach but also "enforce" consistency using our existing
> pre-commit framework ?
> 
> 2) Is generation in the __init__.py a good idea - if not what alternative
> people would see?
> 
> 3) Do others share my concern as well that imports of constants named the
> same way (which we already see slipped through reviews a few times) is a
> problem worth addressing by having differently named constants?
> 
> Looking forward to feedback (and sorry for the long mail, but I tried to
> add TL;DR and structure it in a way that is easy to digest and understand
> problems and solutions proposed.
> 
> J.
> 
> [1] Generate constants with airflow versions in providers
> https://github.com/apache/airflow/pull/44686
> [2] Prevent __init__.py in providers from being modified
> https://github.com/apache/airflow/pull/44713
> [3]
> https://github.com/apache/airflow/blob/main/contributing-docs/testing/unit_tests.rst#implementing-compatibility-for-provider-tests-for-older-airflow-versions
> 
> [4] Example generated proposal:
> https://github.com/apache/airflow/pull/44686/files#diff-bb74b7a9bbd568ea5d575410753d4e814cbaece5f42d84d96e8f7991a6c4ab95R41
> [5] Documentation in Provider's lifecycle docs
> https://github.com/apache/airflow/pull/44686/files#diff-1ed6f13b432ea574158b9577dfb2a7ad8b88b0a09c45021e94b3c069b36a4408R436


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
For additional commands, e-mail: dev-h...@airflow.apache.org

Reply via email to