To not bore you all with a massive wall of text: My tl;dr: sometimes repeating a small block of code is the best solution! Lets not to more magic, complexity, and auto-code generation than we absolutely have to.
I don’t like the increasing complexity ratchet we seem to be on with pre-commit and generated code and more and more rules and “hand-holding”., each change is only fractionally more complex, but when you take it in aggregate it leads to the entire CI+publishing+docker build+release+test+commit processing being by a clear margin the most complex system I have ever seen. My comment was a reaction to this. The reason I objected so vehemently in this case: it feels to me like a solution looking for a problem. The linked issues were about “someone made a change to an autogenerated file that wasn’t picked up”, and fixing/detecting that is fine. But somehow it morphed into “lets make a 600 line change, changing imports and constant names to prevent against a problem that is very small and easy to detect". In many ways this is like the old adage of “sometimes you have a problem and reach for regex, and now you have two problems.” I view this as needless complexity when we should be simplifying things! Being explicit and having `AIRFLOW_VERSION = Version(airflow_version); AIRFLOW_V_3_0_PLUS = Version(AIRFLOW_VERSION.base_version) >= Version("3.0.0”)` in a small handful of places is a much easier to understand and work on that a field in provider.yaml and then run pre-commit to generate some code. If the goal is to prevent cross-imports that aren’t listed: let’s do that! I made a quick POC here https://github.com/apache/airflow/pull/44686#issuecomment-2522904089 > On 6 Dec 2024, at 06:29, Jarek Potiuk <ja...@potiuk.com> wrote: > > Hello here, > > *TL;DR; Following recent discussions I am seeking an opinion and consensus > on the way we implement Airflow version checks in providers - ideally in a > consistent way, because currently we have a very messy approach where 5 > different ways are used in a way that makes it prone to errors.* > > Yesterday we had - I think a bit unnecessarily - heated discussion on a > change I proposed to apply consistency in the Airflow version compatibility > check in providers. I proposed the change after some concerns were raised > about using "common.compat" for this by Daniel (rightfully so IMHO) - and I > proposed a solution that was supposed to address all problems around the > version check I was aware about in the PR [1] (Generate constants with > airflow versions in providers). > > Somewhat surprisingly it raised quite strong objections (borderline > rejection) and somewhat heated discussions - from both Ash and Daniel - > while met somewhat enthusiastic approval from Kacper and Jens, and since I > respect and value everyone's opinion, I decided (following Daniel's > suggestion) to separate the non-controversial part of my PR [2] (fixing > __init__.py override problem that already happened for the second time and > caused problems when Elad attempted to release providers). > > While I hope [2] will be quickly approved as non-controversial, I am also > seeking community wisdom and decision (consensus hopefully, vote if needed) > how we approach version checks because we seem to have different opinions > there. > > Here is a bit more context: > > *Context*: > > In our providers that currently support Airflow 2.8.0+, we often have to > have conditional code to support all the (4 currently) supported versions > of Airflow - 2.8, 2.9, 2.10, 3.0 (in the near future also 2.11). This > happens in main providers but also in the "test" code - because some of our > tests must also be compatible. > > Airflow version check for the test code is pretty consistent and well > documented (with examples) [3] and generally well followed "by example" and > "docs". We use AIRFLOW_V_2_10_PLUS from "tests_common.test_utils.compat" > module. > > *Problem:* > > However the version ched is currently done pretty inconsistently in various > providers. Generally version check is done by this snippet: > > from airflow import __version__ as AIRFLOW_VERSION > AIRFLOW_V_2_10_PLUS = Version(Version(AIRFLOW_VERSION).base_version) >= > Version("2.10.0") > > But it is used pretty inconsistently and with side-effects > > a) In some cases the code is imported from `common.compat` provider - which > means that there should be a dependency on common.compat from the provider > and in the future when we add new versions (2.11.0) we have to bump version > of the compat.provider as dependency. > > b) in other cases the above snippet is stored in some module (for example > version_references.py in the providers and used from there in all provider > modules as > > c) in other cases the snippet is inlined several times in different modules > of the providers - usually at top level of the module > > d) in other cases the snipped produces differently named constant - for > example _IS_AIRFLOW_2_10_OR_HIGHER > > e) in other cases the check is directly inlined: if > Version(Version(version).base_version) >= Version("3.0.0"): > > f) also in some cases the imports are done from wrong places. Many of those > top-level constants are named the same AIRFLOW_V_2_10_PLUS. Sometimes we > have the same constant defined multiple times even in the same provider > (and same name as the tests_compat one). > > g) in some cases (and that was the cause for release manager issues we had > and is going to be solved by [1]) some contributors did not realise > __init__.py is generated (despite comment in the file) - and added the > constants to the __init__.py file (this is very natural place for those > constants, so I am not surprised) > > Even currently (but also in the past) it happened that the variables were > imported from a wrong package - likely - I guess - because when you press > "Alt-Enter" in IDE, it will import it from *somewhere* by default - and > since we have about 12 places where the constant is defined, it might > accidentally introduce a dependency of providers on test code or another > providers > > *Proposed solution* > > So - we have quite a bit of a mess here, and my PR [1] aimed to solve it > and introduce consistency. There is no DRY (but IMHO that's ok - those > snippets are small-enough to not have to be imported from common place) - > but the consistency and wrong imports are a problem even now in the main > branch and it might get worse. > > I believe that we all agree that consistency is generally good. I think > (but I am interpreting the complaints of Ash and Daniel) that "enforcing" > the consistency with pre-commit, is wrong and that it is enough to rely on > reviews (which already failed us as you can see). > > My idea with [2] (following our usual "automate consistency when you see a > problem) was that we could use what we already - have - generation of the > __init__.py (which we used during release process so far and [2] adds it > also to pre-commit) - and generate the constants in the way that will make > it difficult to import the wrong now. I proposed that for providers that > need version variants we could generate those constants in provider's > `__init__.py` : > > PROVIDERS_AMAZON_AIRFLOW_IS_2_10_PLUS = Version("2.10.0") <= > _AIRFLOW_VERSION > > There are few more things generated - see [4] for example in my PR > including explanation how it works in the generated code and [5] > instruction on provider's lifecycle documented why and how we are doing it > [5] so that the contributors can understand and learn how to do it. > > This way it will be difficult to make the cross-import from different > providers or test code, we can have a consistent approach, and it is easy > to review and see mistakes. > > I am totally open to different naming conventions, and even unconditional > generation of the version constants - currently it is controlled by a > "need_version_variants" flag in provider.yaml - but we could also generate > it for all providers - always. > > I am looking for feedback on it - I hoped it could be done - as usual - in > the PR, but apparently Ash and Daniel feel very strongly about it and it > seems that we need to agree in the devlist. > > *What do you think? Few questions:* > > 1) Is this a good idea to not only propose and describe a > consistent approach but also "enforce" consistency using our existing > pre-commit framework ? > > 2) Is generation in the __init__.py a good idea - if not what alternative > people would see? > > 3) Do others share my concern as well that imports of constants named the > same way (which we already see slipped through reviews a few times) is a > problem worth addressing by having differently named constants? > > Looking forward to feedback (and sorry for the long mail, but I tried to > add TL;DR and structure it in a way that is easy to digest and understand > problems and solutions proposed. > > J. > > [1] Generate constants with airflow versions in providers > https://github.com/apache/airflow/pull/44686 > [2] Prevent __init__.py in providers from being modified > https://github.com/apache/airflow/pull/44713 > [3] > https://github.com/apache/airflow/blob/main/contributing-docs/testing/unit_tests.rst#implementing-compatibility-for-provider-tests-for-older-airflow-versions > > [4] Example generated proposal: > https://github.com/apache/airflow/pull/44686/files#diff-bb74b7a9bbd568ea5d575410753d4e814cbaece5f42d84d96e8f7991a6c4ab95R41 > [5] Documentation in Provider's lifecycle docs > https://github.com/apache/airflow/pull/44686/files#diff-1ed6f13b432ea574158b9577dfb2a7ad8b88b0a09c45021e94b3c069b36a4408R436 --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org For additional commands, e-mail: dev-h...@airflow.apache.org