> If the goal is to prevent cross-imports that aren’t listed: let’s do that! I made a quick POC here https://github.com/apache/airflow/pull/44686#issuecomment-2522904089
That's an interesting proposal. Yes the massive wall of text was apparently needed in order to properly explain context and problems that I saw. I believe this is the "Apache Way" we are all committed to follow here, that I described at the last town hall: we describe an issue, make proposals, reach for feadback and get others to come with ideas how to solve the problems. And seems it worked - thanks - that looks like constructive proposal on how to solve the problems I outlined (and allows to propose a solution that also solves consistency issue - so we likely will have a proposal to address all of the issues I listed). I like that proposal (I will have to take a look and see if it works). It does mean we will have even more pre-commits that are even more complex and use "waning this will change any time" feature of ruff, but if others are OK - I am good when we pair it with change that makes all usages consistent and described (and one that we agree to here). We can always fix it later when ruff will change their "ruff analyze graph" feature output. I think in this case convention b)version_references.py in the providers where we keep those constants (same as in all providers) seems like a good idea - then the only thing someone will have to do when versioning provider they will have to copy that module and import the constants from there. Sounds like a good solution to the problems I described. WDYT Everyone? J. On Fri, Dec 6, 2024 at 12:19 PM Ash Berlin-Taylor <a...@apache.org> wrote: > To not bore you all with a massive wall of text: My tl;dr: sometimes > repeating a small block of code is the best solution! Lets not to more > magic, complexity, and auto-code generation than we absolutely have to. > > I don’t like the increasing complexity ratchet we seem to be on with > pre-commit and generated code and more and more rules and “hand-holding”., > each change is only fractionally more complex, but when you take it in > aggregate it leads to the entire CI+publishing+docker > build+release+test+commit processing being by a clear margin the most > complex system I have ever seen. My comment was a reaction to this. > > The reason I objected so vehemently in this case: it feels to me like a > solution looking for a problem. The linked issues were about “someone made > a change to an autogenerated file that wasn’t picked up”, and > fixing/detecting that is fine. > > But somehow it morphed into “lets make a 600 line change, changing imports > and constant names to prevent against a problem that is very small and easy > to detect". In many ways this is like the old adage of “sometimes you have > a problem and reach for regex, and now you have two problems.” I view this > as needless complexity when we should be simplifying things! Being explicit > and having `AIRFLOW_VERSION = Version(airflow_version); AIRFLOW_V_3_0_PLUS > = Version(AIRFLOW_VERSION.base_version) >= Version("3.0.0”)` in a small > handful of places is a much easier to understand and work on that a field > in provider.yaml and then run pre-commit to generate some code. > > If the goal is to prevent cross-imports that aren’t listed: let’s do that! > I made a quick POC here > https://github.com/apache/airflow/pull/44686#issuecomment-2522904089 > > > On 6 Dec 2024, at 06:29, Jarek Potiuk <ja...@potiuk.com> wrote: > > > > Hello here, > > > > *TL;DR; Following recent discussions I am seeking an opinion and > consensus > > on the way we implement Airflow version checks in providers - ideally in > a > > consistent way, because currently we have a very messy approach where 5 > > different ways are used in a way that makes it prone to errors.* > > > > Yesterday we had - I think a bit unnecessarily - heated discussion on a > > change I proposed to apply consistency in the Airflow version > compatibility > > check in providers. I proposed the change after some concerns were raised > > about using "common.compat" for this by Daniel (rightfully so IMHO) - > and I > > proposed a solution that was supposed to address all problems around the > > version check I was aware about in the PR [1] (Generate constants with > > airflow versions in providers). > > > > Somewhat surprisingly it raised quite strong objections (borderline > > rejection) and somewhat heated discussions - from both Ash and Daniel - > > while met somewhat enthusiastic approval from Kacper and Jens, and since > I > > respect and value everyone's opinion, I decided (following Daniel's > > suggestion) to separate the non-controversial part of my PR [2] (fixing > > __init__.py override problem that already happened for the second time > and > > caused problems when Elad attempted to release providers). > > > > While I hope [2] will be quickly approved as non-controversial, I am also > > seeking community wisdom and decision (consensus hopefully, vote if > needed) > > how we approach version checks because we seem to have different opinions > > there. > > > > Here is a bit more context: > > > > *Context*: > > > > In our providers that currently support Airflow 2.8.0+, we often have to > > have conditional code to support all the (4 currently) supported versions > > of Airflow - 2.8, 2.9, 2.10, 3.0 (in the near future also 2.11). This > > happens in main providers but also in the "test" code - because some of > our > > tests must also be compatible. > > > > Airflow version check for the test code is pretty consistent and well > > documented (with examples) [3] and generally well followed "by example" > and > > "docs". We use AIRFLOW_V_2_10_PLUS from "tests_common.test_utils.compat" > > module. > > > > *Problem:* > > > > However the version ched is currently done pretty inconsistently in > various > > providers. Generally version check is done by this snippet: > > > > from airflow import __version__ as AIRFLOW_VERSION > > AIRFLOW_V_2_10_PLUS = Version(Version(AIRFLOW_VERSION).base_version) >= > > Version("2.10.0") > > > > But it is used pretty inconsistently and with side-effects > > > > a) In some cases the code is imported from `common.compat` provider - > which > > means that there should be a dependency on common.compat from the > provider > > and in the future when we add new versions (2.11.0) we have to bump > version > > of the compat.provider as dependency. > > > > b) in other cases the above snippet is stored in some module (for example > > version_references.py in the providers and used from there in all > provider > > modules as > > > > c) in other cases the snippet is inlined several times in different > modules > > of the providers - usually at top level of the module > > > > d) in other cases the snipped produces differently named constant - for > > example _IS_AIRFLOW_2_10_OR_HIGHER > > > > e) in other cases the check is directly inlined: if > > Version(Version(version).base_version) >= Version("3.0.0"): > > > > f) also in some cases the imports are done from wrong places. Many of > those > > top-level constants are named the same AIRFLOW_V_2_10_PLUS. Sometimes we > > have the same constant defined multiple times even in the same provider > > (and same name as the tests_compat one). > > > > g) in some cases (and that was the cause for release manager issues we > had > > and is going to be solved by [1]) some contributors did not realise > > __init__.py is generated (despite comment in the file) - and added the > > constants to the __init__.py file (this is very natural place for those > > constants, so I am not surprised) > > > > Even currently (but also in the past) it happened that the variables were > > imported from a wrong package - likely - I guess - because when you press > > "Alt-Enter" in IDE, it will import it from *somewhere* by default - and > > since we have about 12 places where the constant is defined, it might > > accidentally introduce a dependency of providers on test code or another > > providers > > > > *Proposed solution* > > > > So - we have quite a bit of a mess here, and my PR [1] aimed to solve it > > and introduce consistency. There is no DRY (but IMHO that's ok - those > > snippets are small-enough to not have to be imported from common place) - > > but the consistency and wrong imports are a problem even now in the main > > branch and it might get worse. > > > > I believe that we all agree that consistency is generally good. I think > > (but I am interpreting the complaints of Ash and Daniel) that "enforcing" > > the consistency with pre-commit, is wrong and that it is enough to rely > on > > reviews (which already failed us as you can see). > > > > My idea with [2] (following our usual "automate consistency when you see > a > > problem) was that we could use what we already - have - generation of the > > __init__.py (which we used during release process so far and [2] adds it > > also to pre-commit) - and generate the constants in the way that will > make > > it difficult to import the wrong now. I proposed that for providers that > > need version variants we could generate those constants in provider's > > `__init__.py` : > > > > PROVIDERS_AMAZON_AIRFLOW_IS_2_10_PLUS = Version("2.10.0") <= > > _AIRFLOW_VERSION > > > > There are few more things generated - see [4] for example in my PR > > including explanation how it works in the generated code and [5] > > instruction on provider's lifecycle documented why and how we are doing > it > > [5] so that the contributors can understand and learn how to do it. > > > > This way it will be difficult to make the cross-import from different > > providers or test code, we can have a consistent approach, and it is easy > > to review and see mistakes. > > > > I am totally open to different naming conventions, and even > unconditional > > generation of the version constants - currently it is controlled by a > > "need_version_variants" flag in provider.yaml - but we could also > generate > > it for all providers - always. > > > > I am looking for feedback on it - I hoped it could be done - as usual - > in > > the PR, but apparently Ash and Daniel feel very strongly about it and it > > seems that we need to agree in the devlist. > > > > *What do you think? Few questions:* > > > > 1) Is this a good idea to not only propose and describe a > > consistent approach but also "enforce" consistency using our existing > > pre-commit framework ? > > > > 2) Is generation in the __init__.py a good idea - if not what alternative > > people would see? > > > > 3) Do others share my concern as well that imports of constants named the > > same way (which we already see slipped through reviews a few times) is a > > problem worth addressing by having differently named constants? > > > > Looking forward to feedback (and sorry for the long mail, but I tried to > > add TL;DR and structure it in a way that is easy to digest and understand > > problems and solutions proposed. > > > > J. > > > > [1] Generate constants with airflow versions in providers > > https://github.com/apache/airflow/pull/44686 > > [2] Prevent __init__.py in providers from being modified > > https://github.com/apache/airflow/pull/44713 > > [3] > > > https://github.com/apache/airflow/blob/main/contributing-docs/testing/unit_tests.rst#implementing-compatibility-for-provider-tests-for-older-airflow-versions > > > > [4] Example generated proposal: > > > https://github.com/apache/airflow/pull/44686/files#diff-bb74b7a9bbd568ea5d575410753d4e814cbaece5f42d84d96e8f7991a6c4ab95R41 > > [5] Documentation in Provider's lifecycle docs > > > https://github.com/apache/airflow/pull/44686/files#diff-1ed6f13b432ea574158b9577dfb2a7ad8b88b0a09c45021e94b3c069b36a4408R436 > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org > For additional commands, e-mail: dev-h...@airflow.apache.org > >