My two cents: Preventing cross imports and modifying generated files sounds necessary.
On the other hand I agree that the CI is complex and having another step of generated code just for simple versions checks might be overkill. On Fri 6 Dec 2024 at 13:02, Jarek Potiuk <ja...@potiuk.com> wrote: > > If the goal is to prevent cross-imports that aren’t listed: let’s do > that! I made a quick POC here > https://github.com/apache/airflow/pull/44686#issuecomment-2522904089 > > That's an interesting proposal. Yes the massive wall of text was apparently > needed in order to properly explain context and problems that I saw. I > believe this is the "Apache Way" we are all committed to follow here, that > I described at the last town hall: we describe an issue, make proposals, > reach for feadback and get others to come with ideas how to solve the > problems. And seems it worked - thanks - that looks like constructive > proposal on how to solve the problems I outlined (and allows to propose a > solution that also solves consistency issue - so we likely will have a > proposal to address all of the issues I listed). > > I like that proposal (I will have to take a look and see if it works). It > does mean we will have even more pre-commits that are even more complex and > use "waning this will change any time" feature of ruff, but if others are > OK - I am good when we pair it with change that makes all usages consistent > and described (and one that we agree to here). We can always fix it later > when ruff will change their "ruff analyze graph" feature output. > > I think in this case convention b)version_references.py in the providers > where we keep those constants (same as in all providers) seems like a good > idea - then the only thing someone will have to do when versioning provider > they will have to copy that module and import the constants from there. > Sounds like a good solution to the problems I described. > > WDYT Everyone? > > J. > > On Fri, Dec 6, 2024 at 12:19 PM Ash Berlin-Taylor <a...@apache.org> wrote: > > > To not bore you all with a massive wall of text: My tl;dr: sometimes > > repeating a small block of code is the best solution! Lets not to more > > magic, complexity, and auto-code generation than we absolutely have to. > > > > I don’t like the increasing complexity ratchet we seem to be on with > > pre-commit and generated code and more and more rules and > “hand-holding”., > > each change is only fractionally more complex, but when you take it in > > aggregate it leads to the entire CI+publishing+docker > > build+release+test+commit processing being by a clear margin the most > > complex system I have ever seen. My comment was a reaction to this. > > > > The reason I objected so vehemently in this case: it feels to me like a > > solution looking for a problem. The linked issues were about “someone > made > > a change to an autogenerated file that wasn’t picked up”, and > > fixing/detecting that is fine. > > > > But somehow it morphed into “lets make a 600 line change, changing > imports > > and constant names to prevent against a problem that is very small and > easy > > to detect". In many ways this is like the old adage of “sometimes you > have > > a problem and reach for regex, and now you have two problems.” I view > this > > as needless complexity when we should be simplifying things! Being > explicit > > and having `AIRFLOW_VERSION = Version(airflow_version); > AIRFLOW_V_3_0_PLUS > > = Version(AIRFLOW_VERSION.base_version) >= Version("3.0.0”)` in a small > > handful of places is a much easier to understand and work on that a field > > in provider.yaml and then run pre-commit to generate some code. > > > > If the goal is to prevent cross-imports that aren’t listed: let’s do > that! > > I made a quick POC here > > https://github.com/apache/airflow/pull/44686#issuecomment-2522904089 > > > > > On 6 Dec 2024, at 06:29, Jarek Potiuk <ja...@potiuk.com> wrote: > > > > > > Hello here, > > > > > > *TL;DR; Following recent discussions I am seeking an opinion and > > consensus > > > on the way we implement Airflow version checks in providers - ideally > in > > a > > > consistent way, because currently we have a very messy approach where 5 > > > different ways are used in a way that makes it prone to errors.* > > > > > > Yesterday we had - I think a bit unnecessarily - heated discussion on a > > > change I proposed to apply consistency in the Airflow version > > compatibility > > > check in providers. I proposed the change after some concerns were > raised > > > about using "common.compat" for this by Daniel (rightfully so IMHO) - > > and I > > > proposed a solution that was supposed to address all problems around > the > > > version check I was aware about in the PR [1] (Generate constants with > > > airflow versions in providers). > > > > > > Somewhat surprisingly it raised quite strong objections (borderline > > > rejection) and somewhat heated discussions - from both Ash and Daniel - > > > while met somewhat enthusiastic approval from Kacper and Jens, and > since > > I > > > respect and value everyone's opinion, I decided (following Daniel's > > > suggestion) to separate the non-controversial part of my PR [2] (fixing > > > __init__.py override problem that already happened for the second time > > and > > > caused problems when Elad attempted to release providers). > > > > > > While I hope [2] will be quickly approved as non-controversial, I am > also > > > seeking community wisdom and decision (consensus hopefully, vote if > > needed) > > > how we approach version checks because we seem to have different > opinions > > > there. > > > > > > Here is a bit more context: > > > > > > *Context*: > > > > > > In our providers that currently support Airflow 2.8.0+, we often have > to > > > have conditional code to support all the (4 currently) supported > versions > > > of Airflow - 2.8, 2.9, 2.10, 3.0 (in the near future also 2.11). This > > > happens in main providers but also in the "test" code - because some of > > our > > > tests must also be compatible. > > > > > > Airflow version check for the test code is pretty consistent and well > > > documented (with examples) [3] and generally well followed "by example" > > and > > > "docs". We use AIRFLOW_V_2_10_PLUS from > "tests_common.test_utils.compat" > > > module. > > > > > > *Problem:* > > > > > > However the version ched is currently done pretty inconsistently in > > various > > > providers. Generally version check is done by this snippet: > > > > > > from airflow import __version__ as AIRFLOW_VERSION > > > AIRFLOW_V_2_10_PLUS = Version(Version(AIRFLOW_VERSION).base_version) >= > > > Version("2.10.0") > > > > > > But it is used pretty inconsistently and with side-effects > > > > > > a) In some cases the code is imported from `common.compat` provider - > > which > > > means that there should be a dependency on common.compat from the > > provider > > > and in the future when we add new versions (2.11.0) we have to bump > > version > > > of the compat.provider as dependency. > > > > > > b) in other cases the above snippet is stored in some module (for > example > > > version_references.py in the providers and used from there in all > > provider > > > modules as > > > > > > c) in other cases the snippet is inlined several times in different > > modules > > > of the providers - usually at top level of the module > > > > > > d) in other cases the snipped produces differently named constant - for > > > example _IS_AIRFLOW_2_10_OR_HIGHER > > > > > > e) in other cases the check is directly inlined: if > > > Version(Version(version).base_version) >= Version("3.0.0"): > > > > > > f) also in some cases the imports are done from wrong places. Many of > > those > > > top-level constants are named the same AIRFLOW_V_2_10_PLUS. Sometimes > we > > > have the same constant defined multiple times even in the same provider > > > (and same name as the tests_compat one). > > > > > > g) in some cases (and that was the cause for release manager issues we > > had > > > and is going to be solved by [1]) some contributors did not realise > > > __init__.py is generated (despite comment in the file) - and added the > > > constants to the __init__.py file (this is very natural place for those > > > constants, so I am not surprised) > > > > > > Even currently (but also in the past) it happened that the variables > were > > > imported from a wrong package - likely - I guess - because when you > press > > > "Alt-Enter" in IDE, it will import it from *somewhere* by default - > and > > > since we have about 12 places where the constant is defined, it might > > > accidentally introduce a dependency of providers on test code or > another > > > providers > > > > > > *Proposed solution* > > > > > > So - we have quite a bit of a mess here, and my PR [1] aimed to solve > it > > > and introduce consistency. There is no DRY (but IMHO that's ok - those > > > snippets are small-enough to not have to be imported from common > place) - > > > but the consistency and wrong imports are a problem even now in the > main > > > branch and it might get worse. > > > > > > I believe that we all agree that consistency is generally good. I think > > > (but I am interpreting the complaints of Ash and Daniel) that > "enforcing" > > > the consistency with pre-commit, is wrong and that it is enough to rely > > on > > > reviews (which already failed us as you can see). > > > > > > My idea with [2] (following our usual "automate consistency when you > see > > a > > > problem) was that we could use what we already - have - generation of > the > > > __init__.py (which we used during release process so far and [2] adds > it > > > also to pre-commit) - and generate the constants in the way that will > > make > > > it difficult to import the wrong now. I proposed that for providers > that > > > need version variants we could generate those constants in provider's > > > `__init__.py` : > > > > > > PROVIDERS_AMAZON_AIRFLOW_IS_2_10_PLUS = Version("2.10.0") <= > > > _AIRFLOW_VERSION > > > > > > There are few more things generated - see [4] for example in my PR > > > including explanation how it works in the generated code and [5] > > > instruction on provider's lifecycle documented why and how we are doing > > it > > > [5] so that the contributors can understand and learn how to do it. > > > > > > This way it will be difficult to make the cross-import from different > > > providers or test code, we can have a consistent approach, and it is > easy > > > to review and see mistakes. > > > > > > I am totally open to different naming conventions, and even > > unconditional > > > generation of the version constants - currently it is controlled by a > > > "need_version_variants" flag in provider.yaml - but we could also > > generate > > > it for all providers - always. > > > > > > I am looking for feedback on it - I hoped it could be done - as usual - > > in > > > the PR, but apparently Ash and Daniel feel very strongly about it and > it > > > seems that we need to agree in the devlist. > > > > > > *What do you think? Few questions:* > > > > > > 1) Is this a good idea to not only propose and describe a > > > consistent approach but also "enforce" consistency using our existing > > > pre-commit framework ? > > > > > > 2) Is generation in the __init__.py a good idea - if not what > alternative > > > people would see? > > > > > > 3) Do others share my concern as well that imports of constants named > the > > > same way (which we already see slipped through reviews a few times) is > a > > > problem worth addressing by having differently named constants? > > > > > > Looking forward to feedback (and sorry for the long mail, but I tried > to > > > add TL;DR and structure it in a way that is easy to digest and > understand > > > problems and solutions proposed. > > > > > > J. > > > > > > [1] Generate constants with airflow versions in providers > > > https://github.com/apache/airflow/pull/44686 > > > [2] Prevent __init__.py in providers from being modified > > > https://github.com/apache/airflow/pull/44713 > > > [3] > > > > > > https://github.com/apache/airflow/blob/main/contributing-docs/testing/unit_tests.rst#implementing-compatibility-for-provider-tests-for-older-airflow-versions > > > > > > [4] Example generated proposal: > > > > > > https://github.com/apache/airflow/pull/44686/files#diff-bb74b7a9bbd568ea5d575410753d4e814cbaece5f42d84d96e8f7991a6c4ab95R41 > > > [5] Documentation in Provider's lifecycle docs > > > > > > https://github.com/apache/airflow/pull/44686/files#diff-1ed6f13b432ea574158b9577dfb2a7ad8b88b0a09c45021e94b3c069b36a4408R436 > > > > > > --------------------------------------------------------------------- > > To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org > > For additional commands, e-mail: dev-h...@airflow.apache.org > > > > >