My two cents:

Preventing cross imports and modifying generated files sounds necessary.

On the other hand I agree that the CI is complex and having another step of
generated code just for simple versions checks might be overkill.

On Fri 6 Dec 2024 at 13:02, Jarek Potiuk <ja...@potiuk.com> wrote:

> > If the goal is to prevent cross-imports that aren’t listed: let’s do
> that! I made a quick POC here
> https://github.com/apache/airflow/pull/44686#issuecomment-2522904089
>
> That's an interesting proposal. Yes the massive wall of text was apparently
> needed in order to properly explain context and problems that I saw. I
> believe this is the "Apache Way" we are all committed to follow here, that
> I described at the last town hall: we describe an issue, make proposals,
> reach for feadback and get others to come with ideas how to solve the
> problems. And seems it worked -  thanks - that looks like constructive
> proposal on how to solve the problems I outlined (and allows to propose a
> solution that also solves consistency issue - so we likely will have a
> proposal to address all of the issues I listed).
>
> I like that proposal (I will have to take a look and see if it works). It
> does mean we will have even more pre-commits that are even more complex and
> use "waning this will change any time" feature of ruff, but if others are
> OK - I am good when we pair it with change that makes all usages consistent
> and described (and one that we agree to here).  We can always fix it later
> when ruff will change their "ruff analyze graph" feature output.
>
> I think in this case convention b)version_references.py in the providers
> where we keep those constants (same as in all providers) seems like a good
> idea - then the only thing someone will have to do when versioning provider
> they will have to copy that module and import the constants from there.
> Sounds like a good solution to the problems I described.
>
> WDYT Everyone?
>
> J.
>
> On Fri, Dec 6, 2024 at 12:19 PM Ash Berlin-Taylor <a...@apache.org> wrote:
>
> > To not bore you all with a massive wall of text: My tl;dr: sometimes
> > repeating a small block of code is the best solution! Lets not to more
> > magic, complexity, and auto-code generation than we absolutely have to.
> >
> > I don’t like the increasing complexity ratchet we seem to be on with
> > pre-commit and generated code and more and more rules and
> “hand-holding”.,
> > each change is only fractionally more complex, but when you take it in
> > aggregate it leads to the entire CI+publishing+docker
> > build+release+test+commit processing being by a clear margin the most
> > complex system I have ever seen. My comment was a reaction to this.
> >
> > The reason I objected so vehemently in this case: it feels to me like a
> > solution looking for a problem. The linked issues were about “someone
> made
> > a change to an autogenerated file that wasn’t picked up”, and
> > fixing/detecting that is fine.
> >
> > But somehow it morphed into “lets make a 600 line change, changing
> imports
> > and constant names to prevent against a problem that is very small and
> easy
> > to detect". In many ways this is like the old adage of “sometimes you
> have
> > a problem and reach for regex, and now you have two problems.” I view
> this
> > as needless complexity when we should be simplifying things! Being
> explicit
> > and having `AIRFLOW_VERSION = Version(airflow_version);
> AIRFLOW_V_3_0_PLUS
> > = Version(AIRFLOW_VERSION.base_version) >= Version("3.0.0”)` in a small
> > handful of places is a much easier to understand and work on that a field
> > in provider.yaml and then run pre-commit to generate some code.
> >
> > If the goal is to prevent cross-imports that aren’t listed: let’s do
> that!
> > I made a quick POC here
> > https://github.com/apache/airflow/pull/44686#issuecomment-2522904089
> >
> > > On 6 Dec 2024, at 06:29, Jarek Potiuk <ja...@potiuk.com> wrote:
> > >
> > > Hello here,
> > >
> > > *TL;DR; Following recent discussions I am seeking an opinion and
> > consensus
> > > on the way we implement Airflow version checks in providers - ideally
> in
> > a
> > > consistent way, because currently we have a very messy approach where 5
> > > different ways are used in a way that makes it prone to errors.*
> > >
> > > Yesterday we had - I think a bit unnecessarily - heated discussion on a
> > > change I proposed to apply consistency in the Airflow version
> > compatibility
> > > check in providers. I proposed the change after some concerns were
> raised
> > > about using "common.compat" for this by Daniel (rightfully so IMHO) -
> > and I
> > > proposed a solution that was supposed to address all problems around
> the
> > > version check I was aware about in the PR [1] (Generate constants with
> > > airflow versions in providers).
> > >
> > > Somewhat surprisingly it raised quite strong objections (borderline
> > > rejection) and somewhat heated discussions - from both Ash and Daniel -
> > > while met somewhat enthusiastic approval from Kacper and Jens, and
> since
> > I
> > > respect and value everyone's opinion, I decided (following Daniel's
> > > suggestion) to separate the non-controversial part of my PR [2] (fixing
> > > __init__.py override problem that already happened for the second time
> > and
> > > caused problems when Elad attempted to release providers).
> > >
> > > While I hope [2] will be quickly approved as non-controversial, I am
> also
> > > seeking community wisdom and decision (consensus hopefully, vote if
> > needed)
> > > how we approach version checks because we seem to have different
> opinions
> > > there.
> > >
> > > Here is a bit more context:
> > >
> > > *Context*:
> > >
> > > In our providers that currently support Airflow 2.8.0+, we often have
> to
> > > have conditional code to support all the (4 currently) supported
> versions
> > > of Airflow - 2.8, 2.9, 2.10, 3.0 (in the near future also 2.11). This
> > > happens in main providers but also in the "test" code - because some of
> > our
> > > tests must also be compatible.
> > >
> > > Airflow version check for the test code is pretty consistent and well
> > > documented (with examples) [3] and generally well followed "by example"
> > and
> > > "docs". We use AIRFLOW_V_2_10_PLUS from
> "tests_common.test_utils.compat"
> > > module.
> > >
> > > *Problem:*
> > >
> > > However the version ched is currently done pretty inconsistently in
> > various
> > > providers. Generally version check is done by this snippet:
> > >
> > > from airflow import __version__ as AIRFLOW_VERSION
> > > AIRFLOW_V_2_10_PLUS = Version(Version(AIRFLOW_VERSION).base_version) >=
> > > Version("2.10.0")
> > >
> > > But it is used pretty inconsistently and with side-effects
> > >
> > > a) In some cases the code is imported from `common.compat` provider -
> > which
> > > means that there should be a dependency on common.compat from the
> > provider
> > > and in the future when we add new versions (2.11.0) we have to bump
> > version
> > > of the compat.provider as dependency.
> > >
> > > b) in other cases the above snippet is stored in some module (for
> example
> > > version_references.py in the providers and used from there in all
> > provider
> > > modules as
> > >
> > > c) in other cases the snippet is inlined several times in different
> > modules
> > > of the providers - usually at top level of the module
> > >
> > > d) in other cases the snipped produces differently named constant - for
> > > example _IS_AIRFLOW_2_10_OR_HIGHER
> > >
> > > e) in other cases the check is directly inlined:  if
> > > Version(Version(version).base_version) >= Version("3.0.0"):
> > >
> > > f) also in some cases the imports are done from wrong places. Many of
> > those
> > > top-level constants are named the same AIRFLOW_V_2_10_PLUS. Sometimes
> we
> > > have the same constant defined multiple times even in the same provider
> > > (and same name as the tests_compat one).
> > >
> > > g) in some cases (and that was the cause for release manager issues we
> > had
> > > and is going to be solved by [1]) some contributors did not realise
> > > __init__.py is generated (despite comment in the file) - and added the
> > > constants to the __init__.py file (this is very natural place for those
> > > constants, so I am not surprised)
> > >
> > > Even currently (but also in the past) it happened that the variables
> were
> > > imported from a wrong package - likely - I guess - because when you
> press
> > > "Alt-Enter" in IDE, it will import it from *somewhere*  by default -
> and
> > > since we have about 12 places where the constant is defined, it might
> > > accidentally introduce a dependency of providers on test code or
> another
> > > providers
> > >
> > > *Proposed solution*
> > >
> > > So - we have quite a bit of a mess here, and my PR [1] aimed to solve
> it
> > > and introduce consistency. There is no DRY (but IMHO that's ok - those
> > > snippets are small-enough to not have to be imported from common
> place) -
> > > but the consistency and wrong imports are a problem even now in the
> main
> > > branch and it might get worse.
> > >
> > > I believe that we all agree that consistency is generally good. I think
> > > (but I am interpreting the complaints of Ash and Daniel) that
> "enforcing"
> > > the consistency with pre-commit, is wrong and that it is enough to rely
> > on
> > > reviews (which already failed us as you can see).
> > >
> > > My idea with [2] (following our usual "automate consistency when you
> see
> > a
> > > problem) was that we could use what we already - have - generation of
> the
> > > __init__.py (which we used during release process so far and [2] adds
> it
> > > also to pre-commit) - and generate the constants in the way that will
> > make
> > > it difficult to import the wrong now. I proposed that for providers
> that
> > > need version variants we could generate those constants in provider's
> > > `__init__.py` :
> > >
> > > PROVIDERS_AMAZON_AIRFLOW_IS_2_10_PLUS = Version("2.10.0") <=
> > > _AIRFLOW_VERSION
> > >
> > > There are few more things generated - see [4] for example in my PR
> > > including explanation how it works in the generated code and [5]
> > > instruction on provider's lifecycle documented why and how we are doing
> > it
> > > [5] so that the contributors can understand and learn how to do it.
> > >
> > > This way it will be difficult to make the cross-import from different
> > > providers or test code, we can have a consistent approach, and it is
> easy
> > > to review and see mistakes.
> > >
> > > I am totally open to different naming conventions, and  even
> > unconditional
> > > generation of the version constants - currently it is controlled by a
> > > "need_version_variants" flag in provider.yaml - but we could also
> > generate
> > > it for all providers - always.
> > >
> > > I am looking for feedback on it - I hoped it could be done - as usual -
> > in
> > > the PR, but apparently Ash and Daniel feel very strongly about it and
> it
> > > seems that we need to agree in the devlist.
> > >
> > > *What do you think? Few questions:*
> > >
> > > 1) Is this a good idea to not only propose and describe a
> > > consistent approach but also "enforce" consistency using our existing
> > > pre-commit framework ?
> > >
> > > 2) Is generation in the __init__.py a good idea - if not what
> alternative
> > > people would see?
> > >
> > > 3) Do others share my concern as well that imports of constants named
> the
> > > same way (which we already see slipped through reviews a few times) is
> a
> > > problem worth addressing by having differently named constants?
> > >
> > > Looking forward to feedback (and sorry for the long mail, but I tried
> to
> > > add TL;DR and structure it in a way that is easy to digest and
> understand
> > > problems and solutions proposed.
> > >
> > > J.
> > >
> > > [1] Generate constants with airflow versions in providers
> > > https://github.com/apache/airflow/pull/44686
> > > [2] Prevent __init__.py in providers from being modified
> > > https://github.com/apache/airflow/pull/44713
> > > [3]
> > >
> >
> https://github.com/apache/airflow/blob/main/contributing-docs/testing/unit_tests.rst#implementing-compatibility-for-provider-tests-for-older-airflow-versions
> > >
> > > [4] Example generated proposal:
> > >
> >
> https://github.com/apache/airflow/pull/44686/files#diff-bb74b7a9bbd568ea5d575410753d4e814cbaece5f42d84d96e8f7991a6c4ab95R41
> > > [5] Documentation in Provider's lifecycle docs
> > >
> >
> https://github.com/apache/airflow/pull/44686/files#diff-1ed6f13b432ea574158b9577dfb2a7ad8b88b0a09c45021e94b3c069b36a4408R436
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
> > For additional commands, e-mail: dev-h...@airflow.apache.org
> >
> >
>

Reply via email to