I agree with all the points. Yes, we definitely need a mechanism to
verify imports and identify the source of imports.
Validating these will help in the release process smoothly. Recently,
we faced an issue, as Jarek mentioned,
where Elad tried to release providers and suddenly there were
mismatched templates and caused CI failure,
it was one template file issue. if we have more issues raise part of
release then this will be difficult to fix and
making sure all checks work as part of release IMHO.

If Ash's suggested solution resolves the import references check that
we're trying to solve, then we are good.

Pavan .


On Fri, Dec 6, 2024 at 4:07 PM Pierre Jeambrun <pierrejb...@gmail.com> wrote:
>
> My two cents:
>
> Preventing cross imports and modifying generated files sounds necessary.
>
> On the other hand I agree that the CI is complex and having another step of
> generated code just for simple versions checks might be overkill.
>
> On Fri 6 Dec 2024 at 13:02, Jarek Potiuk <ja...@potiuk.com> wrote:
>
> > > If the goal is to prevent cross-imports that aren’t listed: let’s do
> > that! I made a quick POC here
> > https://github.com/apache/airflow/pull/44686#issuecomment-2522904089
> >
> > That's an interesting proposal. Yes the massive wall of text was apparently
> > needed in order to properly explain context and problems that I saw. I
> > believe this is the "Apache Way" we are all committed to follow here, that
> > I described at the last town hall: we describe an issue, make proposals,
> > reach for feadback and get others to come with ideas how to solve the
> > problems. And seems it worked -  thanks - that looks like constructive
> > proposal on how to solve the problems I outlined (and allows to propose a
> > solution that also solves consistency issue - so we likely will have a
> > proposal to address all of the issues I listed).
> >
> > I like that proposal (I will have to take a look and see if it works). It
> > does mean we will have even more pre-commits that are even more complex and
> > use "waning this will change any time" feature of ruff, but if others are
> > OK - I am good when we pair it with change that makes all usages consistent
> > and described (and one that we agree to here).  We can always fix it later
> > when ruff will change their "ruff analyze graph" feature output.
> >
> > I think in this case convention b)version_references.py in the providers
> > where we keep those constants (same as in all providers) seems like a good
> > idea - then the only thing someone will have to do when versioning provider
> > they will have to copy that module and import the constants from there.
> > Sounds like a good solution to the problems I described.
> >
> > WDYT Everyone?
> >
> > J.
> >
> > On Fri, Dec 6, 2024 at 12:19 PM Ash Berlin-Taylor <a...@apache.org> wrote:
> >
> > > To not bore you all with a massive wall of text: My tl;dr: sometimes
> > > repeating a small block of code is the best solution! Lets not to more
> > > magic, complexity, and auto-code generation than we absolutely have to.
> > >
> > > I don’t like the increasing complexity ratchet we seem to be on with
> > > pre-commit and generated code and more and more rules and
> > “hand-holding”.,
> > > each change is only fractionally more complex, but when you take it in
> > > aggregate it leads to the entire CI+publishing+docker
> > > build+release+test+commit processing being by a clear margin the most
> > > complex system I have ever seen. My comment was a reaction to this.
> > >
> > > The reason I objected so vehemently in this case: it feels to me like a
> > > solution looking for a problem. The linked issues were about “someone
> > made
> > > a change to an autogenerated file that wasn’t picked up”, and
> > > fixing/detecting that is fine.
> > >
> > > But somehow it morphed into “lets make a 600 line change, changing
> > imports
> > > and constant names to prevent against a problem that is very small and
> > easy
> > > to detect". In many ways this is like the old adage of “sometimes you
> > have
> > > a problem and reach for regex, and now you have two problems.” I view
> > this
> > > as needless complexity when we should be simplifying things! Being
> > explicit
> > > and having `AIRFLOW_VERSION = Version(airflow_version);
> > AIRFLOW_V_3_0_PLUS
> > > = Version(AIRFLOW_VERSION.base_version) >= Version("3.0.0”)` in a small
> > > handful of places is a much easier to understand and work on that a field
> > > in provider.yaml and then run pre-commit to generate some code.
> > >
> > > If the goal is to prevent cross-imports that aren’t listed: let’s do
> > that!
> > > I made a quick POC here
> > > https://github.com/apache/airflow/pull/44686#issuecomment-2522904089
> > >
> > > > On 6 Dec 2024, at 06:29, Jarek Potiuk <ja...@potiuk.com> wrote:
> > > >
> > > > Hello here,
> > > >
> > > > *TL;DR; Following recent discussions I am seeking an opinion and
> > > consensus
> > > > on the way we implement Airflow version checks in providers - ideally
> > in
> > > a
> > > > consistent way, because currently we have a very messy approach where 5
> > > > different ways are used in a way that makes it prone to errors.*
> > > >
> > > > Yesterday we had - I think a bit unnecessarily - heated discussion on a
> > > > change I proposed to apply consistency in the Airflow version
> > > compatibility
> > > > check in providers. I proposed the change after some concerns were
> > raised
> > > > about using "common.compat" for this by Daniel (rightfully so IMHO) -
> > > and I
> > > > proposed a solution that was supposed to address all problems around
> > the
> > > > version check I was aware about in the PR [1] (Generate constants with
> > > > airflow versions in providers).
> > > >
> > > > Somewhat surprisingly it raised quite strong objections (borderline
> > > > rejection) and somewhat heated discussions - from both Ash and Daniel -
> > > > while met somewhat enthusiastic approval from Kacper and Jens, and
> > since
> > > I
> > > > respect and value everyone's opinion, I decided (following Daniel's
> > > > suggestion) to separate the non-controversial part of my PR [2] (fixing
> > > > __init__.py override problem that already happened for the second time
> > > and
> > > > caused problems when Elad attempted to release providers).
> > > >
> > > > While I hope [2] will be quickly approved as non-controversial, I am
> > also
> > > > seeking community wisdom and decision (consensus hopefully, vote if
> > > needed)
> > > > how we approach version checks because we seem to have different
> > opinions
> > > > there.
> > > >
> > > > Here is a bit more context:
> > > >
> > > > *Context*:
> > > >
> > > > In our providers that currently support Airflow 2.8.0+, we often have
> > to
> > > > have conditional code to support all the (4 currently) supported
> > versions
> > > > of Airflow - 2.8, 2.9, 2.10, 3.0 (in the near future also 2.11). This
> > > > happens in main providers but also in the "test" code - because some of
> > > our
> > > > tests must also be compatible.
> > > >
> > > > Airflow version check for the test code is pretty consistent and well
> > > > documented (with examples) [3] and generally well followed "by example"
> > > and
> > > > "docs". We use AIRFLOW_V_2_10_PLUS from
> > "tests_common.test_utils.compat"
> > > > module.
> > > >
> > > > *Problem:*
> > > >
> > > > However the version ched is currently done pretty inconsistently in
> > > various
> > > > providers. Generally version check is done by this snippet:
> > > >
> > > > from airflow import __version__ as AIRFLOW_VERSION
> > > > AIRFLOW_V_2_10_PLUS = Version(Version(AIRFLOW_VERSION).base_version) >=
> > > > Version("2.10.0")
> > > >
> > > > But it is used pretty inconsistently and with side-effects
> > > >
> > > > a) In some cases the code is imported from `common.compat` provider -
> > > which
> > > > means that there should be a dependency on common.compat from the
> > > provider
> > > > and in the future when we add new versions (2.11.0) we have to bump
> > > version
> > > > of the compat.provider as dependency.
> > > >
> > > > b) in other cases the above snippet is stored in some module (for
> > example
> > > > version_references.py in the providers and used from there in all
> > > provider
> > > > modules as
> > > >
> > > > c) in other cases the snippet is inlined several times in different
> > > modules
> > > > of the providers - usually at top level of the module
> > > >
> > > > d) in other cases the snipped produces differently named constant - for
> > > > example _IS_AIRFLOW_2_10_OR_HIGHER
> > > >
> > > > e) in other cases the check is directly inlined:  if
> > > > Version(Version(version).base_version) >= Version("3.0.0"):
> > > >
> > > > f) also in some cases the imports are done from wrong places. Many of
> > > those
> > > > top-level constants are named the same AIRFLOW_V_2_10_PLUS. Sometimes
> > we
> > > > have the same constant defined multiple times even in the same provider
> > > > (and same name as the tests_compat one).
> > > >
> > > > g) in some cases (and that was the cause for release manager issues we
> > > had
> > > > and is going to be solved by [1]) some contributors did not realise
> > > > __init__.py is generated (despite comment in the file) - and added the
> > > > constants to the __init__.py file (this is very natural place for those
> > > > constants, so I am not surprised)
> > > >
> > > > Even currently (but also in the past) it happened that the variables
> > were
> > > > imported from a wrong package - likely - I guess - because when you
> > press
> > > > "Alt-Enter" in IDE, it will import it from *somewhere*  by default -
> > and
> > > > since we have about 12 places where the constant is defined, it might
> > > > accidentally introduce a dependency of providers on test code or
> > another
> > > > providers
> > > >
> > > > *Proposed solution*
> > > >
> > > > So - we have quite a bit of a mess here, and my PR [1] aimed to solve
> > it
> > > > and introduce consistency. There is no DRY (but IMHO that's ok - those
> > > > snippets are small-enough to not have to be imported from common
> > place) -
> > > > but the consistency and wrong imports are a problem even now in the
> > main
> > > > branch and it might get worse.
> > > >
> > > > I believe that we all agree that consistency is generally good. I think
> > > > (but I am interpreting the complaints of Ash and Daniel) that
> > "enforcing"
> > > > the consistency with pre-commit, is wrong and that it is enough to rely
> > > on
> > > > reviews (which already failed us as you can see).
> > > >
> > > > My idea with [2] (following our usual "automate consistency when you
> > see
> > > a
> > > > problem) was that we could use what we already - have - generation of
> > the
> > > > __init__.py (which we used during release process so far and [2] adds
> > it
> > > > also to pre-commit) - and generate the constants in the way that will
> > > make
> > > > it difficult to import the wrong now. I proposed that for providers
> > that
> > > > need version variants we could generate those constants in provider's
> > > > `__init__.py` :
> > > >
> > > > PROVIDERS_AMAZON_AIRFLOW_IS_2_10_PLUS = Version("2.10.0") <=
> > > > _AIRFLOW_VERSION
> > > >
> > > > There are few more things generated - see [4] for example in my PR
> > > > including explanation how it works in the generated code and [5]
> > > > instruction on provider's lifecycle documented why and how we are doing
> > > it
> > > > [5] so that the contributors can understand and learn how to do it.
> > > >
> > > > This way it will be difficult to make the cross-import from different
> > > > providers or test code, we can have a consistent approach, and it is
> > easy
> > > > to review and see mistakes.
> > > >
> > > > I am totally open to different naming conventions, and  even
> > > unconditional
> > > > generation of the version constants - currently it is controlled by a
> > > > "need_version_variants" flag in provider.yaml - but we could also
> > > generate
> > > > it for all providers - always.
> > > >
> > > > I am looking for feedback on it - I hoped it could be done - as usual -
> > > in
> > > > the PR, but apparently Ash and Daniel feel very strongly about it and
> > it
> > > > seems that we need to agree in the devlist.
> > > >
> > > > *What do you think? Few questions:*
> > > >
> > > > 1) Is this a good idea to not only propose and describe a
> > > > consistent approach but also "enforce" consistency using our existing
> > > > pre-commit framework ?
> > > >
> > > > 2) Is generation in the __init__.py a good idea - if not what
> > alternative
> > > > people would see?
> > > >
> > > > 3) Do others share my concern as well that imports of constants named
> > the
> > > > same way (which we already see slipped through reviews a few times) is
> > a
> > > > problem worth addressing by having differently named constants?
> > > >
> > > > Looking forward to feedback (and sorry for the long mail, but I tried
> > to
> > > > add TL;DR and structure it in a way that is easy to digest and
> > understand
> > > > problems and solutions proposed.
> > > >
> > > > J.
> > > >
> > > > [1] Generate constants with airflow versions in providers
> > > > https://github.com/apache/airflow/pull/44686
> > > > [2] Prevent __init__.py in providers from being modified
> > > > https://github.com/apache/airflow/pull/44713
> > > > [3]
> > > >
> > >
> > https://github.com/apache/airflow/blob/main/contributing-docs/testing/unit_tests.rst#implementing-compatibility-for-provider-tests-for-older-airflow-versions
> > > >
> > > > [4] Example generated proposal:
> > > >
> > >
> > https://github.com/apache/airflow/pull/44686/files#diff-bb74b7a9bbd568ea5d575410753d4e814cbaece5f42d84d96e8f7991a6c4ab95R41
> > > > [5] Documentation in Provider's lifecycle docs
> > > >
> > >
> > https://github.com/apache/airflow/pull/44686/files#diff-1ed6f13b432ea574158b9577dfb2a7ad8b88b0a09c45021e94b3c069b36a4408R436
> > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
> > > For additional commands, e-mail: dev-h...@airflow.apache.org
> > >
> > >
> >

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
For additional commands, e-mail: dev-h...@airflow.apache.org

Reply via email to