I have an idea about that one, and probably that one will fulfill the
"polyfill" approach discussed earlier.

I think we should not name the provider "common.util" but "common.compat" -
because all the code that we need to put there is really about keeping
compatibility.

For example look here https://github.com/apache/airflow/pull/39530

We have a need to have a "compatibility" code somewhere that a number of
providers could use in case we want to keep some backwards compatibility.

So having a "common.compat" provider would likely nicely full-fill the
polyfill approach - It should only contain the code that we aim to keep
backwards compatibility

Example for https://github.com/apache/airflow/pull/39530

* we add the complex compatibility code (see
https://github.com/apache/airflow/pull/39530#issuecomment-2145670785) in
the "common.compat" provider - and to airflow.openlineage in this case
* we import it from there in all providers that need it (this will
automatically add dependency)
* when providers get >= airflow 2.10 - we change them to import from
`airflow.openlineage` rather than from "airflow.providers.common.compat".

We could apply similar approach for other "compatibility" code

J.




On Thu, Apr 11, 2024 at 10:22 AM Jarek Potiuk <ja...@potiuk.com> wrote:

> Any other  ideas or suggestions here? Can someone explain how the
> "polypill" approach would look like, maybe? How do we imagine this working?
>
> Just to continue this discussion - another example.
>
> Small thing that David wanted to add for changes in some sql providers:
>
> @contextmanager
> def suppress_and_warn(*exceptions: type[BaseException]):
>     """Context manager that suppresses the given exceptions and logs a
> warning message."""
>     try:
>         yield
>     except exceptions as e:
>         warnings.warn(f"Exception suppressed:
> {e}\n{traceback.format_exc()}", category=UserWarning)
>
>
>
> https://github.com/apache/airflow/pull/38707/files#diff-6e1b2f961cb951d05d66d2d814ef5f6d8f8bf8f43c40fb5d40e27a031fed8dd7R115
>
> This is a small thing - but adding it in `airflow` is problematic -
> because it will only be released in 1.10, so we cannot use it in providers
> if we do.
> Currently - since it is used in sql providers, I suggested using
> `common.sql` for that code (and add >= 1.12 for common-sql-providers for
> those providers that use it). And I will write a separate email about a
> proposed versioning approach there.
>
> Do we have a good proposal on how we can solve similar things in the
> future?
> Do we want it at all? It has some challenges - yes it DRY's the code but
> it also introduces coupling.
>
> J.
>
>
> On Sun, Mar 10, 2024 at 6:21 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>
>> Coming back to it - what about the "polypill" :)? What's different vs the
>> "common.sql" way of doing it ? How do we think it can work ?
>>
>> On Thu, Feb 22, 2024 at 1:58 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>>
>>> > The symbolic link approach seems to disregard all the external
>>> providers, unless I misunderstand it.
>>>
>>> Not really. It just does not make it easy for the external providers to
>>> use it "fast".  They can still - if they want to just manually copy those
>>> utils from the latest version of Airflow if they want to use it. Almost by
>>> definition, those will be small, independent modules that can be simply
>>> copied as needed by whoever releases external providers - and they are also
>>> free to copy any older version if they want. That is a nice feature that
>>> makes them fully decoupled from the version of Airflow they are installed
>>> in (same as community providers). Or - if they want they can just import
>>> them from "airflow.provider_utils" - but then they have to add >= Airflow
>>> 2.9 if that util appeared in Airflow 2.9 (which is the main reason we want
>>> to use symbolic links - because due to our policies and promises, we do not
>>> want community providers to depend on latest version of Airflow in vast
>>> majority of cases.
>>>
>>> So this approach is also fully usable by external providers, but it
>>> requires some manual effort to copy the modules to their providers.
>>>
>>> > I like the polypill idea. A backport provider that brings new
>>> interfaces to providers without the actual functionalities.
>>>
>>> I would love to hear more about this, I think the "common.util" was
>>> exactly the kind of polyfill approach (with its own versioning
>>> complexities) but maybe I do not understand how such a polyfill provider
>>> would work. Say we want to add a new "urlparse" method usable for all
>>> providers. Could you explain how it would work - say:
>>>
>>> * we add "urlparse" in Airflow 2.9
>>> * some provider wants to use it in Airflow 2.7
>>>
>>> What providers, with what code/interfaces we would have to release in
>>> this case and what dependencies such providers that want to use it (both
>>> community and Airflow should have)? I **think** that would mean exactly the
>>> "common.<something>" approach we already have with "io" and "sql", but
>>> maybe I do not understand it :)
>>>
>>> On Thu, Feb 22, 2024 at 1:45 PM Tzu-ping Chung <t...@astronomer.io.invalid>
>>> wrote:
>>>
>>>> I like the polypill idea. A backport provider that brings new
>>>> interfaces to providers without the actual functionalities.
>>>>
>>>>
>>>> > On 22 Feb 2024, at 20:41, Maciej Obuchowski <mobuchow...@apache.org>
>>>> wrote:
>>>> >
>>>> >> That's why I generally do
>>>> > not like the "util" approach because common packaging introduces
>>>> > unnecessary coupling (you have to upgrade independent utils together).
>>>> >
>>>> > From my experience with releasing OpenLineage where we do things
>>>> similarly:
>>>> > I think that's the advantage of it, but only _if_ you can release
>>>> those
>>>> > together.
>>>> > With "build-in" providers it makes sense, but could be burdensome if
>>>> > "external"
>>>> > ones would depend on that functionality.
>>>> >
>>>> >> I know it's not been the original idea behind providers, but - after
>>>> > testing common.sql and now also having common.io, seems like more
>>>> and more
>>>> > we would like to extract some common code that we would like
>>>> providers to
>>>> > use, but we refrain from it, because it will only be actually usable 6
>>>> > months after we introduce some common code.
>>>> >
>>>> > So, maybe better approach would be to introduce the functionality into
>>>> > core,
>>>> > and use common.X provider as "polyfill" (to borrow JS nomenclature)
>>>> > to make sure providers could use that functionality now, where
>>>> external
>>>> > ones could depend on the Airflow ones?
>>>> >
>>>> > The symbolic link approach seems to disregard all the external
>>>> providers,
>>>> > unless
>>>> > I misunderstand it.
>>>> >
>>>> > czw., 22 lut 2024 o 13:28 Jarek Potiuk <ja...@potiuk.com> napisał(a):
>>>> >
>>>> >>> Ideally utilities for each purpose (parsing URI, reading Object
>>>> Storage,
>>>> >> reading SQL, etc.) should each have its own utility package, so they
>>>> can be
>>>> >> released independently without dependency problems popping up if we
>>>> need to
>>>> >> break compatibility to one purpose. But more providers are
>>>> exponentially
>>>> >> more difficult to maintain, so I’d settle for one utility provider
>>>> for now
>>>> >> and split further if needed in the future.
>>>> >>
>>>> >> Very much agree with this general statement. That's why I generally
>>>> do
>>>> >> not like the "util" approach because common packaging introduces
>>>> >> unnecessary coupling (you have to upgrade independent utils
>>>> together). And
>>>> >> when we have a common set of things that seem to make sense to be
>>>> released
>>>> >> together when upgraded we should package them together in
>>>> >> "common.<something concrete" (like we have with common.io and
>>>> common.sql).
>>>> >>
>>>> >> However - in this case, I think what Jens proposed (and I am happy
>>>> to try
>>>> >> as well) is to attempt to use symbolic links - i.e. add the code in
>>>> >> `airflow.util` but then create a symbolic link in the provider.  I
>>>> tested
>>>> >> it yesterday and it works as expected - i.e. such symbolic link is
>>>> >> dereferenced and the provider package contains the python file, not
>>>> >> symbolic link. That seems like a much more lightweight approach that
>>>> will
>>>> >> serve the purpose of "common.util" much better. The only thing we
>>>> will have
>>>> >> to take care of (and we can add it once the POC is successful) is to
>>>> add
>>>> >> some pre-commit protection that those kind of symbolically linked
>>>> util
>>>> >> modules are imported in providers, from inside of those provider,
>>>> not from
>>>> >> airlfow, and make sure they are "standalone" (i.e. - as you
>>>> mentioned - not
>>>> >> depend on anything in airflow code). We could create a new package
>>>> for that
>>>> >> in airlfow
>>>> >> "airlfow.provider_utils" for example - and make sure (as next step)
>>>> that
>>>> >> anything from that package is never directly imported by any
>>>> provider, and
>>>> >> whenever provider uses it, it should be symbolic link inside that
>>>> provider.
>>>> >> That's all automatable and we can prevent mistakes via pre-commit.
>>>> >>
>>>> >> I think that might lead to a very lightweight approach where we
>>>> introduce
>>>> >> new common functionality which is immediately reusable in providers
>>>> without
>>>> >> the hassle of taking care about backwards compatibility, and
>>>> managing the
>>>> >> "common.util" provider. At the expense of a bit complex pre-commit
>>>> that
>>>> >> will guard the usage of it.
>>>> >>
>>>> >> Seems that it might be the "Eat cake and have it too" way that we've
>>>> been
>>>> >> looking for.
>>>> >>
>>>> >> J.
>>>> >>
>>>> >> On Thu, Feb 22, 2024 at 6:14 AM Tzu-ping Chung
>>>> <t...@astronomer.io.invalid>
>>>> >> wrote:
>>>> >>
>>>> >>> It would help in the sense mentioned in previous posts, yes. But one
>>>> >> thing
>>>> >>> I want to point out is, for the provider to actually be helpful, it
>>>> must
>>>> >> be
>>>> >>> treated a bit differently from normal providers, but more like a
>>>> separate
>>>> >>> third-party dependency. Specifically, the provider should not have a
>>>> >>> dependency to Core Airflow, so it can be released and depended on
>>>> more
>>>> >>> flexibly.
>>>> >>>
>>>> >>> Ideally utilities for each purpose (parsing URI, reading Object
>>>> Storage,
>>>> >>> reading SQL, etc.) should each have its own utility package, so
>>>> they can
>>>> >> be
>>>> >>> released independently without dependency problems popping up if we
>>>> need
>>>> >> to
>>>> >>> break compatibility to one purpose. But more providers are
>>>> exponentially
>>>> >>> more difficult to maintain, so I’d settle for one utility provider
>>>> for
>>>> >> now
>>>> >>> and split further if needed in the future.
>>>> >>>
>>>> >>> TP
>>>> >>>
>>>> >>>
>>>> >>>> On 22 Feb 2024, at 10:10, Scheffler Jens (XC-AS/EAE-ADA-T) <
>>>> >>> jens.scheff...@de.bosch.com.INVALID> wrote:
>>>> >>>>
>>>> >>>> @Uranusjr would this help as a pilot in your AIP-60 code to parse
>>>> and
>>>> >>> validate URIs for datasets?
>>>> >>>>
>>>> >>>> Mit freundlichen Grüßen / Best regards
>>>> >>>>
>>>> >>>> Jens Scheffler
>>>> >>>>
>>>> >>>> Alliance: Enabler - Tech Lead (XC-AS/EAE-ADA-T)
>>>> >>>> Robert Bosch GmbH | Hessbruehlstraße 21 | 70565
>>>> Stuttgart-Vaihingen |
>>>> >>> GERMANY | www.bosch.com
>>>> >>>> Tel. +49 711 811-91508 | Mobil +49 160 90417410 |
>>>> >>> jens.scheff...@de.bosch.com
>>>> >>>>
>>>> >>>> Sitz: Stuttgart, Registergericht: Amtsgericht Stuttgart, HRB 14000;
>>>> >>>> Aufsichtsratsvorsitzender: Prof. Dr. Stefan Asenkerschbaumer;
>>>> >>>> Geschäftsführung: Dr. Stefan Hartung, Dr. Christian Fischer, Dr.
>>>> Markus
>>>> >>> Forschner,
>>>> >>>> Stefan Grosch, Dr. Markus Heyn, Dr. Frank Meyer, Dr. Tanja Rückert
>>>> >>>>
>>>> >>>> -----Original Message-----
>>>> >>>> From: Jarek Potiuk <ja...@potiuk.com>
>>>> >>>> Sent: Donnerstag, 22. Februar 2024 00:53
>>>> >>>> To: dev@airflow.apache.org
>>>> >>>> Subject: Re: [DISCUSS] Common.util provider?
>>>> >>>>
>>>> >>>> Yep. It could work with symbolic links. Tested it and with flit -
>>>> both
>>>> >>> wheel and sdist packaged code such symbolically linked file is
>>>> >> dereferenced
>>>> >>> and copy of the file is added there. It could be a nice way of
>>>> doing it.
>>>> >>>>
>>>> >>>> Maybe then worth trying next time if someone has a need?
>>>> >>>>
>>>> >>>> J
>>>> >>>>
>>>> >>>> On Thu, Feb 22, 2024 at 12:39 AM Scheffler Jens (XC-AS/EAE-ADA-T) <
>>>> >>> jens.scheff...@de.bosch.com.invalid> wrote:
>>>> >>>>
>>>> >>>>>>>> As of additional dependency complexity between providers
>>>> actually
>>>> >>>>>>>> the
>>>> >>>>> additional dependency I think creates more problems than the
>>>> benefit…
>>>> >>>>> would be cool if there would be an option to „inline“ common code
>>>> from
>>>> >>>>> a single place but keep individual providers fully independent…
>>>> >>>>>
>>>> >>>>>> Well, we already  do a lot of inlining, so if we think we should
>>>> do
>>>> >>>>>> more,
>>>> >>>>> we have mechanisms for that. We have  pre-commits and release
>>>> commands
>>>> >>>>> that do a lot of that. Pre commits are inlining scripts in
>>>> >>>>> Dockerfiles, shortening PyPI readme . The providers __init__.py
>>>> files
>>>> >>>>> and changelogs and index documentation .rst (partially) are
>>>> generated
>>>> >>>>> at release documentation preparation time, pyproject.toml for
>>>> >>>>> providers are generated from common templates at package building
>>>> time
>>>> >>>>> and so on and so on :). So we can do more of that and generate
>>>> common
>>>> >>>>> code, it's just a matter of adding pre-commits or breeze scripts.
>>>> But
>>>> >>>>> again "can't have and eat cake" - this has the drawback that
>>>> there are
>>>> >>>>> extra steps involved and even if it's automated it does add
>>>> friction
>>>> >>>>> when you have to regenerate the code every time you change it and
>>>> when
>>>> >>>>> you change it in another place than where you use it.
>>>> >>>>>
>>>> >>>>> Yes, also thought a moment about pre-commit. I#d be okay if we
>>>> really
>>>> >>>>> in-line and have a pre-commit aligning the redundancy of python in
>>>> >>> folders.
>>>> >>>>> Might need to be an opt-in if only 10 of 85 providers are using
>>>> common
>>>> >>>>> stuff and if we change a common line we probably do not need to
>>>> affect
>>>> >>>>> all providers.
>>>> >>>>>
>>>> >>>>> As long as no Windows users trying to code for airflow (do we
>>>> need to
>>>> >>>>> consider?) would it also work to use symlinks? Git can cope with
>>>> this,
>>>> >>>>> I don't know if the python toolchain can de-reference a copy and
>>>> are
>>>> >>>>> not packaging a symlink? Would be worth a test... would save the
>>>> >>>>> pre-commit and we even could selectively include common bla into
>>>> >>>>> providers as needed :-D
>>>> >>>>>
>>>> >>>>> Mit freundlichen Grüßen / Best regards
>>>> >>>>>
>>>> >>>>> Jens Scheffler
>>>> >>>>>
>>>> >>>>> Alliance: Enabler - Tech Lead (XC-AS/EAE-ADA-T) Robert Bosch GmbH
>>>> |
>>>> >>>>> Hessbruehlstraße 21 | 70565 Stuttgart-Vaihingen | GERMANY |
>>>> >>>>> www.bosch.com Tel. +49 711 811-91508 | Mobil +49 160 90417410 |
>>>> >>>>> jens.scheff...@de.bosch.com
>>>> >>>>>
>>>> >>>>> Sitz: Stuttgart, Registergericht: Amtsgericht Stuttgart, HRB
>>>> 14000;
>>>> >>>>> Aufsichtsratsvorsitzender: Prof. Dr. Stefan Asenkerschbaumer;
>>>> >>>>> Geschäftsführung: Dr. Stefan Hartung, Dr. Christian Fischer, Dr.
>>>> >>>>> Markus Forschner, Stefan Grosch, Dr. Markus Heyn, Dr. Frank
>>>> Meyer, Dr.
>>>> >>>>> Tanja Rückert
>>>> >>>>>
>>>> >>>>> -----Original Message-----
>>>> >>>>> From: Jarek Potiuk <ja...@potiuk.com>
>>>> >>>>> Sent: Mittwoch, 21. Februar 2024 21:18
>>>> >>>>> To: dev@airflow.apache.org
>>>> >>>>> Subject: Re: [DISCUSS] Common.util provider?
>>>> >>>>>
>>>> >>>>>> if we have a common piece then we are locking all depending
>>>> >>>>>> providers
>>>> >>>>> (potentially) together if common code changes
>>>> >>>>>
>>>> >>>>> Yes, coupling in this case is the drawback of this solution. You
>>>> can't
>>>> >>>>> have cake and eat it too and in this case you trade DRY with
>>>> coupling.
>>>> >>>>>
>>>> >>>>>> As of additional dependency complexity between providers actually
>>>> >>>>>> the
>>>> >>>>> additional dependency I think creates more problems than the
>>>> benefit…
>>>> >>>>> would be cool if there would be an option to „inline“ common code
>>>> from
>>>> >>>>> a single place but keep individual providers fully independent…
>>>> >>>>>
>>>> >>>>> Well, we already  do a lot of inlining, so if we think we should
>>>> do
>>>> >>>>> more, we have mechanisms for that. We have  pre-commits and
>>>> release
>>>> >>>>> commands that do a lot of that. Pre commits are inlining scripts
>>>> in
>>>> >>>>> Dockerfiles, shortening PyPI readme . The providers __init__.py
>>>> files
>>>> >>>>> and changelogs and index documentation .rst (partially) are
>>>> generated
>>>> >>>>> at release documentation preparation time, pyproject.toml for
>>>> >>>>> providers are generated from common templates at package building
>>>> time
>>>> >>>>> and so on and so on :). So we can do more of that and generate
>>>> common
>>>> >>>>> code, it's just a matter of adding pre-commits or breeze scripts.
>>>> But
>>>> >>>>> again "can't have and eat cake" - this has the drawback that
>>>> there are
>>>> >>>>> extra steps involved and even if it's automated it does add
>>>> friction
>>>> >>>>> when you have to regenerate the code every time you change it and
>>>> when
>>>> >>>>> you change it in another place than where you use it.
>>>> >>>>>
>>>> >>>>> J.
>>>> >>>>>
>>>> >>>>> On Wed, Feb 21, 2024 at 9:02 PM Scheffler Jens (XC-AS/EAE-ADA-T) <
>>>> >>>>> jens.scheff...@de.bosch.com.invalid> wrote:
>>>> >>>>>
>>>> >>>>>> Hi Jarek,
>>>> >>>>>>
>>>> >>>>>> At reviewing the PR from uranusjr for AIP-60 I also had the
>>>> feeling
>>>> >>>>>> that a lot of very similar code is repeated in all the providers.
>>>> >>>>>> But during review yesterday I dropped the ides because if we
>>>> have a
>>>> >>>>>> common piece then we are locking all depending providers
>>>> >>>>>> (potentially) together if common code changes.
>>>> >>>>>> As of additional dependency complexity between providers actually
>>>> >>>>>> the additional dependency I think creates more prblems than the
>>>> >>>>>> benefit… would be cool if tehere would be an option to „inline“
>>>> >>>>>> common code from a single place but keep individual providers
>>>> fully
>>>> >>>>>> independent…
>>>> >>>>>>
>>>> >>>>>> Jens
>>>> >>>>>>
>>>> >>>>>> Sent from Outlook for
>>>> >>>>>> iOS<
>>>> https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%
>>>> >>>>>> 2F
>>>> >>>>>> aka.ms%2Fo0ukef&data=05%7C02%7CJens.Scheffler%40de.bosch.com
>>>> %7C98c88
>>>> >>>>>> 97
>>>> >>>>>>
>>>> 195d944d483ab08dc331a49bb%7C0ae51e1907c84e4bbb6d648ee58410f4%7C0%7C0
>>>> >>>>>> %7
>>>> >>>>>>
>>>> C638441435197193656%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQ
>>>> >>>>>> Ij
>>>> >>>>>>
>>>> oiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%7C%7C%7C&sdata=n6gk9fNn
>>>> >>>>>> WB SJOPYEgJ9WbriZ3H4id3RhLr16SguOuFA%3D&reserved=0>
>>>> >>>>>> ________________________________
>>>> >>>>>> From: Jarek Potiuk <ja...@potiuk.com>
>>>> >>>>>> Sent: Wednesday, February 21, 2024 5:42:20 PM
>>>> >>>>>> To: dev@airflow.apache.org <dev@airflow.apache.org>
>>>> >>>>>> Subject: [DISCUSS] Common.util provider?
>>>> >>>>>>
>>>> >>>>>> Hello everyone,
>>>> >>>>>>
>>>> >>>>>> How do we feel about introducing a common.util provider?
>>>> >>>>>>
>>>> >>>>>> I know it's not been the original idea behind providers, but -
>>>> after
>>>> >>>>>> testing common.sql and now also having common.io, seems like
>>>> more
>>>> >>>>>> and more we would like to extract some common code that we would
>>>> >>>>>> like providers to use, but we refrain from it, because it will
>>>> only
>>>> >>>>>> be actually usable 6 months after we introduce some common code.
>>>> >>>>>>
>>>> >>>>>> However, if we introduce common.util, this problem is generally
>>>> gone
>>>> >>>>>> - at the expense of more complex maintenance and cross-provider
>>>> >>>>> dependencies.
>>>> >>>>>> We should be able to add a common util method and use it in a
>>>> >>>>>> provider at the same time.
>>>> >>>>>>
>>>> >>>>>> Think Amazon provider using a new feature released in common.util
>>>> >>>>>>> =1.2.0 and google provider >= 1.1.0. All manageable and we do it
>>>> >>>>>> already for common.sql. We know how to do it, we know what to
>>>> avoid,
>>>> >>>>>> we know we cannot introduce backwards-incompatible changes, so we
>>>> >>>>>> have to be very clear what is and what is not a public API
>>>> there, We
>>>> >>>>>> could rather easily add tests to prevent such
>>>> backwards-incompatible
>>>> >>>>>> changes. We even have a solution for chicken-egg providers where
>>>> we
>>>> >>>>>> need to release two providers at the same time if they depend on
>>>> >>>>>> each other. Generally speaking it's quite workable but adds a
>>>> bit of
>>>> >>> overhead.
>>>> >>>>>>
>>>> >>>>>> Examples that we could implement as "common.util":
>>>> >>>>>>
>>>> >>>>>> - common versioning check with cache - where multiple providers
>>>> >>>>>> could reuse "do we have pendulum 2"
>>>> >>>>>> - more complex - some date management features (we have a few
>>>> like
>>>> >>>>>> date_ranges/round_time). But there are many more.
>>>> >>>>>>
>>>> >>>>>> I generally do not love the common "util" approach. It has a
>>>> >>>>>> tendency to become a bag of everything over time. but if we
>>>> limit it
>>>> >>>>>> to a set of small, fully decoupled modules where each module is
>>>> >>>>>> independent - it's OK. And we already have it in "airflow.util"
>>>> and
>>>> >>>>>> we seem to be
>>>> >>>>> doing well.
>>>> >>>>>>
>>>> >>>>>> WDYT? Is it worth it ?
>>>> >>>>>>
>>>> >>>>>> J.
>>>> >>>>>>
>>>> >>>>>
>>>> >>>>
>>>> >>>>
>>>> ---------------------------------------------------------------------
>>>> >>>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
>>>> >>>> For additional commands, e-mail: dev-h...@airflow.apache.org
>>>> >>>
>>>> >>>
>>>> >>
>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
>>>> For additional commands, e-mail: dev-h...@airflow.apache.org
>>>>
>>>>

Reply via email to