I have an idea about that one, and probably that one will fulfill the "polyfill" approach discussed earlier.
I think we should not name the provider "common.util" but "common.compat" - because all the code that we need to put there is really about keeping compatibility. For example look here https://github.com/apache/airflow/pull/39530 We have a need to have a "compatibility" code somewhere that a number of providers could use in case we want to keep some backwards compatibility. So having a "common.compat" provider would likely nicely full-fill the polyfill approach - It should only contain the code that we aim to keep backwards compatibility Example for https://github.com/apache/airflow/pull/39530 * we add the complex compatibility code (see https://github.com/apache/airflow/pull/39530#issuecomment-2145670785) in the "common.compat" provider - and to airflow.openlineage in this case * we import it from there in all providers that need it (this will automatically add dependency) * when providers get >= airflow 2.10 - we change them to import from `airflow.openlineage` rather than from "airflow.providers.common.compat". We could apply similar approach for other "compatibility" code J. On Thu, Apr 11, 2024 at 10:22 AM Jarek Potiuk <ja...@potiuk.com> wrote: > Any other ideas or suggestions here? Can someone explain how the > "polypill" approach would look like, maybe? How do we imagine this working? > > Just to continue this discussion - another example. > > Small thing that David wanted to add for changes in some sql providers: > > @contextmanager > def suppress_and_warn(*exceptions: type[BaseException]): > """Context manager that suppresses the given exceptions and logs a > warning message.""" > try: > yield > except exceptions as e: > warnings.warn(f"Exception suppressed: > {e}\n{traceback.format_exc()}", category=UserWarning) > > > > https://github.com/apache/airflow/pull/38707/files#diff-6e1b2f961cb951d05d66d2d814ef5f6d8f8bf8f43c40fb5d40e27a031fed8dd7R115 > > This is a small thing - but adding it in `airflow` is problematic - > because it will only be released in 1.10, so we cannot use it in providers > if we do. > Currently - since it is used in sql providers, I suggested using > `common.sql` for that code (and add >= 1.12 for common-sql-providers for > those providers that use it). And I will write a separate email about a > proposed versioning approach there. > > Do we have a good proposal on how we can solve similar things in the > future? > Do we want it at all? It has some challenges - yes it DRY's the code but > it also introduces coupling. > > J. > > > On Sun, Mar 10, 2024 at 6:21 PM Jarek Potiuk <ja...@potiuk.com> wrote: > >> Coming back to it - what about the "polypill" :)? What's different vs the >> "common.sql" way of doing it ? How do we think it can work ? >> >> On Thu, Feb 22, 2024 at 1:58 PM Jarek Potiuk <ja...@potiuk.com> wrote: >> >>> > The symbolic link approach seems to disregard all the external >>> providers, unless I misunderstand it. >>> >>> Not really. It just does not make it easy for the external providers to >>> use it "fast". They can still - if they want to just manually copy those >>> utils from the latest version of Airflow if they want to use it. Almost by >>> definition, those will be small, independent modules that can be simply >>> copied as needed by whoever releases external providers - and they are also >>> free to copy any older version if they want. That is a nice feature that >>> makes them fully decoupled from the version of Airflow they are installed >>> in (same as community providers). Or - if they want they can just import >>> them from "airflow.provider_utils" - but then they have to add >= Airflow >>> 2.9 if that util appeared in Airflow 2.9 (which is the main reason we want >>> to use symbolic links - because due to our policies and promises, we do not >>> want community providers to depend on latest version of Airflow in vast >>> majority of cases. >>> >>> So this approach is also fully usable by external providers, but it >>> requires some manual effort to copy the modules to their providers. >>> >>> > I like the polypill idea. A backport provider that brings new >>> interfaces to providers without the actual functionalities. >>> >>> I would love to hear more about this, I think the "common.util" was >>> exactly the kind of polyfill approach (with its own versioning >>> complexities) but maybe I do not understand how such a polyfill provider >>> would work. Say we want to add a new "urlparse" method usable for all >>> providers. Could you explain how it would work - say: >>> >>> * we add "urlparse" in Airflow 2.9 >>> * some provider wants to use it in Airflow 2.7 >>> >>> What providers, with what code/interfaces we would have to release in >>> this case and what dependencies such providers that want to use it (both >>> community and Airflow should have)? I **think** that would mean exactly the >>> "common.<something>" approach we already have with "io" and "sql", but >>> maybe I do not understand it :) >>> >>> On Thu, Feb 22, 2024 at 1:45 PM Tzu-ping Chung <t...@astronomer.io.invalid> >>> wrote: >>> >>>> I like the polypill idea. A backport provider that brings new >>>> interfaces to providers without the actual functionalities. >>>> >>>> >>>> > On 22 Feb 2024, at 20:41, Maciej Obuchowski <mobuchow...@apache.org> >>>> wrote: >>>> > >>>> >> That's why I generally do >>>> > not like the "util" approach because common packaging introduces >>>> > unnecessary coupling (you have to upgrade independent utils together). >>>> > >>>> > From my experience with releasing OpenLineage where we do things >>>> similarly: >>>> > I think that's the advantage of it, but only _if_ you can release >>>> those >>>> > together. >>>> > With "build-in" providers it makes sense, but could be burdensome if >>>> > "external" >>>> > ones would depend on that functionality. >>>> > >>>> >> I know it's not been the original idea behind providers, but - after >>>> > testing common.sql and now also having common.io, seems like more >>>> and more >>>> > we would like to extract some common code that we would like >>>> providers to >>>> > use, but we refrain from it, because it will only be actually usable 6 >>>> > months after we introduce some common code. >>>> > >>>> > So, maybe better approach would be to introduce the functionality into >>>> > core, >>>> > and use common.X provider as "polyfill" (to borrow JS nomenclature) >>>> > to make sure providers could use that functionality now, where >>>> external >>>> > ones could depend on the Airflow ones? >>>> > >>>> > The symbolic link approach seems to disregard all the external >>>> providers, >>>> > unless >>>> > I misunderstand it. >>>> > >>>> > czw., 22 lut 2024 o 13:28 Jarek Potiuk <ja...@potiuk.com> napisał(a): >>>> > >>>> >>> Ideally utilities for each purpose (parsing URI, reading Object >>>> Storage, >>>> >> reading SQL, etc.) should each have its own utility package, so they >>>> can be >>>> >> released independently without dependency problems popping up if we >>>> need to >>>> >> break compatibility to one purpose. But more providers are >>>> exponentially >>>> >> more difficult to maintain, so I’d settle for one utility provider >>>> for now >>>> >> and split further if needed in the future. >>>> >> >>>> >> Very much agree with this general statement. That's why I generally >>>> do >>>> >> not like the "util" approach because common packaging introduces >>>> >> unnecessary coupling (you have to upgrade independent utils >>>> together). And >>>> >> when we have a common set of things that seem to make sense to be >>>> released >>>> >> together when upgraded we should package them together in >>>> >> "common.<something concrete" (like we have with common.io and >>>> common.sql). >>>> >> >>>> >> However - in this case, I think what Jens proposed (and I am happy >>>> to try >>>> >> as well) is to attempt to use symbolic links - i.e. add the code in >>>> >> `airflow.util` but then create a symbolic link in the provider. I >>>> tested >>>> >> it yesterday and it works as expected - i.e. such symbolic link is >>>> >> dereferenced and the provider package contains the python file, not >>>> >> symbolic link. That seems like a much more lightweight approach that >>>> will >>>> >> serve the purpose of "common.util" much better. The only thing we >>>> will have >>>> >> to take care of (and we can add it once the POC is successful) is to >>>> add >>>> >> some pre-commit protection that those kind of symbolically linked >>>> util >>>> >> modules are imported in providers, from inside of those provider, >>>> not from >>>> >> airlfow, and make sure they are "standalone" (i.e. - as you >>>> mentioned - not >>>> >> depend on anything in airflow code). We could create a new package >>>> for that >>>> >> in airlfow >>>> >> "airlfow.provider_utils" for example - and make sure (as next step) >>>> that >>>> >> anything from that package is never directly imported by any >>>> provider, and >>>> >> whenever provider uses it, it should be symbolic link inside that >>>> provider. >>>> >> That's all automatable and we can prevent mistakes via pre-commit. >>>> >> >>>> >> I think that might lead to a very lightweight approach where we >>>> introduce >>>> >> new common functionality which is immediately reusable in providers >>>> without >>>> >> the hassle of taking care about backwards compatibility, and >>>> managing the >>>> >> "common.util" provider. At the expense of a bit complex pre-commit >>>> that >>>> >> will guard the usage of it. >>>> >> >>>> >> Seems that it might be the "Eat cake and have it too" way that we've >>>> been >>>> >> looking for. >>>> >> >>>> >> J. >>>> >> >>>> >> On Thu, Feb 22, 2024 at 6:14 AM Tzu-ping Chung >>>> <t...@astronomer.io.invalid> >>>> >> wrote: >>>> >> >>>> >>> It would help in the sense mentioned in previous posts, yes. But one >>>> >> thing >>>> >>> I want to point out is, for the provider to actually be helpful, it >>>> must >>>> >> be >>>> >>> treated a bit differently from normal providers, but more like a >>>> separate >>>> >>> third-party dependency. Specifically, the provider should not have a >>>> >>> dependency to Core Airflow, so it can be released and depended on >>>> more >>>> >>> flexibly. >>>> >>> >>>> >>> Ideally utilities for each purpose (parsing URI, reading Object >>>> Storage, >>>> >>> reading SQL, etc.) should each have its own utility package, so >>>> they can >>>> >> be >>>> >>> released independently without dependency problems popping up if we >>>> need >>>> >> to >>>> >>> break compatibility to one purpose. But more providers are >>>> exponentially >>>> >>> more difficult to maintain, so I’d settle for one utility provider >>>> for >>>> >> now >>>> >>> and split further if needed in the future. >>>> >>> >>>> >>> TP >>>> >>> >>>> >>> >>>> >>>> On 22 Feb 2024, at 10:10, Scheffler Jens (XC-AS/EAE-ADA-T) < >>>> >>> jens.scheff...@de.bosch.com.INVALID> wrote: >>>> >>>> >>>> >>>> @Uranusjr would this help as a pilot in your AIP-60 code to parse >>>> and >>>> >>> validate URIs for datasets? >>>> >>>> >>>> >>>> Mit freundlichen Grüßen / Best regards >>>> >>>> >>>> >>>> Jens Scheffler >>>> >>>> >>>> >>>> Alliance: Enabler - Tech Lead (XC-AS/EAE-ADA-T) >>>> >>>> Robert Bosch GmbH | Hessbruehlstraße 21 | 70565 >>>> Stuttgart-Vaihingen | >>>> >>> GERMANY | www.bosch.com >>>> >>>> Tel. +49 711 811-91508 | Mobil +49 160 90417410 | >>>> >>> jens.scheff...@de.bosch.com >>>> >>>> >>>> >>>> Sitz: Stuttgart, Registergericht: Amtsgericht Stuttgart, HRB 14000; >>>> >>>> Aufsichtsratsvorsitzender: Prof. Dr. Stefan Asenkerschbaumer; >>>> >>>> Geschäftsführung: Dr. Stefan Hartung, Dr. Christian Fischer, Dr. >>>> Markus >>>> >>> Forschner, >>>> >>>> Stefan Grosch, Dr. Markus Heyn, Dr. Frank Meyer, Dr. Tanja Rückert >>>> >>>> >>>> >>>> -----Original Message----- >>>> >>>> From: Jarek Potiuk <ja...@potiuk.com> >>>> >>>> Sent: Donnerstag, 22. Februar 2024 00:53 >>>> >>>> To: dev@airflow.apache.org >>>> >>>> Subject: Re: [DISCUSS] Common.util provider? >>>> >>>> >>>> >>>> Yep. It could work with symbolic links. Tested it and with flit - >>>> both >>>> >>> wheel and sdist packaged code such symbolically linked file is >>>> >> dereferenced >>>> >>> and copy of the file is added there. It could be a nice way of >>>> doing it. >>>> >>>> >>>> >>>> Maybe then worth trying next time if someone has a need? >>>> >>>> >>>> >>>> J >>>> >>>> >>>> >>>> On Thu, Feb 22, 2024 at 12:39 AM Scheffler Jens (XC-AS/EAE-ADA-T) < >>>> >>> jens.scheff...@de.bosch.com.invalid> wrote: >>>> >>>> >>>> >>>>>>>> As of additional dependency complexity between providers >>>> actually >>>> >>>>>>>> the >>>> >>>>> additional dependency I think creates more problems than the >>>> benefit… >>>> >>>>> would be cool if there would be an option to „inline“ common code >>>> from >>>> >>>>> a single place but keep individual providers fully independent… >>>> >>>>> >>>> >>>>>> Well, we already do a lot of inlining, so if we think we should >>>> do >>>> >>>>>> more, >>>> >>>>> we have mechanisms for that. We have pre-commits and release >>>> commands >>>> >>>>> that do a lot of that. Pre commits are inlining scripts in >>>> >>>>> Dockerfiles, shortening PyPI readme . The providers __init__.py >>>> files >>>> >>>>> and changelogs and index documentation .rst (partially) are >>>> generated >>>> >>>>> at release documentation preparation time, pyproject.toml for >>>> >>>>> providers are generated from common templates at package building >>>> time >>>> >>>>> and so on and so on :). So we can do more of that and generate >>>> common >>>> >>>>> code, it's just a matter of adding pre-commits or breeze scripts. >>>> But >>>> >>>>> again "can't have and eat cake" - this has the drawback that >>>> there are >>>> >>>>> extra steps involved and even if it's automated it does add >>>> friction >>>> >>>>> when you have to regenerate the code every time you change it and >>>> when >>>> >>>>> you change it in another place than where you use it. >>>> >>>>> >>>> >>>>> Yes, also thought a moment about pre-commit. I#d be okay if we >>>> really >>>> >>>>> in-line and have a pre-commit aligning the redundancy of python in >>>> >>> folders. >>>> >>>>> Might need to be an opt-in if only 10 of 85 providers are using >>>> common >>>> >>>>> stuff and if we change a common line we probably do not need to >>>> affect >>>> >>>>> all providers. >>>> >>>>> >>>> >>>>> As long as no Windows users trying to code for airflow (do we >>>> need to >>>> >>>>> consider?) would it also work to use symlinks? Git can cope with >>>> this, >>>> >>>>> I don't know if the python toolchain can de-reference a copy and >>>> are >>>> >>>>> not packaging a symlink? Would be worth a test... would save the >>>> >>>>> pre-commit and we even could selectively include common bla into >>>> >>>>> providers as needed :-D >>>> >>>>> >>>> >>>>> Mit freundlichen Grüßen / Best regards >>>> >>>>> >>>> >>>>> Jens Scheffler >>>> >>>>> >>>> >>>>> Alliance: Enabler - Tech Lead (XC-AS/EAE-ADA-T) Robert Bosch GmbH >>>> | >>>> >>>>> Hessbruehlstraße 21 | 70565 Stuttgart-Vaihingen | GERMANY | >>>> >>>>> www.bosch.com Tel. +49 711 811-91508 | Mobil +49 160 90417410 | >>>> >>>>> jens.scheff...@de.bosch.com >>>> >>>>> >>>> >>>>> Sitz: Stuttgart, Registergericht: Amtsgericht Stuttgart, HRB >>>> 14000; >>>> >>>>> Aufsichtsratsvorsitzender: Prof. Dr. Stefan Asenkerschbaumer; >>>> >>>>> Geschäftsführung: Dr. Stefan Hartung, Dr. Christian Fischer, Dr. >>>> >>>>> Markus Forschner, Stefan Grosch, Dr. Markus Heyn, Dr. Frank >>>> Meyer, Dr. >>>> >>>>> Tanja Rückert >>>> >>>>> >>>> >>>>> -----Original Message----- >>>> >>>>> From: Jarek Potiuk <ja...@potiuk.com> >>>> >>>>> Sent: Mittwoch, 21. Februar 2024 21:18 >>>> >>>>> To: dev@airflow.apache.org >>>> >>>>> Subject: Re: [DISCUSS] Common.util provider? >>>> >>>>> >>>> >>>>>> if we have a common piece then we are locking all depending >>>> >>>>>> providers >>>> >>>>> (potentially) together if common code changes >>>> >>>>> >>>> >>>>> Yes, coupling in this case is the drawback of this solution. You >>>> can't >>>> >>>>> have cake and eat it too and in this case you trade DRY with >>>> coupling. >>>> >>>>> >>>> >>>>>> As of additional dependency complexity between providers actually >>>> >>>>>> the >>>> >>>>> additional dependency I think creates more problems than the >>>> benefit… >>>> >>>>> would be cool if there would be an option to „inline“ common code >>>> from >>>> >>>>> a single place but keep individual providers fully independent… >>>> >>>>> >>>> >>>>> Well, we already do a lot of inlining, so if we think we should >>>> do >>>> >>>>> more, we have mechanisms for that. We have pre-commits and >>>> release >>>> >>>>> commands that do a lot of that. Pre commits are inlining scripts >>>> in >>>> >>>>> Dockerfiles, shortening PyPI readme . The providers __init__.py >>>> files >>>> >>>>> and changelogs and index documentation .rst (partially) are >>>> generated >>>> >>>>> at release documentation preparation time, pyproject.toml for >>>> >>>>> providers are generated from common templates at package building >>>> time >>>> >>>>> and so on and so on :). So we can do more of that and generate >>>> common >>>> >>>>> code, it's just a matter of adding pre-commits or breeze scripts. >>>> But >>>> >>>>> again "can't have and eat cake" - this has the drawback that >>>> there are >>>> >>>>> extra steps involved and even if it's automated it does add >>>> friction >>>> >>>>> when you have to regenerate the code every time you change it and >>>> when >>>> >>>>> you change it in another place than where you use it. >>>> >>>>> >>>> >>>>> J. >>>> >>>>> >>>> >>>>> On Wed, Feb 21, 2024 at 9:02 PM Scheffler Jens (XC-AS/EAE-ADA-T) < >>>> >>>>> jens.scheff...@de.bosch.com.invalid> wrote: >>>> >>>>> >>>> >>>>>> Hi Jarek, >>>> >>>>>> >>>> >>>>>> At reviewing the PR from uranusjr for AIP-60 I also had the >>>> feeling >>>> >>>>>> that a lot of very similar code is repeated in all the providers. >>>> >>>>>> But during review yesterday I dropped the ides because if we >>>> have a >>>> >>>>>> common piece then we are locking all depending providers >>>> >>>>>> (potentially) together if common code changes. >>>> >>>>>> As of additional dependency complexity between providers actually >>>> >>>>>> the additional dependency I think creates more prblems than the >>>> >>>>>> benefit… would be cool if tehere would be an option to „inline“ >>>> >>>>>> common code from a single place but keep individual providers >>>> fully >>>> >>>>>> independent… >>>> >>>>>> >>>> >>>>>> Jens >>>> >>>>>> >>>> >>>>>> Sent from Outlook for >>>> >>>>>> iOS< >>>> https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F% >>>> >>>>>> 2F >>>> >>>>>> aka.ms%2Fo0ukef&data=05%7C02%7CJens.Scheffler%40de.bosch.com >>>> %7C98c88 >>>> >>>>>> 97 >>>> >>>>>> >>>> 195d944d483ab08dc331a49bb%7C0ae51e1907c84e4bbb6d648ee58410f4%7C0%7C0 >>>> >>>>>> %7 >>>> >>>>>> >>>> C638441435197193656%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQ >>>> >>>>>> Ij >>>> >>>>>> >>>> oiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%7C%7C%7C&sdata=n6gk9fNn >>>> >>>>>> WB SJOPYEgJ9WbriZ3H4id3RhLr16SguOuFA%3D&reserved=0> >>>> >>>>>> ________________________________ >>>> >>>>>> From: Jarek Potiuk <ja...@potiuk.com> >>>> >>>>>> Sent: Wednesday, February 21, 2024 5:42:20 PM >>>> >>>>>> To: dev@airflow.apache.org <dev@airflow.apache.org> >>>> >>>>>> Subject: [DISCUSS] Common.util provider? >>>> >>>>>> >>>> >>>>>> Hello everyone, >>>> >>>>>> >>>> >>>>>> How do we feel about introducing a common.util provider? >>>> >>>>>> >>>> >>>>>> I know it's not been the original idea behind providers, but - >>>> after >>>> >>>>>> testing common.sql and now also having common.io, seems like >>>> more >>>> >>>>>> and more we would like to extract some common code that we would >>>> >>>>>> like providers to use, but we refrain from it, because it will >>>> only >>>> >>>>>> be actually usable 6 months after we introduce some common code. >>>> >>>>>> >>>> >>>>>> However, if we introduce common.util, this problem is generally >>>> gone >>>> >>>>>> - at the expense of more complex maintenance and cross-provider >>>> >>>>> dependencies. >>>> >>>>>> We should be able to add a common util method and use it in a >>>> >>>>>> provider at the same time. >>>> >>>>>> >>>> >>>>>> Think Amazon provider using a new feature released in common.util >>>> >>>>>>> =1.2.0 and google provider >= 1.1.0. All manageable and we do it >>>> >>>>>> already for common.sql. We know how to do it, we know what to >>>> avoid, >>>> >>>>>> we know we cannot introduce backwards-incompatible changes, so we >>>> >>>>>> have to be very clear what is and what is not a public API >>>> there, We >>>> >>>>>> could rather easily add tests to prevent such >>>> backwards-incompatible >>>> >>>>>> changes. We even have a solution for chicken-egg providers where >>>> we >>>> >>>>>> need to release two providers at the same time if they depend on >>>> >>>>>> each other. Generally speaking it's quite workable but adds a >>>> bit of >>>> >>> overhead. >>>> >>>>>> >>>> >>>>>> Examples that we could implement as "common.util": >>>> >>>>>> >>>> >>>>>> - common versioning check with cache - where multiple providers >>>> >>>>>> could reuse "do we have pendulum 2" >>>> >>>>>> - more complex - some date management features (we have a few >>>> like >>>> >>>>>> date_ranges/round_time). But there are many more. >>>> >>>>>> >>>> >>>>>> I generally do not love the common "util" approach. It has a >>>> >>>>>> tendency to become a bag of everything over time. but if we >>>> limit it >>>> >>>>>> to a set of small, fully decoupled modules where each module is >>>> >>>>>> independent - it's OK. And we already have it in "airflow.util" >>>> and >>>> >>>>>> we seem to be >>>> >>>>> doing well. >>>> >>>>>> >>>> >>>>>> WDYT? Is it worth it ? >>>> >>>>>> >>>> >>>>>> J. >>>> >>>>>> >>>> >>>>> >>>> >>>> >>>> >>>> >>>> --------------------------------------------------------------------- >>>> >>>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org >>>> >>>> For additional commands, e-mail: dev-h...@airflow.apache.org >>>> >>> >>>> >>> >>>> >> >>>> >>>> >>>> --------------------------------------------------------------------- >>>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org >>>> For additional commands, e-mail: dev-h...@airflow.apache.org >>>> >>>>