Coming back to it - what about the "polypill" :)? What's different vs the
"common.sql" way of doing it ? How do we think it can work ?

On Thu, Feb 22, 2024 at 1:58 PM Jarek Potiuk <ja...@potiuk.com> wrote:

> > The symbolic link approach seems to disregard all the external
> providers, unless I misunderstand it.
>
> Not really. It just does not make it easy for the external providers to
> use it "fast".  They can still - if they want to just manually copy those
> utils from the latest version of Airflow if they want to use it. Almost by
> definition, those will be small, independent modules that can be simply
> copied as needed by whoever releases external providers - and they are also
> free to copy any older version if they want. That is a nice feature that
> makes them fully decoupled from the version of Airflow they are installed
> in (same as community providers). Or - if they want they can just import
> them from "airflow.provider_utils" - but then they have to add >= Airflow
> 2.9 if that util appeared in Airflow 2.9 (which is the main reason we want
> to use symbolic links - because due to our policies and promises, we do not
> want community providers to depend on latest version of Airflow in vast
> majority of cases.
>
> So this approach is also fully usable by external providers, but it
> requires some manual effort to copy the modules to their providers.
>
> > I like the polypill idea. A backport provider that brings new interfaces
> to providers without the actual functionalities.
>
> I would love to hear more about this, I think the "common.util" was
> exactly the kind of polyfill approach (with its own versioning
> complexities) but maybe I do not understand how such a polyfill provider
> would work. Say we want to add a new "urlparse" method usable for all
> providers. Could you explain how it would work - say:
>
> * we add "urlparse" in Airflow 2.9
> * some provider wants to use it in Airflow 2.7
>
> What providers, with what code/interfaces we would have to release in this
> case and what dependencies such providers that want to use it (both
> community and Airflow should have)? I **think** that would mean exactly the
> "common.<something>" approach we already have with "io" and "sql", but
> maybe I do not understand it :)
>
> On Thu, Feb 22, 2024 at 1:45 PM Tzu-ping Chung <t...@astronomer.io.invalid>
> wrote:
>
>> I like the polypill idea. A backport provider that brings new interfaces
>> to providers without the actual functionalities.
>>
>>
>> > On 22 Feb 2024, at 20:41, Maciej Obuchowski <mobuchow...@apache.org>
>> wrote:
>> >
>> >> That's why I generally do
>> > not like the "util" approach because common packaging introduces
>> > unnecessary coupling (you have to upgrade independent utils together).
>> >
>> > From my experience with releasing OpenLineage where we do things
>> similarly:
>> > I think that's the advantage of it, but only _if_ you can release those
>> > together.
>> > With "build-in" providers it makes sense, but could be burdensome if
>> > "external"
>> > ones would depend on that functionality.
>> >
>> >> I know it's not been the original idea behind providers, but - after
>> > testing common.sql and now also having common.io, seems like more and
>> more
>> > we would like to extract some common code that we would like providers
>> to
>> > use, but we refrain from it, because it will only be actually usable 6
>> > months after we introduce some common code.
>> >
>> > So, maybe better approach would be to introduce the functionality into
>> > core,
>> > and use common.X provider as "polyfill" (to borrow JS nomenclature)
>> > to make sure providers could use that functionality now, where external
>> > ones could depend on the Airflow ones?
>> >
>> > The symbolic link approach seems to disregard all the external
>> providers,
>> > unless
>> > I misunderstand it.
>> >
>> > czw., 22 lut 2024 o 13:28 Jarek Potiuk <ja...@potiuk.com> napisał(a):
>> >
>> >>> Ideally utilities for each purpose (parsing URI, reading Object
>> Storage,
>> >> reading SQL, etc.) should each have its own utility package, so they
>> can be
>> >> released independently without dependency problems popping up if we
>> need to
>> >> break compatibility to one purpose. But more providers are
>> exponentially
>> >> more difficult to maintain, so I’d settle for one utility provider for
>> now
>> >> and split further if needed in the future.
>> >>
>> >> Very much agree with this general statement. That's why I generally do
>> >> not like the "util" approach because common packaging introduces
>> >> unnecessary coupling (you have to upgrade independent utils together).
>> And
>> >> when we have a common set of things that seem to make sense to be
>> released
>> >> together when upgraded we should package them together in
>> >> "common.<something concrete" (like we have with common.io and
>> common.sql).
>> >>
>> >> However - in this case, I think what Jens proposed (and I am happy to
>> try
>> >> as well) is to attempt to use symbolic links - i.e. add the code in
>> >> `airflow.util` but then create a symbolic link in the provider.  I
>> tested
>> >> it yesterday and it works as expected - i.e. such symbolic link is
>> >> dereferenced and the provider package contains the python file, not
>> >> symbolic link. That seems like a much more lightweight approach that
>> will
>> >> serve the purpose of "common.util" much better. The only thing we will
>> have
>> >> to take care of (and we can add it once the POC is successful) is to
>> add
>> >> some pre-commit protection that those kind of symbolically linked util
>> >> modules are imported in providers, from inside of those provider, not
>> from
>> >> airlfow, and make sure they are "standalone" (i.e. - as you mentioned
>> - not
>> >> depend on anything in airflow code). We could create a new package for
>> that
>> >> in airlfow
>> >> "airlfow.provider_utils" for example - and make sure (as next step)
>> that
>> >> anything from that package is never directly imported by any provider,
>> and
>> >> whenever provider uses it, it should be symbolic link inside that
>> provider.
>> >> That's all automatable and we can prevent mistakes via pre-commit.
>> >>
>> >> I think that might lead to a very lightweight approach where we
>> introduce
>> >> new common functionality which is immediately reusable in providers
>> without
>> >> the hassle of taking care about backwards compatibility, and managing
>> the
>> >> "common.util" provider. At the expense of a bit complex pre-commit that
>> >> will guard the usage of it.
>> >>
>> >> Seems that it might be the "Eat cake and have it too" way that we've
>> been
>> >> looking for.
>> >>
>> >> J.
>> >>
>> >> On Thu, Feb 22, 2024 at 6:14 AM Tzu-ping Chung
>> <t...@astronomer.io.invalid>
>> >> wrote:
>> >>
>> >>> It would help in the sense mentioned in previous posts, yes. But one
>> >> thing
>> >>> I want to point out is, for the provider to actually be helpful, it
>> must
>> >> be
>> >>> treated a bit differently from normal providers, but more like a
>> separate
>> >>> third-party dependency. Specifically, the provider should not have a
>> >>> dependency to Core Airflow, so it can be released and depended on more
>> >>> flexibly.
>> >>>
>> >>> Ideally utilities for each purpose (parsing URI, reading Object
>> Storage,
>> >>> reading SQL, etc.) should each have its own utility package, so they
>> can
>> >> be
>> >>> released independently without dependency problems popping up if we
>> need
>> >> to
>> >>> break compatibility to one purpose. But more providers are
>> exponentially
>> >>> more difficult to maintain, so I’d settle for one utility provider for
>> >> now
>> >>> and split further if needed in the future.
>> >>>
>> >>> TP
>> >>>
>> >>>
>> >>>> On 22 Feb 2024, at 10:10, Scheffler Jens (XC-AS/EAE-ADA-T) <
>> >>> jens.scheff...@de.bosch.com.INVALID> wrote:
>> >>>>
>> >>>> @Uranusjr would this help as a pilot in your AIP-60 code to parse and
>> >>> validate URIs for datasets?
>> >>>>
>> >>>> Mit freundlichen Grüßen / Best regards
>> >>>>
>> >>>> Jens Scheffler
>> >>>>
>> >>>> Alliance: Enabler - Tech Lead (XC-AS/EAE-ADA-T)
>> >>>> Robert Bosch GmbH | Hessbruehlstraße 21 | 70565 Stuttgart-Vaihingen |
>> >>> GERMANY | www.bosch.com
>> >>>> Tel. +49 711 811-91508 | Mobil +49 160 90417410 |
>> >>> jens.scheff...@de.bosch.com
>> >>>>
>> >>>> Sitz: Stuttgart, Registergericht: Amtsgericht Stuttgart, HRB 14000;
>> >>>> Aufsichtsratsvorsitzender: Prof. Dr. Stefan Asenkerschbaumer;
>> >>>> Geschäftsführung: Dr. Stefan Hartung, Dr. Christian Fischer, Dr.
>> Markus
>> >>> Forschner,
>> >>>> Stefan Grosch, Dr. Markus Heyn, Dr. Frank Meyer, Dr. Tanja Rückert
>> >>>>
>> >>>> -----Original Message-----
>> >>>> From: Jarek Potiuk <ja...@potiuk.com>
>> >>>> Sent: Donnerstag, 22. Februar 2024 00:53
>> >>>> To: dev@airflow.apache.org
>> >>>> Subject: Re: [DISCUSS] Common.util provider?
>> >>>>
>> >>>> Yep. It could work with symbolic links. Tested it and with flit -
>> both
>> >>> wheel and sdist packaged code such symbolically linked file is
>> >> dereferenced
>> >>> and copy of the file is added there. It could be a nice way of doing
>> it.
>> >>>>
>> >>>> Maybe then worth trying next time if someone has a need?
>> >>>>
>> >>>> J
>> >>>>
>> >>>> On Thu, Feb 22, 2024 at 12:39 AM Scheffler Jens (XC-AS/EAE-ADA-T) <
>> >>> jens.scheff...@de.bosch.com.invalid> wrote:
>> >>>>
>> >>>>>>>> As of additional dependency complexity between providers actually
>> >>>>>>>> the
>> >>>>> additional dependency I think creates more problems than the
>> benefit…
>> >>>>> would be cool if there would be an option to „inline“ common code
>> from
>> >>>>> a single place but keep individual providers fully independent…
>> >>>>>
>> >>>>>> Well, we already  do a lot of inlining, so if we think we should do
>> >>>>>> more,
>> >>>>> we have mechanisms for that. We have  pre-commits and release
>> commands
>> >>>>> that do a lot of that. Pre commits are inlining scripts in
>> >>>>> Dockerfiles, shortening PyPI readme . The providers __init__.py
>> files
>> >>>>> and changelogs and index documentation .rst (partially) are
>> generated
>> >>>>> at release documentation preparation time, pyproject.toml for
>> >>>>> providers are generated from common templates at package building
>> time
>> >>>>> and so on and so on :). So we can do more of that and generate
>> common
>> >>>>> code, it's just a matter of adding pre-commits or breeze scripts.
>> But
>> >>>>> again "can't have and eat cake" - this has the drawback that there
>> are
>> >>>>> extra steps involved and even if it's automated it does add friction
>> >>>>> when you have to regenerate the code every time you change it and
>> when
>> >>>>> you change it in another place than where you use it.
>> >>>>>
>> >>>>> Yes, also thought a moment about pre-commit. I#d be okay if we
>> really
>> >>>>> in-line and have a pre-commit aligning the redundancy of python in
>> >>> folders.
>> >>>>> Might need to be an opt-in if only 10 of 85 providers are using
>> common
>> >>>>> stuff and if we change a common line we probably do not need to
>> affect
>> >>>>> all providers.
>> >>>>>
>> >>>>> As long as no Windows users trying to code for airflow (do we need
>> to
>> >>>>> consider?) would it also work to use symlinks? Git can cope with
>> this,
>> >>>>> I don't know if the python toolchain can de-reference a copy and are
>> >>>>> not packaging a symlink? Would be worth a test... would save the
>> >>>>> pre-commit and we even could selectively include common bla into
>> >>>>> providers as needed :-D
>> >>>>>
>> >>>>> Mit freundlichen Grüßen / Best regards
>> >>>>>
>> >>>>> Jens Scheffler
>> >>>>>
>> >>>>> Alliance: Enabler - Tech Lead (XC-AS/EAE-ADA-T) Robert Bosch GmbH |
>> >>>>> Hessbruehlstraße 21 | 70565 Stuttgart-Vaihingen | GERMANY |
>> >>>>> www.bosch.com Tel. +49 711 811-91508 | Mobil +49 160 90417410 |
>> >>>>> jens.scheff...@de.bosch.com
>> >>>>>
>> >>>>> Sitz: Stuttgart, Registergericht: Amtsgericht Stuttgart, HRB 14000;
>> >>>>> Aufsichtsratsvorsitzender: Prof. Dr. Stefan Asenkerschbaumer;
>> >>>>> Geschäftsführung: Dr. Stefan Hartung, Dr. Christian Fischer, Dr.
>> >>>>> Markus Forschner, Stefan Grosch, Dr. Markus Heyn, Dr. Frank Meyer,
>> Dr.
>> >>>>> Tanja Rückert
>> >>>>>
>> >>>>> -----Original Message-----
>> >>>>> From: Jarek Potiuk <ja...@potiuk.com>
>> >>>>> Sent: Mittwoch, 21. Februar 2024 21:18
>> >>>>> To: dev@airflow.apache.org
>> >>>>> Subject: Re: [DISCUSS] Common.util provider?
>> >>>>>
>> >>>>>> if we have a common piece then we are locking all depending
>> >>>>>> providers
>> >>>>> (potentially) together if common code changes
>> >>>>>
>> >>>>> Yes, coupling in this case is the drawback of this solution. You
>> can't
>> >>>>> have cake and eat it too and in this case you trade DRY with
>> coupling.
>> >>>>>
>> >>>>>> As of additional dependency complexity between providers actually
>> >>>>>> the
>> >>>>> additional dependency I think creates more problems than the
>> benefit…
>> >>>>> would be cool if there would be an option to „inline“ common code
>> from
>> >>>>> a single place but keep individual providers fully independent…
>> >>>>>
>> >>>>> Well, we already  do a lot of inlining, so if we think we should do
>> >>>>> more, we have mechanisms for that. We have  pre-commits and release
>> >>>>> commands that do a lot of that. Pre commits are inlining scripts in
>> >>>>> Dockerfiles, shortening PyPI readme . The providers __init__.py
>> files
>> >>>>> and changelogs and index documentation .rst (partially) are
>> generated
>> >>>>> at release documentation preparation time, pyproject.toml for
>> >>>>> providers are generated from common templates at package building
>> time
>> >>>>> and so on and so on :). So we can do more of that and generate
>> common
>> >>>>> code, it's just a matter of adding pre-commits or breeze scripts.
>> But
>> >>>>> again "can't have and eat cake" - this has the drawback that there
>> are
>> >>>>> extra steps involved and even if it's automated it does add friction
>> >>>>> when you have to regenerate the code every time you change it and
>> when
>> >>>>> you change it in another place than where you use it.
>> >>>>>
>> >>>>> J.
>> >>>>>
>> >>>>> On Wed, Feb 21, 2024 at 9:02 PM Scheffler Jens (XC-AS/EAE-ADA-T) <
>> >>>>> jens.scheff...@de.bosch.com.invalid> wrote:
>> >>>>>
>> >>>>>> Hi Jarek,
>> >>>>>>
>> >>>>>> At reviewing the PR from uranusjr for AIP-60 I also had the feeling
>> >>>>>> that a lot of very similar code is repeated in all the providers.
>> >>>>>> But during review yesterday I dropped the ides because if we have a
>> >>>>>> common piece then we are locking all depending providers
>> >>>>>> (potentially) together if common code changes.
>> >>>>>> As of additional dependency complexity between providers actually
>> >>>>>> the additional dependency I think creates more prblems than the
>> >>>>>> benefit… would be cool if tehere would be an option to „inline“
>> >>>>>> common code from a single place but keep individual providers fully
>> >>>>>> independent…
>> >>>>>>
>> >>>>>> Jens
>> >>>>>>
>> >>>>>> Sent from Outlook for
>> >>>>>> iOS<
>> https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%
>> >>>>>> 2F
>> >>>>>> aka.ms%2Fo0ukef&data=05%7C02%7CJens.Scheffler%40de.bosch.com
>> %7C98c88
>> >>>>>> 97
>> >>>>>>
>> 195d944d483ab08dc331a49bb%7C0ae51e1907c84e4bbb6d648ee58410f4%7C0%7C0
>> >>>>>> %7
>> >>>>>>
>> C638441435197193656%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQ
>> >>>>>> Ij
>> >>>>>>
>> oiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%7C%7C%7C&sdata=n6gk9fNn
>> >>>>>> WB SJOPYEgJ9WbriZ3H4id3RhLr16SguOuFA%3D&reserved=0>
>> >>>>>> ________________________________
>> >>>>>> From: Jarek Potiuk <ja...@potiuk.com>
>> >>>>>> Sent: Wednesday, February 21, 2024 5:42:20 PM
>> >>>>>> To: dev@airflow.apache.org <dev@airflow.apache.org>
>> >>>>>> Subject: [DISCUSS] Common.util provider?
>> >>>>>>
>> >>>>>> Hello everyone,
>> >>>>>>
>> >>>>>> How do we feel about introducing a common.util provider?
>> >>>>>>
>> >>>>>> I know it's not been the original idea behind providers, but -
>> after
>> >>>>>> testing common.sql and now also having common.io, seems like more
>> >>>>>> and more we would like to extract some common code that we would
>> >>>>>> like providers to use, but we refrain from it, because it will only
>> >>>>>> be actually usable 6 months after we introduce some common code.
>> >>>>>>
>> >>>>>> However, if we introduce common.util, this problem is generally
>> gone
>> >>>>>> - at the expense of more complex maintenance and cross-provider
>> >>>>> dependencies.
>> >>>>>> We should be able to add a common util method and use it in a
>> >>>>>> provider at the same time.
>> >>>>>>
>> >>>>>> Think Amazon provider using a new feature released in common.util
>> >>>>>>> =1.2.0 and google provider >= 1.1.0. All manageable and we do it
>> >>>>>> already for common.sql. We know how to do it, we know what to
>> avoid,
>> >>>>>> we know we cannot introduce backwards-incompatible changes, so we
>> >>>>>> have to be very clear what is and what is not a public API there,
>> We
>> >>>>>> could rather easily add tests to prevent such
>> backwards-incompatible
>> >>>>>> changes. We even have a solution for chicken-egg providers where we
>> >>>>>> need to release two providers at the same time if they depend on
>> >>>>>> each other. Generally speaking it's quite workable but adds a bit
>> of
>> >>> overhead.
>> >>>>>>
>> >>>>>> Examples that we could implement as "common.util":
>> >>>>>>
>> >>>>>> - common versioning check with cache - where multiple providers
>> >>>>>> could reuse "do we have pendulum 2"
>> >>>>>> - more complex - some date management features (we have a few like
>> >>>>>> date_ranges/round_time). But there are many more.
>> >>>>>>
>> >>>>>> I generally do not love the common "util" approach. It has a
>> >>>>>> tendency to become a bag of everything over time. but if we limit
>> it
>> >>>>>> to a set of small, fully decoupled modules where each module is
>> >>>>>> independent - it's OK. And we already have it in "airflow.util" and
>> >>>>>> we seem to be
>> >>>>> doing well.
>> >>>>>>
>> >>>>>> WDYT? Is it worth it ?
>> >>>>>>
>> >>>>>> J.
>> >>>>>>
>> >>>>>
>> >>>>
>> >>>> ---------------------------------------------------------------------
>> >>>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
>> >>>> For additional commands, e-mail: dev-h...@airflow.apache.org
>> >>>
>> >>>
>> >>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
>> For additional commands, e-mail: dev-h...@airflow.apache.org
>>
>>

Reply via email to