Any other  ideas or suggestions here? Can someone explain how the
"polypill" approach would look like, maybe? How do we imagine this working?

Just to continue this discussion - another example.

Small thing that David wanted to add for changes in some sql providers:

@contextmanager
def suppress_and_warn(*exceptions: type[BaseException]):
    """Context manager that suppresses the given exceptions and logs a
warning message."""
    try:
        yield
    except exceptions as e:
        warnings.warn(f"Exception suppressed:
{e}\n{traceback.format_exc()}", category=UserWarning)


https://github.com/apache/airflow/pull/38707/files#diff-6e1b2f961cb951d05d66d2d814ef5f6d8f8bf8f43c40fb5d40e27a031fed8dd7R115

This is a small thing - but adding it in `airflow` is problematic - because
it will only be released in 1.10, so we cannot use it in providers if we
do.
Currently - since it is used in sql providers, I suggested using
`common.sql` for that code (and add >= 1.12 for common-sql-providers for
those providers that use it). And I will write a separate email about a
proposed versioning approach there.

Do we have a good proposal on how we can solve similar things in the
future?
Do we want it at all? It has some challenges - yes it DRY's the code but it
also introduces coupling.

J.


On Sun, Mar 10, 2024 at 6:21 PM Jarek Potiuk <ja...@potiuk.com> wrote:

> Coming back to it - what about the "polypill" :)? What's different vs the
> "common.sql" way of doing it ? How do we think it can work ?
>
> On Thu, Feb 22, 2024 at 1:58 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>
>> > The symbolic link approach seems to disregard all the external
>> providers, unless I misunderstand it.
>>
>> Not really. It just does not make it easy for the external providers to
>> use it "fast".  They can still - if they want to just manually copy those
>> utils from the latest version of Airflow if they want to use it. Almost by
>> definition, those will be small, independent modules that can be simply
>> copied as needed by whoever releases external providers - and they are also
>> free to copy any older version if they want. That is a nice feature that
>> makes them fully decoupled from the version of Airflow they are installed
>> in (same as community providers). Or - if they want they can just import
>> them from "airflow.provider_utils" - but then they have to add >= Airflow
>> 2.9 if that util appeared in Airflow 2.9 (which is the main reason we want
>> to use symbolic links - because due to our policies and promises, we do not
>> want community providers to depend on latest version of Airflow in vast
>> majority of cases.
>>
>> So this approach is also fully usable by external providers, but it
>> requires some manual effort to copy the modules to their providers.
>>
>> > I like the polypill idea. A backport provider that brings new
>> interfaces to providers without the actual functionalities.
>>
>> I would love to hear more about this, I think the "common.util" was
>> exactly the kind of polyfill approach (with its own versioning
>> complexities) but maybe I do not understand how such a polyfill provider
>> would work. Say we want to add a new "urlparse" method usable for all
>> providers. Could you explain how it would work - say:
>>
>> * we add "urlparse" in Airflow 2.9
>> * some provider wants to use it in Airflow 2.7
>>
>> What providers, with what code/interfaces we would have to release in
>> this case and what dependencies such providers that want to use it (both
>> community and Airflow should have)? I **think** that would mean exactly the
>> "common.<something>" approach we already have with "io" and "sql", but
>> maybe I do not understand it :)
>>
>> On Thu, Feb 22, 2024 at 1:45 PM Tzu-ping Chung <t...@astronomer.io.invalid>
>> wrote:
>>
>>> I like the polypill idea. A backport provider that brings new interfaces
>>> to providers without the actual functionalities.
>>>
>>>
>>> > On 22 Feb 2024, at 20:41, Maciej Obuchowski <mobuchow...@apache.org>
>>> wrote:
>>> >
>>> >> That's why I generally do
>>> > not like the "util" approach because common packaging introduces
>>> > unnecessary coupling (you have to upgrade independent utils together).
>>> >
>>> > From my experience with releasing OpenLineage where we do things
>>> similarly:
>>> > I think that's the advantage of it, but only _if_ you can release those
>>> > together.
>>> > With "build-in" providers it makes sense, but could be burdensome if
>>> > "external"
>>> > ones would depend on that functionality.
>>> >
>>> >> I know it's not been the original idea behind providers, but - after
>>> > testing common.sql and now also having common.io, seems like more and
>>> more
>>> > we would like to extract some common code that we would like providers
>>> to
>>> > use, but we refrain from it, because it will only be actually usable 6
>>> > months after we introduce some common code.
>>> >
>>> > So, maybe better approach would be to introduce the functionality into
>>> > core,
>>> > and use common.X provider as "polyfill" (to borrow JS nomenclature)
>>> > to make sure providers could use that functionality now, where external
>>> > ones could depend on the Airflow ones?
>>> >
>>> > The symbolic link approach seems to disregard all the external
>>> providers,
>>> > unless
>>> > I misunderstand it.
>>> >
>>> > czw., 22 lut 2024 o 13:28 Jarek Potiuk <ja...@potiuk.com> napisał(a):
>>> >
>>> >>> Ideally utilities for each purpose (parsing URI, reading Object
>>> Storage,
>>> >> reading SQL, etc.) should each have its own utility package, so they
>>> can be
>>> >> released independently without dependency problems popping up if we
>>> need to
>>> >> break compatibility to one purpose. But more providers are
>>> exponentially
>>> >> more difficult to maintain, so I’d settle for one utility provider
>>> for now
>>> >> and split further if needed in the future.
>>> >>
>>> >> Very much agree with this general statement. That's why I generally do
>>> >> not like the "util" approach because common packaging introduces
>>> >> unnecessary coupling (you have to upgrade independent utils
>>> together). And
>>> >> when we have a common set of things that seem to make sense to be
>>> released
>>> >> together when upgraded we should package them together in
>>> >> "common.<something concrete" (like we have with common.io and
>>> common.sql).
>>> >>
>>> >> However - in this case, I think what Jens proposed (and I am happy to
>>> try
>>> >> as well) is to attempt to use symbolic links - i.e. add the code in
>>> >> `airflow.util` but then create a symbolic link in the provider.  I
>>> tested
>>> >> it yesterday and it works as expected - i.e. such symbolic link is
>>> >> dereferenced and the provider package contains the python file, not
>>> >> symbolic link. That seems like a much more lightweight approach that
>>> will
>>> >> serve the purpose of "common.util" much better. The only thing we
>>> will have
>>> >> to take care of (and we can add it once the POC is successful) is to
>>> add
>>> >> some pre-commit protection that those kind of symbolically linked util
>>> >> modules are imported in providers, from inside of those provider, not
>>> from
>>> >> airlfow, and make sure they are "standalone" (i.e. - as you mentioned
>>> - not
>>> >> depend on anything in airflow code). We could create a new package
>>> for that
>>> >> in airlfow
>>> >> "airlfow.provider_utils" for example - and make sure (as next step)
>>> that
>>> >> anything from that package is never directly imported by any
>>> provider, and
>>> >> whenever provider uses it, it should be symbolic link inside that
>>> provider.
>>> >> That's all automatable and we can prevent mistakes via pre-commit.
>>> >>
>>> >> I think that might lead to a very lightweight approach where we
>>> introduce
>>> >> new common functionality which is immediately reusable in providers
>>> without
>>> >> the hassle of taking care about backwards compatibility, and managing
>>> the
>>> >> "common.util" provider. At the expense of a bit complex pre-commit
>>> that
>>> >> will guard the usage of it.
>>> >>
>>> >> Seems that it might be the "Eat cake and have it too" way that we've
>>> been
>>> >> looking for.
>>> >>
>>> >> J.
>>> >>
>>> >> On Thu, Feb 22, 2024 at 6:14 AM Tzu-ping Chung
>>> <t...@astronomer.io.invalid>
>>> >> wrote:
>>> >>
>>> >>> It would help in the sense mentioned in previous posts, yes. But one
>>> >> thing
>>> >>> I want to point out is, for the provider to actually be helpful, it
>>> must
>>> >> be
>>> >>> treated a bit differently from normal providers, but more like a
>>> separate
>>> >>> third-party dependency. Specifically, the provider should not have a
>>> >>> dependency to Core Airflow, so it can be released and depended on
>>> more
>>> >>> flexibly.
>>> >>>
>>> >>> Ideally utilities for each purpose (parsing URI, reading Object
>>> Storage,
>>> >>> reading SQL, etc.) should each have its own utility package, so they
>>> can
>>> >> be
>>> >>> released independently without dependency problems popping up if we
>>> need
>>> >> to
>>> >>> break compatibility to one purpose. But more providers are
>>> exponentially
>>> >>> more difficult to maintain, so I’d settle for one utility provider
>>> for
>>> >> now
>>> >>> and split further if needed in the future.
>>> >>>
>>> >>> TP
>>> >>>
>>> >>>
>>> >>>> On 22 Feb 2024, at 10:10, Scheffler Jens (XC-AS/EAE-ADA-T) <
>>> >>> jens.scheff...@de.bosch.com.INVALID> wrote:
>>> >>>>
>>> >>>> @Uranusjr would this help as a pilot in your AIP-60 code to parse
>>> and
>>> >>> validate URIs for datasets?
>>> >>>>
>>> >>>> Mit freundlichen Grüßen / Best regards
>>> >>>>
>>> >>>> Jens Scheffler
>>> >>>>
>>> >>>> Alliance: Enabler - Tech Lead (XC-AS/EAE-ADA-T)
>>> >>>> Robert Bosch GmbH | Hessbruehlstraße 21 | 70565 Stuttgart-Vaihingen
>>> |
>>> >>> GERMANY | www.bosch.com
>>> >>>> Tel. +49 711 811-91508 | Mobil +49 160 90417410 |
>>> >>> jens.scheff...@de.bosch.com
>>> >>>>
>>> >>>> Sitz: Stuttgart, Registergericht: Amtsgericht Stuttgart, HRB 14000;
>>> >>>> Aufsichtsratsvorsitzender: Prof. Dr. Stefan Asenkerschbaumer;
>>> >>>> Geschäftsführung: Dr. Stefan Hartung, Dr. Christian Fischer, Dr.
>>> Markus
>>> >>> Forschner,
>>> >>>> Stefan Grosch, Dr. Markus Heyn, Dr. Frank Meyer, Dr. Tanja Rückert
>>> >>>>
>>> >>>> -----Original Message-----
>>> >>>> From: Jarek Potiuk <ja...@potiuk.com>
>>> >>>> Sent: Donnerstag, 22. Februar 2024 00:53
>>> >>>> To: dev@airflow.apache.org
>>> >>>> Subject: Re: [DISCUSS] Common.util provider?
>>> >>>>
>>> >>>> Yep. It could work with symbolic links. Tested it and with flit -
>>> both
>>> >>> wheel and sdist packaged code such symbolically linked file is
>>> >> dereferenced
>>> >>> and copy of the file is added there. It could be a nice way of doing
>>> it.
>>> >>>>
>>> >>>> Maybe then worth trying next time if someone has a need?
>>> >>>>
>>> >>>> J
>>> >>>>
>>> >>>> On Thu, Feb 22, 2024 at 12:39 AM Scheffler Jens (XC-AS/EAE-ADA-T) <
>>> >>> jens.scheff...@de.bosch.com.invalid> wrote:
>>> >>>>
>>> >>>>>>>> As of additional dependency complexity between providers
>>> actually
>>> >>>>>>>> the
>>> >>>>> additional dependency I think creates more problems than the
>>> benefit…
>>> >>>>> would be cool if there would be an option to „inline“ common code
>>> from
>>> >>>>> a single place but keep individual providers fully independent…
>>> >>>>>
>>> >>>>>> Well, we already  do a lot of inlining, so if we think we should
>>> do
>>> >>>>>> more,
>>> >>>>> we have mechanisms for that. We have  pre-commits and release
>>> commands
>>> >>>>> that do a lot of that. Pre commits are inlining scripts in
>>> >>>>> Dockerfiles, shortening PyPI readme . The providers __init__.py
>>> files
>>> >>>>> and changelogs and index documentation .rst (partially) are
>>> generated
>>> >>>>> at release documentation preparation time, pyproject.toml for
>>> >>>>> providers are generated from common templates at package building
>>> time
>>> >>>>> and so on and so on :). So we can do more of that and generate
>>> common
>>> >>>>> code, it's just a matter of adding pre-commits or breeze scripts.
>>> But
>>> >>>>> again "can't have and eat cake" - this has the drawback that there
>>> are
>>> >>>>> extra steps involved and even if it's automated it does add
>>> friction
>>> >>>>> when you have to regenerate the code every time you change it and
>>> when
>>> >>>>> you change it in another place than where you use it.
>>> >>>>>
>>> >>>>> Yes, also thought a moment about pre-commit. I#d be okay if we
>>> really
>>> >>>>> in-line and have a pre-commit aligning the redundancy of python in
>>> >>> folders.
>>> >>>>> Might need to be an opt-in if only 10 of 85 providers are using
>>> common
>>> >>>>> stuff and if we change a common line we probably do not need to
>>> affect
>>> >>>>> all providers.
>>> >>>>>
>>> >>>>> As long as no Windows users trying to code for airflow (do we need
>>> to
>>> >>>>> consider?) would it also work to use symlinks? Git can cope with
>>> this,
>>> >>>>> I don't know if the python toolchain can de-reference a copy and
>>> are
>>> >>>>> not packaging a symlink? Would be worth a test... would save the
>>> >>>>> pre-commit and we even could selectively include common bla into
>>> >>>>> providers as needed :-D
>>> >>>>>
>>> >>>>> Mit freundlichen Grüßen / Best regards
>>> >>>>>
>>> >>>>> Jens Scheffler
>>> >>>>>
>>> >>>>> Alliance: Enabler - Tech Lead (XC-AS/EAE-ADA-T) Robert Bosch GmbH |
>>> >>>>> Hessbruehlstraße 21 | 70565 Stuttgart-Vaihingen | GERMANY |
>>> >>>>> www.bosch.com Tel. +49 711 811-91508 | Mobil +49 160 90417410 |
>>> >>>>> jens.scheff...@de.bosch.com
>>> >>>>>
>>> >>>>> Sitz: Stuttgart, Registergericht: Amtsgericht Stuttgart, HRB 14000;
>>> >>>>> Aufsichtsratsvorsitzender: Prof. Dr. Stefan Asenkerschbaumer;
>>> >>>>> Geschäftsführung: Dr. Stefan Hartung, Dr. Christian Fischer, Dr.
>>> >>>>> Markus Forschner, Stefan Grosch, Dr. Markus Heyn, Dr. Frank Meyer,
>>> Dr.
>>> >>>>> Tanja Rückert
>>> >>>>>
>>> >>>>> -----Original Message-----
>>> >>>>> From: Jarek Potiuk <ja...@potiuk.com>
>>> >>>>> Sent: Mittwoch, 21. Februar 2024 21:18
>>> >>>>> To: dev@airflow.apache.org
>>> >>>>> Subject: Re: [DISCUSS] Common.util provider?
>>> >>>>>
>>> >>>>>> if we have a common piece then we are locking all depending
>>> >>>>>> providers
>>> >>>>> (potentially) together if common code changes
>>> >>>>>
>>> >>>>> Yes, coupling in this case is the drawback of this solution. You
>>> can't
>>> >>>>> have cake and eat it too and in this case you trade DRY with
>>> coupling.
>>> >>>>>
>>> >>>>>> As of additional dependency complexity between providers actually
>>> >>>>>> the
>>> >>>>> additional dependency I think creates more problems than the
>>> benefit…
>>> >>>>> would be cool if there would be an option to „inline“ common code
>>> from
>>> >>>>> a single place but keep individual providers fully independent…
>>> >>>>>
>>> >>>>> Well, we already  do a lot of inlining, so if we think we should do
>>> >>>>> more, we have mechanisms for that. We have  pre-commits and release
>>> >>>>> commands that do a lot of that. Pre commits are inlining scripts in
>>> >>>>> Dockerfiles, shortening PyPI readme . The providers __init__.py
>>> files
>>> >>>>> and changelogs and index documentation .rst (partially) are
>>> generated
>>> >>>>> at release documentation preparation time, pyproject.toml for
>>> >>>>> providers are generated from common templates at package building
>>> time
>>> >>>>> and so on and so on :). So we can do more of that and generate
>>> common
>>> >>>>> code, it's just a matter of adding pre-commits or breeze scripts.
>>> But
>>> >>>>> again "can't have and eat cake" - this has the drawback that there
>>> are
>>> >>>>> extra steps involved and even if it's automated it does add
>>> friction
>>> >>>>> when you have to regenerate the code every time you change it and
>>> when
>>> >>>>> you change it in another place than where you use it.
>>> >>>>>
>>> >>>>> J.
>>> >>>>>
>>> >>>>> On Wed, Feb 21, 2024 at 9:02 PM Scheffler Jens (XC-AS/EAE-ADA-T) <
>>> >>>>> jens.scheff...@de.bosch.com.invalid> wrote:
>>> >>>>>
>>> >>>>>> Hi Jarek,
>>> >>>>>>
>>> >>>>>> At reviewing the PR from uranusjr for AIP-60 I also had the
>>> feeling
>>> >>>>>> that a lot of very similar code is repeated in all the providers.
>>> >>>>>> But during review yesterday I dropped the ides because if we have
>>> a
>>> >>>>>> common piece then we are locking all depending providers
>>> >>>>>> (potentially) together if common code changes.
>>> >>>>>> As of additional dependency complexity between providers actually
>>> >>>>>> the additional dependency I think creates more prblems than the
>>> >>>>>> benefit… would be cool if tehere would be an option to „inline“
>>> >>>>>> common code from a single place but keep individual providers
>>> fully
>>> >>>>>> independent…
>>> >>>>>>
>>> >>>>>> Jens
>>> >>>>>>
>>> >>>>>> Sent from Outlook for
>>> >>>>>> iOS<
>>> https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%
>>> >>>>>> 2F
>>> >>>>>> aka.ms%2Fo0ukef&data=05%7C02%7CJens.Scheffler%40de.bosch.com
>>> %7C98c88
>>> >>>>>> 97
>>> >>>>>>
>>> 195d944d483ab08dc331a49bb%7C0ae51e1907c84e4bbb6d648ee58410f4%7C0%7C0
>>> >>>>>> %7
>>> >>>>>>
>>> C638441435197193656%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQ
>>> >>>>>> Ij
>>> >>>>>>
>>> oiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%7C%7C%7C&sdata=n6gk9fNn
>>> >>>>>> WB SJOPYEgJ9WbriZ3H4id3RhLr16SguOuFA%3D&reserved=0>
>>> >>>>>> ________________________________
>>> >>>>>> From: Jarek Potiuk <ja...@potiuk.com>
>>> >>>>>> Sent: Wednesday, February 21, 2024 5:42:20 PM
>>> >>>>>> To: dev@airflow.apache.org <dev@airflow.apache.org>
>>> >>>>>> Subject: [DISCUSS] Common.util provider?
>>> >>>>>>
>>> >>>>>> Hello everyone,
>>> >>>>>>
>>> >>>>>> How do we feel about introducing a common.util provider?
>>> >>>>>>
>>> >>>>>> I know it's not been the original idea behind providers, but -
>>> after
>>> >>>>>> testing common.sql and now also having common.io, seems like more
>>> >>>>>> and more we would like to extract some common code that we would
>>> >>>>>> like providers to use, but we refrain from it, because it will
>>> only
>>> >>>>>> be actually usable 6 months after we introduce some common code.
>>> >>>>>>
>>> >>>>>> However, if we introduce common.util, this problem is generally
>>> gone
>>> >>>>>> - at the expense of more complex maintenance and cross-provider
>>> >>>>> dependencies.
>>> >>>>>> We should be able to add a common util method and use it in a
>>> >>>>>> provider at the same time.
>>> >>>>>>
>>> >>>>>> Think Amazon provider using a new feature released in common.util
>>> >>>>>>> =1.2.0 and google provider >= 1.1.0. All manageable and we do it
>>> >>>>>> already for common.sql. We know how to do it, we know what to
>>> avoid,
>>> >>>>>> we know we cannot introduce backwards-incompatible changes, so we
>>> >>>>>> have to be very clear what is and what is not a public API there,
>>> We
>>> >>>>>> could rather easily add tests to prevent such
>>> backwards-incompatible
>>> >>>>>> changes. We even have a solution for chicken-egg providers where
>>> we
>>> >>>>>> need to release two providers at the same time if they depend on
>>> >>>>>> each other. Generally speaking it's quite workable but adds a bit
>>> of
>>> >>> overhead.
>>> >>>>>>
>>> >>>>>> Examples that we could implement as "common.util":
>>> >>>>>>
>>> >>>>>> - common versioning check with cache - where multiple providers
>>> >>>>>> could reuse "do we have pendulum 2"
>>> >>>>>> - more complex - some date management features (we have a few like
>>> >>>>>> date_ranges/round_time). But there are many more.
>>> >>>>>>
>>> >>>>>> I generally do not love the common "util" approach. It has a
>>> >>>>>> tendency to become a bag of everything over time. but if we limit
>>> it
>>> >>>>>> to a set of small, fully decoupled modules where each module is
>>> >>>>>> independent - it's OK. And we already have it in "airflow.util"
>>> and
>>> >>>>>> we seem to be
>>> >>>>> doing well.
>>> >>>>>>
>>> >>>>>> WDYT? Is it worth it ?
>>> >>>>>>
>>> >>>>>> J.
>>> >>>>>>
>>> >>>>>
>>> >>>>
>>> >>>>
>>> ---------------------------------------------------------------------
>>> >>>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
>>> >>>> For additional commands, e-mail: dev-h...@airflow.apache.org
>>> >>>
>>> >>>
>>> >>
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
>>> For additional commands, e-mail: dev-h...@airflow.apache.org
>>>
>>>

Reply via email to