Yeah do we have concrete examples of places where asyncio would be a
non-starter? Are there enough of these examples to kill this idea? I really
don't like the idea of needing to maintain both sync and async.

On Mon, Apr 8, 2024 at 1:39 PM Hussein Awala <huss...@awala.fr> wrote:

> > we definitely need a way to opt-out for the ones who aren't interested
>
> I disagree, what I propose is to infer the async connection from the sync
> configuration using a translation method, with the possibility of providing
> the async connection configuration explicitly. This will help to completely
> migrate the REST API and web server to the async version without
> duplicating the code.
>
> > We should have a seamless fallback to sync if async doesn't work for
> whatever reasons
>
> For the async version of connections/variables, we will use the sync method
> wrapped by sync_to_async in the base class, in this case, the async methods
> will work in the custom secrets backends without any issues and users can
> override the async methods for better implementation.
>
> > are we limiting the scope to lets say connections + variables and
> expanding based on the results in the long term?
>
> This needs to be implemented step by step, the first step is to add
> integration to the different providers and DB, then implement an async
> version for the secrets backends, then migrate the REST API and web server,
> and later migrate our official executors, which will need also integrating
> other tools like kubernetes-asyncio, and async integration for celery.
>
> > I think this needs to be an all or nothing thing
>
> Here are some of the available drivers
> https://github.com/apache/airflow/pull/36504#issuecomment-1872653755, I
> have already tested one for each database, so we will have async support
> for all supported databases.
>
> > having to maintain sync and async versions of functions/features is a
> non-starter in my mind;
>
> During the migration, we will have both sync and async endpoints in the API
> and the webserver (they will be migrated one by one and not at the same
> time), but without any code duplication, in the worst case, instead of
> duplicating a method, we can use sync_to_async, and optimize it later
> after migrating all endpoints that use it.
> But for Secrets Backends, we may have some duplicated code when it is not
> possible to export it to a common method shared between sync and async
> versions.
>
> > how can we keep one codenase bit cooe with sqlite?
>
> For my PoC, I used https://github.com/omnilib/aiosqlite and it worked
> without any issues.
>
>
>
> On Mon, Apr 8, 2024 at 10:08 PM Daniel Standish
> <daniel.stand...@astronomer.io.invalid> wrote:
>
> > If nothing else, write an ugly adapter using sync_to_async?
> >
> > On Mon, Apr 8, 2024 at 1:06 PM Daniel Standish <
> > daniel.stand...@astronomer.io> wrote:
> >
> > > https://github.com/omnilib/aiosqlite maybe?
> > >
> > > On Mon, Apr 8, 2024 at 1:03 PM Scheffler Jens (XC-AS/EAE-ADA-T)
> > > <jens.scheff...@de.bosch.com.invalid> wrote:
> > >
> > >> I understand the „all-in“ approach as we were dropping MSSQL… how can
> we
> > >> keep one codenase bit cooe with sqlite? I assume we must support this
> at
> > >> least for dev setups.
> > >>
> > >> Sent from Outlook for iOS<https://aka.ms/o0ukef>
> > >> ________________________________
> > >> From: Jarek Potiuk <ja...@potiuk.com>
> > >> Sent: Monday, April 8, 2024 8:30:18 PM
> > >> To: dev@airflow.apache.org <dev@airflow.apache.org>
> > >> Subject: Re: [DISCUSS] Asynchronous SQLAlchemy
> > >>
> > >> Yep. If we can make both Postgres and MySQL work with Async - I am
> also
> > >> all
> > >> for the "All" approach. If it means that we need to support only
> certain
> > >> drivers and certain versions of the DBs - so be it. As mentioned in my
> > >> original comments (long time ago when we had MSSQL support) - this was
> > not
> > >> really possible back then - but now, by getting rid of Mssql and if we
> > >> have
> > >> the right drivers for mysql, it should be possible - I guess.
> > >>
> > >> On Mon, Apr 8, 2024 at 8:17 PM Daniel Standish
> > >> <daniel.stand...@astronomer.io.invalid> wrote:
> > >>
> > >> > I wholeheartedly agree with Ash that it should be all or nothing.
> And
> > >> > *all* sounds
> > >> > better to me :)
> > >> >
> > >> >
> > >> >
> > >> > On Mon, Apr 8, 2024 at 10:54 AM Ash Berlin-Taylor <a...@apache.org>
> > >> wrote:
> > >> >
> > >> > > I’m all in favour of async SQLAlchemy. We’ve built two products
> > >> > > exclusively at @ Astronomer that use sqlalchemy+psycopg3+async and
> > >> love
> > >> > it.
> > >> > > Async does take a bit of a learning curve, but SQLA has done it
> > nicely
> > >> > and
> > >> > > it works really well.
> > >> > >
> > >> > > I think this needs to be an all or nothing thing — having to
> > maintain
> > >> > sync
> > >> > > and async versions of functions/features is a non-starter in my
> > mind;
> > >> > it’d
> > >> > > just be a worryingly large amount of duplicated work. Given the
> only
> > >> DBs
> > >> > we
> > >> > > support now is postgres and mysql then I can’t think of any reason
> > >> users
> > >> > > should even care — they give it a DSN and that’s the end of their
> > >> > > involvement.
> > >> > >
> > >> > > Amogh: I don’t understand what you mean by point 3 below.
> > >> > >
> > >> > > -ash
> > >> > >
> > >> > > > On 8 Apr 2024, at 05:31, Amogh Desai <amoghdesai....@gmail.com>
> > >> wrote:
> > >> > > >
> > >> > > > I checked the content and the PR that you attached.
> > >> > > >
> > >> > > > The results do seem promising and I like the general idea of
> this
> > >> > > approach.
> > >> > > > But as Jarek
> > >> > > > also mentioned on the PR:
> > >> > > >
> > >> > > > 1. Not everyone might be on the board to go all async due to
> > certain
> > >> > > > limitations around
> > >> > > > access to the drivers, or corporate limitations. So, we
> definitely
> > >> > need a
> > >> > > > way to opt-out
> > >> > > > for the ones who aren't interested.
> > >> > > >
> > >> > > > 2. We should have a seamless fallback to sync if async doesn't
> > work
> > >> for
> > >> > > > whatever reasons.
> > >> > > >
> > >> > > > 3. Are we going all in or are we limiting the scope to lets say
> > >> > > > connections + variables and expanding
> > >> > > > based on the results in the long term?
> > >> > > >
> > >> > > > Looking forward to improvements async can bring in!
> > >> > > >
> > >> > > > Thanks & Regards,
> > >> > > > Amogh Desai
> > >> > > >
> > >> > > >
> > >> > > > On Sun, Apr 7, 2024 at 3:13 AM Hussein Awala <huss...@awala.fr>
> > >> wrote:
> > >> > > >
> > >> > > >> The Metadata Database is the brain of Airflow, where all
> > scheduling
> > >> > > >> decisions, cross-communication, synchronization between
> > components,
> > >> > and
> > >> > > >> management via the web server, are made using this database.
> > >> > > >>
> > >> > > >> One option to optimize the DB queries is to merge many into a
> > >> single
> > >> > > query
> > >> > > >> to reduce latency and overall time, but this is not always
> > possible
> > >> > > because
> > >> > > >> the queries are sometimes completely independent, and it is
> > >> > > impossible/too
> > >> > > >> complicated to merge them. But in this case, we have another
> > option
> > >> > > which
> > >> > > >> is running them concurrently since they are independent. The
> only
> > >> way
> > >> > > to do
> > >> > > >> this currently is to use multithreading (the sync_to_async
> > >> decorator
> > >> > > >> creates a thread and waits for it using an asyncio coroutine),
> > >> which
> > >> > is
> > >> > > >> already a good start, but by using the asyncio extension for
> > >> > sqlalchemy
> > >> > > we
> > >> > > >> will be able to create thousands of lightweight coroutines with
> > the
> > >> > same
> > >> > > >> amount of resources as a few threads, which will also help to
> > >> reduce
> > >> > > >> resources consumption.
> > >> > > >>
> > >> > > >> A few months ago I started a PoC to add support for this
> > extension
> > >> and
> > >> > > >> implement an asynchronous version of connections and variables
> to
> > >> be
> > >> > > able
> > >> > > >> to get/set them from triggers without blocking the event loop
> and
> > >> > > affecting
> > >> > > >> the performance of the triggerer, and the result was
> impressive (
> > >> > > >>
> > >>
> >
> https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fairflow%2Fpull%2F36504&data=05%7C02%7CJens.Scheffler%40de.bosch.com%7C453e379d0e284252391708dc57f9fd87%7C0ae51e1907c84e4bbb6d648ee58410f4%7C0%7C0%7C638481978411968167%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%7C%7C%7C&sdata=LfaqvpAcffa830qqp1fLdsbKuVkpgqsGOSt%2FrnQL2Wk%3D&reserved=0
> > >> )<https://github.com/apache/airflow/pull/36504>.
> > >> > > >>
> > >> > > >> I see a good opportunity to improve the performance of our REST
> > API
> > >> > and
> > >> > > web
> > >> > > >> server (for example
> > >>
> >
> https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fairflow%2Fissues%2F38776&data=05%7C02%7CJens.Scheffler%40de.bosch.com%7C453e379d0e284252391708dc57f9fd87%7C0ae51e1907c84e4bbb6d648ee58410f4%7C0%7C0%7C638481978411976153%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%7C%7C%7C&sdata=FDHZsKZ6d5%2BTrfI43pVY%2BSHJ2RsMW93MpxIqidhlSoE%3D&reserved=0
> > >> )<https://github.com/apache/airflow/issues/38776>,
> > >> > > >> knowing that we can mix sync and async endpoints, which will
> help
> > >> for
> > >> > a
> > >> > > >> smooth migration.
> > >> > > >>
> > >> > > >> I also think that it will be possible (and very useful) to
> > migrate
> > >> > some
> > >> > > of
> > >> > > >> our executors to a full asynchronous version to improve their
> > >> > > performance
> > >> > > >> (kubernetes and celery)
> > >> > > >>
> > >> > > >> I use the sqlalchemy asyncio extension in many personal and
> > company
> > >> > > >> projects, and I'm very happy with it, but I would like to hear
> > from
> > >> > > others
> > >> > > >> if they have any positive or negative feedback about it.
> > >> > > >>
> > >> > > >> I will create a new AIP for integrating the asyncio extension
> of
> > >> > > >> sqlaclhemy, and other following AIPs to migrate/support each
> > >> component
> > >> > > once
> > >> > > >> the first one is implemented, but first, I prefer to check what
> > the
> > >> > > >> community and other committers think about this integration.
> > >> > > >>
> > >> > >
> > >> > >
> > >> > >
> > ---------------------------------------------------------------------
> > >> > > To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
> > >> > > For additional commands, e-mail: dev-h...@airflow.apache.org
> > >> > >
> > >> > >
> > >> >
> > >>
> > >
> >
>

Reply via email to