I’m all in favour of async SQLAlchemy. At Astronomer we’ve built two products that exclusively use SQLAlchemy + psycopg3 + async, and we love it. Async does have a bit of a learning curve, but SQLA has done it nicely and it works really well.
I think this needs to be an all-or-nothing thing. Having to maintain sync and async versions of functions/features is a non-starter in my mind; it’d just be a worryingly large amount of duplicated work. Given that the only DBs we now support are Postgres and MySQL, I can’t think of any reason users should even care: they give it a DSN and that’s the end of their involvement.

Amogh: I don’t understand what you mean by point 3 below.

-ash

> On 8 Apr 2024, at 05:31, Amogh Desai wrote:
>
> I checked the content and the PR that you attached.
>
> The results do seem promising and I like the general idea of this approach. But as Jarek also mentioned on the PR:
>
> 1. Not everyone might be on board with going all async due to certain limitations around access to the drivers, or corporate limitations. So, we definitely need a way to opt out for the ones who aren't interested.
>
> 2. We should have a seamless fallback to sync if async doesn't work for whatever reason.
>
> 3. Are we going all in, or are we limiting the scope to, let's say, connections + variables and expanding based on the results in the long term?
>
> Looking forward to the improvements async can bring!
>
> Thanks & Regards,
> Amogh Desai
>
> On Sun, Apr 7, 2024 at 3:13 AM Hussein Awala wrote:
>
>> The Metadata Database is the brain of Airflow: all scheduling decisions, cross-communication, synchronization between components, and management via the web server go through this database.
>>
>> One option to optimize the DB queries is to merge many into a single query to reduce latency and overall time, but this is not always possible because the queries are sometimes completely independent, and merging them is impossible or too complicated. In that case, we have another option, which is running them concurrently since they are independent.
>> The only way to do this currently is to use multithreading (the sync_to_async decorator creates a thread and waits for it using an asyncio coroutine), which is already a good start. But by using the asyncio extension for SQLAlchemy, we will be able to create thousands of lightweight coroutines with the same amount of resources as a few threads, which will also help to reduce resource consumption.
>>
>> A few months ago I started a PoC to add support for this extension and to implement an asynchronous version of connections and variables, so that triggers can get/set them without blocking the event loop and hurting the performance of the triggerer. The result was impressive (https://github.com/apache/airflow/pull/36504).
>>
>> I see a good opportunity to improve the performance of our REST API and web server (for example https://github.com/apache/airflow/issues/38776), knowing that we can mix sync and async endpoints, which will help with a smooth migration.
>>
>> I also think that it will be possible (and very useful) to migrate some of our executors (Kubernetes and Celery) to a fully asynchronous version to improve their performance.
>>
>> I use the SQLAlchemy asyncio extension in many personal and company projects, and I'm very happy with it, but I would like to hear from others if they have any positive or negative feedback about it.
>>
>> I will create a new AIP for integrating the asyncio extension of SQLAlchemy, and follow-up AIPs to migrate/support each component once the first one is implemented. But first, I'd like to check what the community and other committers think about this integration.
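To make the trade-off in Hussein's message concrete, here is a minimal, stdlib-only sketch. It does not touch a real database or any Airflow or SQLAlchemy API: `blocking_query` and `native_async_query` are hypothetical stand-ins for one independent metadata-DB query (they just sleep for an assumed latency). The thread variant offloads the blocking call with `asyncio.to_thread`, which is roughly what a sync_to_async-style wrapper does; the native variant is a plain coroutine, which is what an async driver would give us.

```python
import asyncio
import time

# Hypothetical round-trip time of one metadata-DB query, in seconds.
QUERY_LATENCY = 0.1


def blocking_query(n: int) -> int:
    """Hypothetical sync DB call: blocks the calling thread while waiting."""
    time.sleep(QUERY_LATENCY)
    return n


async def native_async_query(n: int) -> int:
    """Hypothetical async DB call: yields to the event loop while waiting."""
    await asyncio.sleep(QUERY_LATENCY)
    return n


async def sequential(count: int) -> list[int]:
    # Independent queries awaited one after another: latencies add up.
    return [await native_async_query(i) for i in range(count)]


async def concurrent_native(count: int) -> list[int]:
    # Independent queries run concurrently as coroutines on a single thread.
    return await asyncio.gather(*(native_async_query(i) for i in range(count)))


async def concurrent_threads(count: int) -> list[int]:
    # Sync queries offloaded to worker threads (sync_to_async-style):
    # also concurrent, but each in-flight query occupies a pool thread.
    return await asyncio.gather(
        *(asyncio.to_thread(blocking_query, i) for i in range(count))
    )


def timed(coro_fn, count: int) -> tuple[list[int], float]:
    """Run one strategy in a fresh event loop and measure wall-clock time."""
    start = time.perf_counter()
    result = asyncio.run(coro_fn(count))
    return result, time.perf_counter() - start


if __name__ == "__main__":
    for fn in (sequential, concurrent_native, concurrent_threads):
        _, elapsed = timed(fn, 5)
        print(f"{fn.__name__:20} {elapsed:.2f}s")
```

With five queries, both concurrent variants should finish in about one query latency, while the sequential one takes about five. The difference the message points at shows up at scale: `asyncio.to_thread` runs on the loop's default thread pool, which is capped (on recent CPython, at roughly the CPU count plus four, up to 32 workers), whereas thousands of native coroutines can share a single thread.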
