Re: [DISCUSS] Asynchronous SQLAlchemy

Ash Berlin-Taylor Mon, 08 Apr 2024 10:54:11 -0700

I’m all in favour of async SQLAlchemy. At Astronomer we’ve built two products 
that use sqlalchemy+psycopg3+async exclusively, and we love it. Async does have a 
bit of a learning curve, but SQLA has done it nicely and it works really well.


I think this needs to be an all-or-nothing thing: having to maintain sync and 
async versions of functions/features is a non-starter in my mind; it’d just be 
a worryingly large amount of duplicated work. Given that the only DBs we now 
support are postgres and mysql, I can’t think of any reason users should even 
care: they give it a DSN and that’s the end of their involvement.
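To make the "give it a DSN" point concrete, here is a minimal sketch of how a sync-style DSN could be mapped to an async-capable driver behind the scenes, so users never pick a driver themselves. The helper `to_async_dsn` and the `ASYNC_DRIVERS` mapping are hypothetical (not existing Airflow code), and the driver choices (psycopg 3 for Postgres, aiomysql for MySQL) are assumptions for illustration.

```python
# Hypothetical mapping: an illustrative assumption, not Airflow's actual config logic.
ASYNC_DRIVERS = {
    "postgresql": "postgresql+psycopg",  # psycopg 3 serves both sync and async in SQLAlchemy 2.0
    "mysql": "mysql+aiomysql",           # one of several async MySQL drivers
}


def to_async_dsn(dsn: str) -> str:
    """Rewrite the scheme of a database DSN so it selects an async-capable driver."""
    scheme, sep, rest = dsn.partition("://")
    if not sep:
        raise ValueError(f"not a DSN: {dsn!r}")
    dialect = scheme.split("+", 1)[0]  # drop any explicit sync driver, e.g. "+psycopg2"
    try:
        return f"{ASYNC_DRIVERS[dialect]}://{rest}"
    except KeyError:
        raise ValueError(f"unsupported database: {dialect!r}") from None


print(to_async_dsn("postgresql://airflow:secret@db:5432/airflow"))
# postgresql+psycopg://airflow:secret@db:5432/airflow
print(to_async_dsn("mysql+mysqldb://airflow:secret@db/airflow"))
# mysql+aiomysql://airflow:secret@db/airflow
```

In real code, SQLAlchemy's `make_url()` together with `URL.set(drivername=...)` would be a sturdier way to do this rewrite than string splitting, and the resulting URL would then be handed to `create_async_engine()`.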

Amogh: I don’t understand what you mean by point 3 below.

-ash

> On 8 Apr 2024, at 05:31, Amogh Desai <[email protected]> wrote:
> 
> I checked the content and the PR that you attached.
> 
> The results do seem promising and I like the general idea of this approach.
> But, as Jarek also mentioned on the PR:
> 
> 1. Not everyone might be on board with going all async, due to limitations
> around access to the drivers or corporate restrictions. So we definitely
> need a way to opt out for those who aren't interested.
> 
> 2. We should have a seamless fallback to sync if async doesn't work for
> whatever reason.
> 
> 3. Are we going all in, or are we limiting the scope to, let's say,
> connections + variables and expanding based on the results in the long
> term?
> 
> Looking forward to the improvements async can bring!
> 
> Thanks & Regards,
> Amogh Desai
> 
> 
> On Sun, Apr 7, 2024 at 3:13 AM Hussein Awala <[email protected]> wrote:
> 
>> The metadata database is the brain of Airflow: all scheduling decisions,
>> cross-communication, synchronization between components, and management
>> via the web server go through this database.
>> 
>> One way to optimize DB queries is to merge several into a single query,
>> reducing latency and overall time. This is not always possible: some
>> queries are completely independent, and merging them is impossible or too
>> complicated. In that case we have another option, which is to run them
>> concurrently, precisely because they are independent. Currently the only
>> way to do this is multithreading (the sync_to_async decorator creates a
>> thread and waits for it using an asyncio coroutine), which is already a
>> good start. With the asyncio extension for SQLAlchemy, however, we could
>> run thousands of lightweight coroutines with the same resources as a few
>> threads, which would also reduce resource consumption.
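A minimal, stdlib-only sketch of the trade-off described above. Database round trips are simulated with sleeps (an assumption; no real database is involved), and `asyncio.to_thread` stands in for a thread-based wrapper such as `sync_to_async`. The thread version is bounded by the executor's worker count, while the coroutine version waits on all simulated queries at once on a single thread.

```python
import asyncio
import time

QUERY_LATENCY = 0.05  # simulated DB round trip in seconds (assumption)


def blocking_query(name: str) -> str:
    """Stand-in for a synchronous DB call: blocks its thread while waiting."""
    time.sleep(QUERY_LATENCY)
    return name


async def async_query(name: str) -> str:
    """Stand-in for an awaitable DB call: yields to the event loop while waiting."""
    await asyncio.sleep(QUERY_LATENCY)
    return name


async def run_with_threads(names: list) -> list:
    # Thread-based approach: each blocking call runs in the default thread pool.
    return await asyncio.gather(*(asyncio.to_thread(blocking_query, n) for n in names))


async def run_with_coroutines(names: list) -> list:
    # Native async approach: one coroutine per query, all on the event loop thread.
    return await asyncio.gather(*(async_query(n) for n in names))


async def main() -> None:
    names = [f"query-{i}" for i in range(100)]
    for runner in (run_with_threads, run_with_coroutines):
        start = time.perf_counter()
        results = await runner(names)
        elapsed = time.perf_counter() - start
        print(f"{runner.__name__}: {len(results)} results in {elapsed:.2f}s")


asyncio.run(main())
```

With real SQLAlchemy code, each concurrent task would need its own `AsyncSession` (a single session must not be shared across concurrent tasks), and the engine's connection pool would become the practical limit on concurrency.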
>> 
>> A few months ago I started a PoC to add support for this extension and
>> implement an asynchronous version of connections and variables, so that
>> triggers can get/set them without blocking the event loop and hurting the
>> triggerer's performance. The result was impressive
>> (https://github.com/apache/airflow/pull/36504).
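The triggerer runs many triggers as coroutines on a single asyncio event loop, so one blocking DB call stalls all of them. The stdlib-only sketch below makes that visible: a heartbeat coroutine stands in for the other triggers, and `get_variable_sync` / `get_variable_async` are hypothetical stand-ins (simulated with sleeps) for fetching a variable, not the API from the PoC PR.

```python
import asyncio
import time

LOOKUP_LATENCY = 0.3  # simulated DB round trip for a variable lookup (assumption)


def get_variable_sync(key: str) -> str:
    time.sleep(LOOKUP_LATENCY)  # blocks the whole event loop when called from a coroutine
    return f"value-of-{key}"


async def get_variable_async(key: str) -> str:
    await asyncio.sleep(LOOKUP_LATENCY)  # suspends only this coroutine
    return f"value-of-{key}"


async def heartbeat(stop: asyncio.Event, gaps: list) -> None:
    """Stand-in for other triggers sharing the loop: tick every 10 ms, record the gaps."""
    last = time.perf_counter()
    while not stop.is_set():
        await asyncio.sleep(0.01)
        now = time.perf_counter()
        gaps.append(now - last)
        last = now


async def measure(lookup_is_async: bool) -> float:
    """Return the longest stall (in seconds) the heartbeat saw during one lookup."""
    stop, gaps = asyncio.Event(), []
    hb = asyncio.create_task(heartbeat(stop, gaps))
    await asyncio.sleep(0)  # let the heartbeat start before the lookup
    if lookup_is_async:
        await get_variable_async("my_key")
    else:
        get_variable_sync("my_key")
    stop.set()
    await hb
    return max(gaps)


for mode in (False, True):
    worst = asyncio.run(measure(mode))
    print(f"async lookup={mode}: worst event-loop stall {worst * 1000:.0f} ms")
```

The sync lookup shows a stall roughly equal to the whole lookup latency, while the async lookup keeps the heartbeat ticking about every 10 ms.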
>> 
>> I see a good opportunity to improve the performance of our REST API and
>> web server (for example https://github.com/apache/airflow/issues/38776),
>> especially since we can mix sync and async endpoints, which will help with
>> a smooth migration.
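ASGI frameworks such as FastAPI accept both `def` and `async def` endpoints and run the plain `def` ones in a thread pool, which is what makes an endpoint-by-endpoint migration possible. A minimal sketch of that dispatch rule follows; the routes and handlers are hypothetical, and this is not Airflow's or FastAPI's actual code.

```python
import asyncio
import inspect
from typing import Any, Callable


def list_pools() -> list:
    """Hypothetical legacy endpoint, still synchronous (e.g. uses a sync Session)."""
    return ["default_pool"]


async def get_dag_runs() -> list:
    """Hypothetical migrated endpoint, native async (e.g. uses an AsyncSession)."""
    await asyncio.sleep(0)  # stand-in for an awaited query
    return ["run_1", "run_2"]


ROUTES: dict[str, Callable[..., Any]] = {
    "/pools": list_pools,
    "/dagRuns": get_dag_runs,
}


async def dispatch(path: str) -> Any:
    handler = ROUTES[path]
    if inspect.iscoroutinefunction(handler):
        return await handler()  # async handlers run directly on the event loop
    # Sync handlers are offloaded to a worker thread so they never block the loop.
    return await asyncio.to_thread(handler)


async def main() -> None:
    print(await asyncio.gather(dispatch("/pools"), dispatch("/dagRuns")))
    # [['default_pool'], ['run_1', 'run_2']]


asyncio.run(main())
```

Because both kinds of handler coexist behind one dispatcher, endpoints can be converted one at a time without a flag day.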
>> 
>> I also think it will be possible (and very useful) to migrate some of our
>> executors (kubernetes and celery) to a fully asynchronous version to
>> improve their performance.
>> 
>> I use the sqlalchemy asyncio extension in many personal and company
>> projects, and I'm very happy with it, but I would like to hear from others
>> if they have any positive or negative feedback about it.
>> 
>> I will create a new AIP for integrating SQLAlchemy's asyncio extension,
>> followed by further AIPs to migrate/support each component once the first
>> one is implemented, but first I'd like to check what the community and
>> other committers think about this integration.
>> 

