I checked the content and the PR that you attached.

The results do seem promising, and I like the general idea of this
approach. But, as Jarek also mentioned on the PR:

1. Not everyone may be on board with going all-async, due to certain
limitations around access to the drivers, or corporate limitations. So we
definitely need a way to opt out for those who aren't interested.

2. We should have a seamless fallback to sync if async doesn't work for
whatever reason.

3. Are we going all in, or are we limiting the scope to, let's say,
connections + variables and expanding based on the results in the long
term?
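
A minimal sketch of how points 1 and 2 could fit together, assuming a
hypothetical `run_query` helper and a hypothetical `use_async` opt-out
flag (neither exists in Airflow today): prefer the async implementation
when it is enabled and available, and otherwise run the sync one in a
worker thread so callers on the event loop are not blocked.

```python
import asyncio
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")


async def run_query(
    sync_impl: Callable[[], T],
    async_impl: Optional[Callable[[], Awaitable[T]]] = None,
    use_async: bool = True,
) -> T:
    """Prefer the async implementation, falling back to the sync one.

    `use_async` stands in for a hypothetical opt-out setting; neither it nor
    this helper is an existing Airflow API.
    """
    if use_async and async_impl is not None:
        try:
            return await async_impl()
        except ImportError:
            # Assumption: a missing async driver surfaces as ImportError
            # (ModuleNotFoundError is a subclass). Other errors are re-raised
            # on purpose, since blindly retrying could repeat a write.
            pass
    # Sync path (opted out, no async version, or driver unavailable): run the
    # blocking call in a worker thread so the event loop keeps running.
    return await asyncio.to_thread(sync_impl)
```

Catching only import-style failures, rather than any exception, is a
deliberate choice in this sketch: the fallback should cover "async is not
available here", not mask real query errors.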

Looking forward to the improvements async can bring!

Thanks & Regards,
Amogh Desai


On Sun, Apr 7, 2024 at 3:13 AM Hussein Awala <huss...@awala.fr> wrote:

> The Metadata Database is the brain of Airflow: all scheduling decisions,
> cross-communication, synchronization between components, and management
> via the web server go through this database.
>
> One option to optimize DB queries is to merge several of them into a
> single query to reduce latency and overall time, but this is not always
> possible: the queries are sometimes completely independent, and merging
> them is impossible or too complicated. In that case, we have another
> option, which is to run them concurrently, since they are independent.
> Currently, the only way to do this is to use multithreading (the
> sync_to_async decorator creates a thread and waits for it using an
> asyncio coroutine), which is already a good start. But by using the
> asyncio extension for SQLAlchemy, we will be able to create thousands of
> lightweight coroutines with the same amount of resources as a few
> threads, which will also help reduce resource consumption.
>
> A few months ago, I started a PoC to add support for this extension and
> to implement asynchronous versions of connections and variables, so that
> triggers can get/set them without blocking the event loop and degrading
> the performance of the triggerer. The result was impressive
> (https://github.com/apache/airflow/pull/36504).
>
> I see a good opportunity to improve the performance of our REST API and
> web server (for example, https://github.com/apache/airflow/issues/38776),
> especially since we can mix sync and async endpoints, which will allow a
> smooth migration.
>
> I also think it will be possible (and very useful) to migrate some of our
> executors (Kubernetes and Celery) to fully asynchronous versions to
> improve their performance.
>
> I use the SQLAlchemy asyncio extension in many personal and company
> projects, and I'm very happy with it, but I would like to hear from others
> whether they have any positive or negative feedback about it.
>
> I will create a new AIP for integrating the asyncio extension of
> SQLAlchemy, followed by further AIPs to migrate/support each component
> once the first one is implemented. But first, I'd like to check what the
> community and other committers think about this integration.
>
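
The concurrency point in the quoted message (overlapping independent
queries with threads via sync_to_async versus with native coroutines) can
be sketched with the standard library alone. This is a minimal
illustration, not the PoC code: `fetch_connection`, `fetch_variable` and
their `_sync` twins are hypothetical stand-ins whose sleeps simulate DB
round-trips; with the real extension the async versions would await
`session.execute(...)` on a `sqlalchemy.ext.asyncio.AsyncSession`, and
`asyncio.to_thread` stands in for sync_to_async.

```python
import asyncio
import time


# Hypothetical stand-ins for two independent metadata-DB queries. The sleeps
# simulate network round-trips; with the real extension they would be
# `await session.execute(...)` calls on a sqlalchemy.ext.asyncio.AsyncSession.
async def fetch_connection(conn_id: str) -> str:
    await asyncio.sleep(0.1)
    return f"connection:{conn_id}"


async def fetch_variable(key: str) -> str:
    await asyncio.sleep(0.1)
    return f"variable:{key}"


# Blocking counterparts, as a regular (sync) session would behave.
def fetch_connection_sync(conn_id: str) -> str:
    time.sleep(0.1)
    return f"connection:{conn_id}"


def fetch_variable_sync(key: str) -> str:
    time.sleep(0.1)
    return f"variable:{key}"


async def run_with_threads() -> list:
    # Thread-based approach: each blocking call runs in a worker thread that
    # the event loop awaits, similar in spirit to sync_to_async.
    return await asyncio.gather(
        asyncio.to_thread(fetch_connection_sync, "my_db"),
        asyncio.to_thread(fetch_variable_sync, "my_key"),
    )


async def run_with_coroutines() -> list:
    # Native-async approach: both queries are coroutines on a single event
    # loop, so no extra threads are needed to overlap their waits.
    return await asyncio.gather(
        fetch_connection("my_db"),
        fetch_variable("my_key"),
    )


for runner in (run_with_threads, run_with_coroutines):
    start = time.perf_counter()
    results = asyncio.run(runner())
    print(runner.__name__, results, f"{time.perf_counter() - start:.2f}s")
```

Both variants finish in roughly one simulated round-trip instead of two;
the difference the quoted message points to is that the coroutine version
gets there without dedicating an OS thread to each in-flight query.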
