Thanks, Hussein and Ash, I am much clearer on the scope now and I am ok with the discussion going on in the thread so far.
Thanks & Regards, Amogh Desai On Tue, Apr 9, 2024 at 2:25 AM Andrey Anshin <andrey.ans...@taragol.is> wrote: > If I do not miss something, usage of DB is not covered by Airflow Public > Interface, in this case we could easily replace one-by-one sync methods by > async. > There is some places exists where it might be mixin, as mentioned before > Secrets Backend, but it could be done by wrapping it into the: > - sync_to_async > - asyncio.to_thread (Python 3.9+) > - One of the anyio capabilities to run sync code into the async (threads or > processes) > > > > On Tue, 9 Apr 2024 at 00:45, Daniel Imberman <daniel.imber...@gmail.com> > wrote: > > > Yeah do we have concrete examples of places where asyncio would be a > > non-starter? Are there enough of these examples to kill this idea? I > really > > don't like the idea of needing to maintain both sync and async. > > > > On Mon, Apr 8, 2024 at 1:39 PM Hussein Awala <huss...@awala.fr> wrote: > > > > > > we definitely need a way to opt-out for the ones who aren't > interested > > > > > > I disagree, what I propose is to infer the async connection from the > sync > > > configuration using a translation method, with the possibility of > > providing > > > the async connection configuration explicitly. This will help to > > completely > > > migrate the REST API and web server to the async version without > > > duplicating the code. > > > > > > > We should have a seamless fallback to sync if async doesn't work for > > > whatever reasons > > > > > > For the async version of connections/variables, we will use the sync > > method > > > wrapped by sync_to_async in the base class, in this case, the async > > methods > > > will work in the custom secrets backends without any issues and users > can > > > override the async methods for better implementation. > > > > > > > are we limiting the scope to lets say connections + variables and > > > expanding based on the results in the long term? > > > > > > This needs to be implemented step by step, the first step is to add > > > integration to the different providers and DB, then implement an async > > > version for the secrets backends, then migrate the REST API and web > > server, > > > and later migrate our official executors, which will need also > > integrating > > > other tools like kubernetes-asyncio, and async integration for celery. > > > > > > > I think this needs to be an all or nothing thing > > > > > > Here are some of the available drivers > > > https://github.com/apache/airflow/pull/36504#issuecomment-1872653755, > I > > > have already tested one for each database, so we will have async > support > > > for all supported databases. > > > > > > > having to maintain sync and async versions of functions/features is a > > > non-starter in my mind; > > > > > > During the migration, we will have both sync and async endpoints in the > > API > > > and the webserver (they will be migrated one by one and not at the same > > > time), but without any code duplication, in the worst case, instead of > > > duplicating a method, we can use sync_to_async, and optimize it later > > > after migrating all endpoints that use it. > > > But for Secrets Backends, we may have some duplicated code when it is > not > > > possible to export it to a common method shared between sync and async > > > versions. > > > > > > > how can we keep one codenase bit cooe with sqlite? > > > > > > For my PoC, I used https://github.com/omnilib/aiosqlite and it worked > > > without any issues. > > > > > > > > > > > > On Mon, Apr 8, 2024 at 10:08 PM Daniel Standish > > > <daniel.stand...@astronomer.io.invalid> wrote: > > > > > > > If nothing else, write an ugly adapter using sync_to_async? > > > > > > > > On Mon, Apr 8, 2024 at 1:06 PM Daniel Standish < > > > > daniel.stand...@astronomer.io> wrote: > > > > > > > > > https://github.com/omnilib/aiosqlite maybe? > > > > > > > > > > On Mon, Apr 8, 2024 at 1:03 PM Scheffler Jens (XC-AS/EAE-ADA-T) > > > > > <jens.scheff...@de.bosch.com.invalid> wrote: > > > > > > > > > >> I understand the „all-in“ approach as we were dropping MSSQL… how > > can > > > we > > > > >> keep one codenase bit cooe with sqlite? I assume we must support > > this > > > at > > > > >> least for dev setups. > > > > >> > > > > >> Sent from Outlook for iOS<https://aka.ms/o0ukef> > > > > >> ________________________________ > > > > >> From: Jarek Potiuk <ja...@potiuk.com> > > > > >> Sent: Monday, April 8, 2024 8:30:18 PM > > > > >> To: dev@airflow.apache.org <dev@airflow.apache.org> > > > > >> Subject: Re: [DISCUSS] Asynchronous SQLAlchemy > > > > >> > > > > >> Yep. If we can make both Postgres and MySQL work with Async - I am > > > also > > > > >> all > > > > >> for the "All" approach. If it means that we need to support only > > > certain > > > > >> drivers and certain versions of the DBs - so be it. As mentioned > in > > my > > > > >> original comments (long time ago when we had MSSQL support) - this > > was > > > > not > > > > >> really possible back then - but now, by getting rid of Mssql and > if > > we > > > > >> have > > > > >> the right drivers for mysql, it should be possible - I guess. > > > > >> > > > > >> On Mon, Apr 8, 2024 at 8:17 PM Daniel Standish > > > > >> <daniel.stand...@astronomer.io.invalid> wrote: > > > > >> > > > > >> > I wholeheartedly agree with Ash that it should be all or > nothing. > > > And > > > > >> > *all* sounds > > > > >> > better to me :) > > > > >> > > > > > >> > > > > > >> > > > > > >> > On Mon, Apr 8, 2024 at 10:54 AM Ash Berlin-Taylor < > a...@apache.org > > > > > > > >> wrote: > > > > >> > > > > > >> > > I’m all in favour of async SQLAlchemy. We’ve built two > products > > > > >> > > exclusively at @ Astronomer that use sqlalchemy+psycopg3+async > > and > > > > >> love > > > > >> > it. > > > > >> > > Async does take a bit of a learning curve, but SQLA has done > it > > > > nicely > > > > >> > and > > > > >> > > it works really well. > > > > >> > > > > > > >> > > I think this needs to be an all or nothing thing — having to > > > > maintain > > > > >> > sync > > > > >> > > and async versions of functions/features is a non-starter in > my > > > > mind; > > > > >> > it’d > > > > >> > > just be a worryingly large amount of duplicated work. Given > the > > > only > > > > >> DBs > > > > >> > we > > > > >> > > support now is postgres and mysql then I can’t think of any > > reason > > > > >> users > > > > >> > > should even care — they give it a DSN and that’s the end of > > their > > > > >> > > involvement. > > > > >> > > > > > > >> > > Amogh: I don’t understand what you mean by point 3 below. > > > > >> > > > > > > >> > > -ash > > > > >> > > > > > > >> > > > On 8 Apr 2024, at 05:31, Amogh Desai < > > amoghdesai....@gmail.com> > > > > >> wrote: > > > > >> > > > > > > > >> > > > I checked the content and the PR that you attached. > > > > >> > > > > > > > >> > > > The results do seem promising and I like the general idea of > > > this > > > > >> > > approach. > > > > >> > > > But as Jarek > > > > >> > > > also mentioned on the PR: > > > > >> > > > > > > > >> > > > 1. Not everyone might be on the board to go all async due to > > > > certain > > > > >> > > > limitations around > > > > >> > > > access to the drivers, or corporate limitations. So, we > > > definitely > > > > >> > need a > > > > >> > > > way to opt-out > > > > >> > > > for the ones who aren't interested. > > > > >> > > > > > > > >> > > > 2. We should have a seamless fallback to sync if async > doesn't > > > > work > > > > >> for > > > > >> > > > whatever reasons. > > > > >> > > > > > > > >> > > > 3. Are we going all in or are we limiting the scope to lets > > say > > > > >> > > > connections + variables and expanding > > > > >> > > > based on the results in the long term? > > > > >> > > > > > > > >> > > > Looking forward to improvements async can bring in! > > > > >> > > > > > > > >> > > > Thanks & Regards, > > > > >> > > > Amogh Desai > > > > >> > > > > > > > >> > > > > > > > >> > > > On Sun, Apr 7, 2024 at 3:13 AM Hussein Awala < > > huss...@awala.fr> > > > > >> wrote: > > > > >> > > > > > > > >> > > >> The Metadata Database is the brain of Airflow, where all > > > > scheduling > > > > >> > > >> decisions, cross-communication, synchronization between > > > > components, > > > > >> > and > > > > >> > > >> management via the web server, are made using this > database. > > > > >> > > >> > > > > >> > > >> One option to optimize the DB queries is to merge many > into a > > > > >> single > > > > >> > > query > > > > >> > > >> to reduce latency and overall time, but this is not always > > > > possible > > > > >> > > because > > > > >> > > >> the queries are sometimes completely independent, and it is > > > > >> > > impossible/too > > > > >> > > >> complicated to merge them. But in this case, we have > another > > > > option > > > > >> > > which > > > > >> > > >> is running them concurrently since they are independent. > The > > > only > > > > >> way > > > > >> > > to do > > > > >> > > >> this currently is to use multithreading (the sync_to_async > > > > >> decorator > > > > >> > > >> creates a thread and waits for it using an asyncio > > coroutine), > > > > >> which > > > > >> > is > > > > >> > > >> already a good start, but by using the asyncio extension > for > > > > >> > sqlalchemy > > > > >> > > we > > > > >> > > >> will be able to create thousands of lightweight coroutines > > with > > > > the > > > > >> > same > > > > >> > > >> amount of resources as a few threads, which will also help > to > > > > >> reduce > > > > >> > > >> resources consumption. > > > > >> > > >> > > > > >> > > >> A few months ago I started a PoC to add support for this > > > > extension > > > > >> and > > > > >> > > >> implement an asynchronous version of connections and > > variables > > > to > > > > >> be > > > > >> > > able > > > > >> > > >> to get/set them from triggers without blocking the event > loop > > > and > > > > >> > > affecting > > > > >> > > >> the performance of the triggerer, and the result was > > > impressive ( > > > > >> > > >> > > > > >> > > > > > > > > > > https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fairflow%2Fpull%2F36504&data=05%7C02%7CJens.Scheffler%40de.bosch.com%7C453e379d0e284252391708dc57f9fd87%7C0ae51e1907c84e4bbb6d648ee58410f4%7C0%7C0%7C638481978411968167%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%7C%7C%7C&sdata=LfaqvpAcffa830qqp1fLdsbKuVkpgqsGOSt%2FrnQL2Wk%3D&reserved=0 > > > > >> )<https://github.com/apache/airflow/pull/36504>. > > > > >> > > >> > > > > >> > > >> I see a good opportunity to improve the performance of our > > REST > > > > API > > > > >> > and > > > > >> > > web > > > > >> > > >> server (for example > > > > >> > > > > > > > > > > https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fairflow%2Fissues%2F38776&data=05%7C02%7CJens.Scheffler%40de.bosch.com%7C453e379d0e284252391708dc57f9fd87%7C0ae51e1907c84e4bbb6d648ee58410f4%7C0%7C0%7C638481978411976153%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%7C%7C%7C&sdata=FDHZsKZ6d5%2BTrfI43pVY%2BSHJ2RsMW93MpxIqidhlSoE%3D&reserved=0 > > > > >> )<https://github.com/apache/airflow/issues/38776>, > > > > >> > > >> knowing that we can mix sync and async endpoints, which > will > > > help > > > > >> for > > > > >> > a > > > > >> > > >> smooth migration. > > > > >> > > >> > > > > >> > > >> I also think that it will be possible (and very useful) to > > > > migrate > > > > >> > some > > > > >> > > of > > > > >> > > >> our executors to a full asynchronous version to improve > their > > > > >> > > performance > > > > >> > > >> (kubernetes and celery) > > > > >> > > >> > > > > >> > > >> I use the sqlalchemy asyncio extension in many personal and > > > > company > > > > >> > > >> projects, and I'm very happy with it, but I would like to > > hear > > > > from > > > > >> > > others > > > > >> > > >> if they have any positive or negative feedback about it. > > > > >> > > >> > > > > >> > > >> I will create a new AIP for integrating the asyncio > extension > > > of > > > > >> > > >> sqlaclhemy, and other following AIPs to migrate/support > each > > > > >> component > > > > >> > > once > > > > >> > > >> the first one is implemented, but first, I prefer to check > > what > > > > the > > > > >> > > >> community and other committers think about this > integration. > > > > >> > > >> > > > > >> > > > > > > >> > > > > > > >> > > > > > > --------------------------------------------------------------------- > > > > >> > > To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org > > > > >> > > For additional commands, e-mail: dev-h...@airflow.apache.org > > > > >> > > > > > > >> > > > > > > >> > > > > > >> > > > > > > > > > > > > > > >