It is a complication, but it seems we can't do any better and remain scalable. In the end we want priorities enforced (maybe not in the way they're implemented today, but that's part of another talk), and we don't know in advance how many tasks we'll have to iterate over, so fetching them into Python is a death sentence in some situations (not joking, I tried that with fetchmany and chunked streaming, and it was way too slow).
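To make the "we don't know how many tasks we'll have to iterate over" point concrete, here is a minimal sketch of the Python-side approach. It is not the actual Airflow scheduler code: the `task_instance` table layout, the helper names `build_demo_db` and `pick_schedulable`, and the single per-DAG limit are simplified assumptions, and it uses the standard-library `sqlite3` module only to stay self-contained.

```python
import sqlite3
from collections import Counter


def build_demo_db():
    """Hypothetical toy table: one big backfill DAG outranks a small DAG."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE task_instance ("
        "id INTEGER PRIMARY KEY, dag_id TEXT, state TEXT, priority_weight INTEGER)"
    )
    rows = [("backfill", "scheduled", 10)] * 1000 + [("small", "scheduled", 1)] * 5
    conn.executemany(
        "INSERT INTO task_instance (dag_id, state, priority_weight) VALUES (?, ?, ?)",
        rows,
    )
    return conn


def pick_schedulable(conn, slots, max_active_per_dag, chunk_size=100):
    """Walk scheduled TIs in priority order, chunk by chunk, skipping any TI
    whose DAG already reached its limit. Returns (picked ids, rows scanned)."""
    cur = conn.execute(
        "SELECT id, dag_id FROM task_instance "
        "WHERE state = 'scheduled' ORDER BY priority_weight DESC, id"
    )
    picked, per_dag, scanned = [], Counter(), 0
    while len(picked) < slots:
        rows = cur.fetchmany(chunk_size)
        if not rows:
            break  # ran out of candidates before filling all slots
        for ti_id, dag_id in rows:
            scanned += 1
            if per_dag[dag_id] >= max_active_per_dag:
                continue  # blocked by the limit, but we still paid to fetch it
            per_dag[dag_id] += 1
            picked.append(ti_id)
            if len(picked) == slots:
                break
    return picked, scanned


if __name__ == "__main__":
    picked, scanned = pick_schedulable(build_demo_db(), slots=4, max_active_per_dag=2)
    print(f"picked {len(picked)} TIs after scanning {scanned} rows")
```

In this toy setup almost every fetched row is discarded: the high-priority backfill hits its limit immediately, yet its rows still have to be pulled into Python before the lower-priority DAG is reached. The number of rows scanned depends on the data, not on the number of free slots.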
I actually thought of another optimization here: instead of fetching the entire TI relation, we can ignore mapped tasks and fetch only individual tasks (operators), expanding them on the fly into the maximum number of TIs that can be created. Yet this approach is not scalable either, as an enormous backfill of a DAG with just 10 tasks will make it fetch MBs of data every time. It's very slow and loads the DB server with heavy network requests.

Also, it's not just about throughput, but about starvation of tasks that sometimes can't run for hours, and unfortunately we encounter this in production very often.

On Wednesday, September 24th, 2025 at 3:10 AM, Matthew Phillips <[email protected]> wrote:

> Hi,
> This seems like a significant level of technical complication/debt relative
> to even a 1.5x/2x gain (which as noted is only in certain workloads).
> Given airflow scheduling code in general is not something one could
> describe as simple, introducing large amounts of static code that lives in
> stored procs seems unwise. If at all possible making this
> interface pluggable and provided via provider would be the saner approach
> in my opinion.
>
> On Tue, Sep 23, 2025 at 11:16 AM asquator [email protected] wrote:
>
> > Hello,
> >
> > A new approach utilizing stored SQL functions is proposed here as a
> > solution to unnecessary processing/starvation:
> >
> > https://github.com/apache/airflow/pull/55537
> >
> > Benchmarks show an actual improvement in queueing throughput, around
> > 1.5x-2x for the workloads tested.
> >
> > Regarding DB server load, I wasn't able to note any difference so far, we
> > probably have to run heavier jobs to test that. Looks like a smooth line to
> > me.
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [email protected]
> > For additional commands, e-mail: [email protected]
