In my mind, there are two separate discussions here.

1. Locking and deadlocks

I believe this will get better as part of the set of changes in this AIP as well as AIP-72, pending, of course, the "mini-scheduler" discussion. I am personally in two minds about that, having been both a proponent and a fan of the "mini-scheduler" during its original implementation. I do understand the tradeoff there now with the current changes envisioned for AIP-72. I will wait for Ash to comment on that part.

2. Prioritization and sequencing

I struggled with this too. When I originally started writing this up, I did write up the concept of capacity / priority of regular DAG runs vs. backfill DAG runs. However, there was significant feedback that it was a separate concept from the core backfill change proposed here, and I agreed with that feedback. I do believe we can add a concept around capacity of backfill DAG runs vs. scheduled DAG runs vs. manually triggered DAG runs if there is sufficient interest and need for them. However, I also agree [based on the prior referenced feedback] that it could and probably should be a different change than the core Backfill change proposed here.

On Fri, Jul 12, 2024 at 10:32 AM Daniel Standish <daniel.stand...@astronomer.io.invalid> wrote:

> > It's somehow related (but yes - it's more AIP-72 question). Mini scheduler
> > currently **actually** attempts to lock the DagRun table when it runs -
> > this is precisely what has been recently made as "optionally skipped" when
> > mini-scheduler could not obtain the lock immediately - because it wreaks
> > all kinds of havoc with mapped tasks:
> > https://github.com/apache/airflow/pull/39745 , and this is what backfill
> > scheduling will also attempt to lock - so I think it's very much related to
> > how this plays "together"
>
> Yeah, mini-scheduler locks dag run table so that there aren't two "things"
> trying to schedule tasks for the same dag run. And the problem was, before
> we fixed it, they would wait, possibly a long time, to obtain the lock.
> And they don't anymore. But anyway, all of this is true now of both
> "normal" and "backfill" tasks and will remain so and I don't think there's
> much interaction with this AIP. But perhaps more importantly, I suspect
> mini scheduler actually goes away in airflow 3. But let's see.
>
> You mention sequencing. With the old way, it would loop through, create up
> to `max_active_dag_runs` runs, and then wait until *all* of those tasks
> were complete before scheduling more dag runs. Now we'll be able to be
> more flexible, and e.g. create more as we go along, i.e. as one run
> finishes, create one more. I think it might make sense to just put a limit
> that the number of RUNNING backfill dag runs may not be more than
> `max_active_runs - 1`. This would ensure that there would always be room
> for 1 "normally scheduled" dag run. It seems like a promising idea but
> still sort of noodling on it.
>
> I did add a little language about scheduler process in the doc. If there
> are still parts of your feedback that need attention let me know and I'll
> try to address somehow.
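
For reference, here is a rough sketch of how the `max_active_runs - 1` cap mentioned in the quoted message might work. It is only an illustration of the idea, not how the scheduler does or will do it: `RunCounts` and `backfill_runs_to_start` are hypothetical names, and I am assuming the overall `max_active_runs` limit still applies to backfill and non-backfill runs together.

```python
from dataclasses import dataclass


@dataclass
class RunCounts:
    """Hypothetical snapshot of RUNNING dag runs for a single DAG."""

    running_backfill: int
    running_other: int  # scheduled + manually triggered runs


def backfill_runs_to_start(
    counts: RunCounts, max_active_runs: int, pending_backfill: int
) -> int:
    """Return how many pending backfill runs could move to RUNNING right now."""
    # Backfill may occupy at most max_active_runs - 1 slots, so one slot
    # is always left free for a "normally scheduled" dag run.
    backfill_cap = max(max_active_runs - 1, 0)
    room_under_cap = backfill_cap - counts.running_backfill

    # Assumption: the overall max_active_runs limit still counts every run.
    room_overall = max_active_runs - (
        counts.running_backfill + counts.running_other
    )

    return max(0, min(room_under_cap, room_overall, pending_backfill))


if __name__ == "__main__":
    # 2 backfill runs in flight, limit 4: only 1 more may start (cap is 3).
    print(backfill_runs_to_start(RunCounts(2, 0), 4, pending_backfill=10))
```

Because this is re-evaluated whenever a run finishes, new backfill runs get created as we go along instead of waiting for the whole batch to complete. One consequence worth noting: under this rule a DAG with `max_active_runs = 1` would never start a backfill run, so that case would need its own handling.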