Re: [DISCUSS] AIP-78 scheduler-managed backfill

Vikram Koka Fri, 12 Jul 2024 16:04:38 -0700

In my mind, there are two separate discussions here.

1. Locking and deadlocks
This I believe will get better as part of the set of changes in this AIP as
well as AIP-72.
Of course, pending the "mini-scheduler" discussion.
I am personally in two minds about that, having been both a proponent and a
fan of the "mini-scheduler" during its original implementation.
I do understand the tradeoff there now with the current changes envisioned
for AIP-72. Will wait for Ash to comment on that part.


2. Prioritization and sequencing
I struggled with this too.
When I originally started writing this up, I did write up the concept of
capacity / priority of regular DAG runs vs. backfill DAG runs.
However, there was significant feedback that it was a separate concept from
the core backfill change proposed here. And, I agreed with that feedback.

I do believe we can add a concept around capacity of backfill DAG runs vs.
scheduled DAG runs vs. manually triggered DAG runs if there is sufficient
interest and need for them. However, I also agree [based on the prior
referenced feedback] that it could and probably should be a different
change than the core Backfill change proposed here.


On Fri, Jul 12, 2024 at 10:32 AM Daniel Standish
<daniel.stand...@astronomer.io.invalid> wrote:

> >
> > It's somehow related (but yes, it's more an AIP-72 question). Mini scheduler
> > currently **actually** attempts to lock the DagRun table when it runs -
> > this is precisely what has recently been made "optionally skipped" when
> > mini-scheduler could not obtain the lock immediately - because it wreaked
> > all kinds of havoc with mapped tasks:
> > https://github.com/apache/airflow/pull/39745 , and this is what backfill
> > scheduling will also attempt to lock - so I think it's very much related to
> > how this plays "together"
>
>
> Yeah, mini-scheduler locks dag run table so that there aren't two "things"
> trying to schedule tasks for the same dag run.  And the problem was, before
> we fixed it, they would wait, possibly a long time, to obtain the lock.
> And they don't anymore.  But anyway, all of this is true now of both
> "normal" and "backfill" tasks and will remain so and I don't think there's
> much interaction with this AIP.  But perhaps more importantly, I suspect
> mini scheduler actually goes away in airflow 3.  But let's see.
>
> You mention sequencing.  With the old way, it would loop through, create up
> to `max_active_dag_runs` runs, and then wait until *all* of those tasks
> were complete before scheduling more dag runs.  Now we'll be able to be
> more flexible, and e.g. create more as we go along, i.e. as one run
> finishes, create one more.  I think it might make sense to just put a limit
> that the number of RUNNING backfill dag runs may not be more than
> `max_active_runs - 1`.  This would ensure that there would always be room
> for 1 "normally scheduled" dag run.  It seems like a promising idea but
> still sort of noodling on it.
>
> I did add a little language about scheduler process in the doc.  If there
> are still parts of your feedback that need attention let me know and I'll
> try to address somehow.
>
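
For illustration only, the cap Daniel floats above (at most `max_active_runs - 1`
RUNNING backfill dag runs, so one slot always stays open for a normally
scheduled run) could be sketched roughly as below. This is not Airflow code:
the function name and its parameters are hypothetical, and it assumes the
scheduler already knows the current counts of running backfill and
non-backfill runs for the DAG.

```python
def backfill_runs_to_create(max_active_runs: int,
                            running_backfill: int,
                            running_other: int,
                            queued_backfill: int) -> int:
    """Return how many queued backfill runs may be started right now.

    Hypothetical helper: caps RUNNING backfill runs at max_active_runs - 1
    so that one slot is always left for a normally scheduled run.
    """
    # Backfill may never occupy more than max_active_runs - 1 slots.
    backfill_cap = max(max_active_runs - 1, 0)
    # Slots still free under the overall per-DAG limit.
    free_total = max(max_active_runs - running_backfill - running_other, 0)
    # Slots still free under the backfill-specific cap.
    free_backfill = max(backfill_cap - running_backfill, 0)
    return min(queued_backfill, free_total, free_backfill)


# As one backfill run finishes, call again to create one more.
print(backfill_runs_to_create(max_active_runs=4, running_backfill=2,
                              running_other=1, queued_backfill=10))
```

With `max_active_runs=1` this sketch never starts a backfill run, which is one
of the edge cases such a rule would need to settle.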
