OK, so I'm thinking through what makes sense for concurrency control in
backfill.

It was referred to in the AIP
<https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=311627729#AIP78Schedulermanagedbackfill-Otherideasunderconsideration>
but I didn't define the behavior there:

> Other ideas under consideration
>
>    - Add extra concurrency control on dag run
>    - Apply max active dag runs separately for backfill
>    - Override any dag param in creating the backfill job and it’s only
>      applied in that scope

As I have proceeded with implementation, here's what I went with:

Each "backfill" gets its own concurrency control ("max_active_runs") that
is evaluated completely separately from the DAG-scope max_active_runs.

So if the DAG max active runs is 2, and the backfill max active runs is 1,
then you can have a max of 3 concurrent runs.  Your non-backfill dag runs
cannot starve out the backfill ones, and backfill dag runs cannot starve
out the non-backfill ones.
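
To make the rule concrete, here is a minimal sketch of the check as I
described it above. It is not the code in the PR; `RunCounts` and
`can_start_run` are hypothetical names made up for illustration, and the
real scheduler logic works against the database rather than in-memory
counters.

```python
from dataclasses import dataclass


@dataclass
class RunCounts:
    """Currently running dag runs for one DAG, split by origin (hypothetical)."""

    non_backfill: int = 0
    backfill: int = 0


def can_start_run(counts, is_backfill, dag_max_active_runs, backfill_max_active_runs):
    """Check a new run only against the limit for its own kind of run."""
    if is_backfill:
        return counts.backfill < backfill_max_active_runs
    return counts.non_backfill < dag_max_active_runs


# DAG max active runs is 2 and already full; backfill max active runs is 1.
counts = RunCounts(non_backfill=2, backfill=0)

# The backfill can still start its one run, for 3 concurrent runs in total.
print(can_start_run(counts, True, 2, 1))   # True
# A third non-backfill run has to wait.
print(can_start_run(counts, False, 2, 1))  # False
```

Because neither branch looks at the other counter, neither kind of run can
use up the other's slots.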

The other way to go is to say that DAG.max_active_runs is global.  This
does not feel quite right to me because it gets a bit murky.  E.g. what
happens if DAG.max is 10 and Backfill.max is 10?  Do you allow it?  What do
you do to avoid starving out non-backfill runs?
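
For comparison, a rough sketch of one possible reading of the "global"
option, where the DAG limit caps all runs and the backfill limit is an
extra cap inside it. Again, `can_start_run_global` is a hypothetical helper
for illustration only, not something in Airflow or the PR.

```python
def can_start_run_global(non_backfill_running, backfill_running, is_backfill,
                         dag_max, backfill_max):
    """Global variant: the DAG limit counts every run, backfill or not."""
    total_running = non_backfill_running + backfill_running
    if total_running >= dag_max:
        return False
    if is_backfill:
        return backfill_running < backfill_max
    return True


# DAG.max is 10 and Backfill.max is 10; the backfill has taken all 10 slots.
# A scheduled (non-backfill) run now cannot start until a backfill run ends.
print(can_start_run_global(0, 10, False, 10, 10))  # False
```

Avoiding that would need some extra rule (reserved slots, priorities, or
rejecting such a config), which is the murkiness I mean.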

What do people think?  Relevant PR is here
<https://github.com/apache/airflow/pull/42686>.
