Hi,

I am very glad to see this AIP. I am sure this is a long-wished-for feature in
the user community.

I am in favor of having a single scheduler handle both backfill and normal
runs. The choice between the default pool and dedicated pools already exists
and should remain.

I have been following the Airflow discussion for some time now. However,
this is the first opportunity for me to share my thoughts. I hope I can
express my line of thinking clearly.


I would like to add that, in my experience, there are the following
scenarios where we may want to execute older runs of a DAG:

- Enabling a new DAG that should start running from an older date, with the
latest run taking place in parallel
- Catching up on lagging DAG runs, with the latest run taking place in
parallel
- Rerunning old DAG runs, with the latest run taking place in parallel

I would like to consider them together as a kind of `backfill` (or maybe
`old runs`), since they could all benefit from the same implementation.

Here is how I picture it to start with. As a user, I would have a new web
page in the Airflow UI dedicated to active backfill runs. On this page, I can
view and control backfill runs at the Airflow level. To start a backfill run,
I would fill in the following configuration, and maybe more:

*DAG* - List of enabled DAGs

*Date Range* - `from date` cannot be before the DAG's `start date`, and
`end date` cannot be after the next execution date

*Max backfill runs* - The default could be `Max active runs of DAG - 1`, with
a minimum value of `1`. If a DAG has max active runs set to 1, it will take
more effort to control how backfill and latest runs proceed together.
Options could be: not allowing backfill on DAGs with max active runs of 1,
setting the priority of backfill DAG runs to the lowest value, or letting
everything run with the same priority.

*Run/Pause/Stop* - Options to control backfill runs.

*Details* - Optionally shows an error status if the backfill configuration
is not correct, etc.
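
The configuration rules above could be checked by a small validation routine
before a backfill is accepted. This is a hypothetical sketch, not Airflow's
API: `DagInfo`, `BackfillConfig`, `validate_backfill` and
`default_max_backfill_runs` are invented names, dates are plain
`datetime.date` values, and the check that `from date` is not after
`end date` is an extra assumption beyond the rules listed.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


# Hypothetical: minimal view of the DAG attributes the proposal refers to.
@dataclass
class DagInfo:
    dag_id: str
    start_date: date
    next_execution_date: date
    max_active_runs: int
    enabled: bool = True


# Hypothetical backfill request mirroring the fields listed above.
@dataclass
class BackfillConfig:
    dag_id: str
    from_date: date
    end_date: date
    max_backfill_runs: Optional[int] = None  # None means "use the default"


def default_max_backfill_runs(max_active_runs: int) -> int:
    """Proposed default: max active runs of the DAG minus 1, never below 1.

    Note that a DAG with max_active_runs == 1 still gets 1 here, which is
    exactly the case the proposal flags as needing extra handling.
    """
    return max(max_active_runs - 1, 1)


def validate_backfill(cfg: BackfillConfig, dag: DagInfo) -> List[str]:
    """Return error messages for the *Details* column (empty if valid)."""
    errors = []
    if not dag.enabled:
        errors.append(f"DAG {dag.dag_id} is not enabled")
    if cfg.from_date < dag.start_date:
        errors.append("from date is before the DAG start date")
    if cfg.end_date > dag.next_execution_date:
        errors.append("end date is after the next execution date")
    # Assumption: an inverted range is also rejected.
    if cfg.from_date > cfg.end_date:
        errors.append("from date is after end date")
    if cfg.max_backfill_runs is not None and cfg.max_backfill_runs < 1:
        errors.append("max backfill runs must be at least 1")
    return errors


# Usage with made-up values.
dag = DagInfo("example_dag", date(2023, 1, 1), date(2023, 6, 1), max_active_runs=4)
cfg = BackfillConfig("example_dag", date(2022, 12, 1), date(2023, 3, 1))
print(validate_backfill(cfg, dag))  # ['from date is before the DAG start date']
print(default_max_backfill_runs(dag.max_active_runs))  # 3
```

Returning a list of messages rather than raising on the first problem lets
the page show every configuration error at once.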


The page will show a table of DAGs with active backfill runs, along with
their configuration, control options and status. A new backfill run fails
to start if a backfill is already active on the same DAG.
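
The one-active-backfill-per-DAG rule and the Run/Pause/Stop controls could be
modelled as a registry keyed by DAG id. This is a hypothetical in-memory
sketch: `BackfillRegistry`, `BackfillState` and `BackfillAlreadyActive` are
invented names, and treating a paused backfill as still "active" (so it
blocks a new one) is an assumption.

```python
from enum import Enum
from typing import Dict


class BackfillState(Enum):
    RUNNING = "running"
    PAUSED = "paused"
    STOPPED = "stopped"


class BackfillAlreadyActive(Exception):
    """Raised when a DAG already has an active backfill."""


class BackfillRegistry:
    """Hypothetical store behind the backfill page, keyed by dag_id."""

    def __init__(self) -> None:
        self._backfills: Dict[str, BackfillState] = {}

    def is_active(self, dag_id: str) -> bool:
        # Assumption: running and paused backfills count as active;
        # stopped ones (or DAGs never backfilled) do not.
        state = self._backfills.get(dag_id)
        return state in (BackfillState.RUNNING, BackfillState.PAUSED)

    def start(self, dag_id: str) -> None:
        # A new backfill fails to start if one is already active on the DAG.
        if self.is_active(dag_id):
            raise BackfillAlreadyActive(f"backfill already active on {dag_id}")
        self._backfills[dag_id] = BackfillState.RUNNING

    def pause(self, dag_id: str) -> None:
        self._require_active(dag_id)
        self._backfills[dag_id] = BackfillState.PAUSED

    def resume(self, dag_id: str) -> None:
        self._require_active(dag_id)
        self._backfills[dag_id] = BackfillState.RUNNING

    def stop(self, dag_id: str) -> None:
        self._require_active(dag_id)
        self._backfills[dag_id] = BackfillState.STOPPED

    def _require_active(self, dag_id: str) -> None:
        if not self.is_active(dag_id):
            raise ValueError(f"no active backfill on {dag_id}")


# Usage with a made-up DAG id.
registry = BackfillRegistry()
registry.start("example_dag")
try:
    registry.start("example_dag")
except BackfillAlreadyActive as exc:
    print(exc)  # backfill already active on example_dag
registry.stop("example_dag")
registry.start("example_dag")  # allowed again once the previous one is stopped
```

In a real deployment this state would have to live in the metadata database
rather than in memory, so that the scheduler and every webserver see it.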

A backfill could be created by deploying a configuration file, through the
UI, or both. Allowing it through the UI sounds easier to handle in a
multi-user environment. Allowing both options will need more work to handle
conflicting wishes of users.

I understand it might be a challenge to show backfill and latest active
runs in one view. We could try to address this by showing the latest DAG
runs in a vertically split table, with natural runs on one side and backfill
runs on the other.

I know I have not covered priority, pools, reporting, etc., but if the
initial idea looks interesting, we can include them in follow-up discussions.

Kind regards,
Anand
