I agree with Jarek -- running a backfill is essentially another
scheduler (albeit one with a lot of specialised logic -- how much of
it is needed is unclear), so it would be nice to have that all rolled up
into the scheduler, with `airflow backfill` and the new API just setting
"something" that the scheduler then looks at to start running the
backfill.
(The scheduling of backfill should not be very resource intensive, no
more so than normal dag runs)
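
To make the idea concrete, here is a rough sketch of that flow, not a proposal for the actual implementation. Every name in it (`BackfillRequest`, `RequestStore`, `request_backfill`, `scheduler_tick`) is hypothetical and none of it is existing Airflow code; the in-memory list stands in for a metadata-database table, and a daily interval is assumed purely for illustration.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from enum import Enum


class State(Enum):
    QUEUED = "queued"
    RUNNING = "running"


@dataclass
class BackfillRequest:
    """Hypothetical record: the "something" the CLI or API would set."""
    dag_id: str
    start: date
    end: date  # inclusive
    state: State = State.QUEUED


@dataclass
class RequestStore:
    """In-memory stand-in for a metadata-database table (assumption)."""
    rows: list = field(default_factory=list)

    def add(self, request):
        self.rows.append(request)

    def queued(self):
        # Returns a new list, so callers may change states while iterating.
        return [r for r in self.rows if r.state is State.QUEUED]


def request_backfill(store, dag_id, start, end):
    """What `airflow backfill` or the API would do: record and return."""
    request = BackfillRequest(dag_id, start, end)
    store.add(request)
    return request


def scheduler_tick(store):
    """One pass of the scheduler loop: expand queued requests into runs."""
    created = []
    for request in store.queued():
        day = request.start
        while day <= request.end:
            created.append((request.dag_id, day))  # assumes a daily interval
            day += timedelta(days=1)
        request.state = State.RUNNING  # picked up, so not expanded again
    return created


store = RequestStore()
request_backfill(store, "example_dag", date(2020, 12, 1), date(2020, 12, 3))
runs = scheduler_tick(store)  # one (dag_id, date) pair per day in the range
```

The point of the split is that the caller only writes a row and returns immediately, while all the run-creation logic stays in the one process that already owns scheduling.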
-ash
On Mon, 28 Dec, 2020 at 13:23, Jarek Potiuk <[email protected]>
wrote:
Made some comments. Summarising: I think we need it, and the proposal
is reasonable. I only have one serious question: where should the
backfill job be running? My gut feeling tells me that the scheduler is a
better "entity" for such a job than the workers proposed in the original
document, but I am happy to discuss it.
On Wed, Dec 23, 2020 at 2:45 PM Tomasz Urbaszek <[email protected]> wrote:
Hello all,
I would like to discuss the new Airflow Improvement Proposal which
aims to give Airflow users the possibility to trigger backfill
externally (via API or UI).
In short this AIP proposes:
- create a new API endpoint to trigger a backfill job (and the same
  mechanism for the web UI)
- create a new Celery task to run the backfill on worker machines (in
  case of Celery-like executors)
- extend the scheduler's mechanism for removing zombies to take care of
  backfill-triggered tasks and DAG runs
- improve the UI so users can see the difference between scheduled and
  backfilled runs
I drafted a doc with the proposal:
<https://s.apache.org/backfill-aip>
so we can discuss it there before moving it to cwiki.
Happy to hear your opinion on this. And have peaceful and warm
holidays!
Best,
Tomek
--
Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer
M: +48 660 796 129