I agree with Jarek -- running a backfill is essentially another scheduler (albeit one with a lot of specialised logic -- how much of it is needed is unclear), so it would be nice to have that all rolled up into the scheduler, with `airflow backfill` and the new API just setting "something" that the scheduler then looks at to start running the backfill.

(Scheduling a backfill should not be very resource intensive -- no more so than normal DAG runs.)
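
Roughly, the idea above could be sketched like this. Every name here (`BackfillRequest`, `ToyScheduler`, `request_backfill`, `loop_once`) is made up for illustration and is not existing Airflow code; the sketch also assumes a daily schedule and skips all of the specialised backfill logic.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import List, Tuple


@dataclass
class BackfillRequest:
    """Hypothetical record written by the CLI or API; not an Airflow model."""
    dag_id: str
    start: date
    end: date  # inclusive
    state: str = "queued"


@dataclass
class ToyScheduler:
    """Stand-in for the scheduler; only the backfill hand-off is sketched."""
    requests: List[BackfillRequest] = field(default_factory=list)
    created_runs: List[Tuple[str, date]] = field(default_factory=list)

    def request_backfill(self, dag_id: str, start: date, end: date) -> BackfillRequest:
        # What `airflow backfill` or the new endpoint would do: record intent only.
        req = BackfillRequest(dag_id, start, end)
        self.requests.append(req)
        return req

    def loop_once(self) -> None:
        # On each pass, queued requests are expanded into ordinary runs
        # (one per day here, purely as a simplifying assumption).
        for req in self.requests:
            if req.state != "queued":
                continue
            day = req.start
            while day <= req.end:
                self.created_runs.append((req.dag_id, day))
                day += timedelta(days=1)
            req.state = "running"


scheduler = ToyScheduler()
scheduler.request_backfill("example_dag", date(2020, 12, 1), date(2020, 12, 3))
scheduler.loop_once()
```

The point is only that the caller never runs anything itself: it leaves a request behind, and the scheduler creates the runs on its next pass.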

-ash



On Mon, 28 Dec, 2020 at 13:23, Jarek Potiuk <[email protected]> wrote:
Made some comments. Summarising: I think we need it, and the proposal is reasonable. I have only one serious question: where should the backfill job run? My gut feeling tells me that the scheduler is a better "entity" for such a job than the workers proposed in the original document, but I am happy to discuss it.

On Wed, Dec 23, 2020 at 2:45 PM Tomasz Urbaszek <[email protected]> wrote:
Hello all,

I would like to discuss the new Airflow Improvement Proposal, which
aims to give Airflow users the possibility to trigger a backfill
externally (via API or UI).

In short, this AIP proposes to:
- create a new API endpoint to trigger a backfill job (and the same
  mechanism for the web UI)
- create a new Celery task to run the backfill on worker machines (in
  the case of Celery-like executors)
- extend the scheduler's zombie-removal mechanism to take care of
  backfill-triggered tasks and DAG runs
- improve the UI so users can see the difference between scheduled and
  backfilled runs

I drafted a doc with the proposal: <https://s.apache.org/backfill-aip>
so we can discuss it there before moving it to cwiki.

Happy to hear your opinions on this. And have peaceful and warm holidays!

Best,
Tomek


--
Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129 <tel:+48660796129>