Fellow Airflowers,

I am following up on some of the proposed changes in the Airflow 3 proposal <https://docs.google.com/document/d/1MTr53101EISZaYidCUKcR6mRKshXGzW6DZFXGzetG3E/>, where more information was requested by the community.
One specific topic was "Running Backfills at scale". This is not yet a full-fledged AIP, but a starting point for the discussion leading towards an AIP with fully defined technical details.

Backfills at scale

Backfills in Airflow 2.x are treated as an exception and executed by an incarnation of the BackfillJob, rather than by the regular Airflow Scheduler itself. This causes unexpected interactions with the other DAGs that the main Airflow Scheduler is running at the same time, including resource contention and possibly unexpected delays, because established scalability configuration settings such as Concurrency are not consistently applied. It also adds code-level complexity, since there are two somewhat-similar implementations of scheduling logic.

However, with ML model training, backfills are a common operation and need to be treated as a regular Airflow DAG / Task execution operation, not as an exception. It is also not possible to run a backfill without direct access to the Airflow database or SSH access to the Airflow server, which many or most data engineers do not have.

In order for this to become a reality, Backfills need to be handled by the Airflow Scheduler as a normal DAG execution, building on the Dynamic Task Mapping execution pattern, rather than as an exception. Additionally, Backfill tasks will now ONLY be executed by the Airflow Workers, for obvious reasons including scalability. A less obvious but important reason is security: it is ideal for data connections to Enterprise data to happen only through Airflow Workers, rather than through any Airflow system components.

As part of making Backfill support cleaner in Airflow, Backfill DAG execution will also be supported in the Airflow REST API.
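
To make the idea of a "backfill request" more concrete, here is a minimal Python sketch of what such a request might carry and how it could expand into ordinary DagRuns, one per day. Everything in it is an assumption for illustration only: the `BackfillRequest` class, its field names, and the helper functions are hypothetical and are not part of Airflow or of this proposal, which does not yet define the endpoint or its payload.

```python
import json
from dataclasses import asdict, dataclass
from datetime import date, timedelta
from typing import List


@dataclass(frozen=True)
class BackfillRequest:
    """Hypothetical request body; the field names are illustrative only."""

    dag_id: str
    start_date: date
    end_date: date  # inclusive
    max_active_runs: int = 1  # assumed per-backfill concurrency cap


def expand_daily_runs(req: BackfillRequest) -> List[date]:
    """Return one logical date per day in [start_date, end_date]."""
    if req.end_date < req.start_date:
        raise ValueError("end_date must not precede start_date")
    days = (req.end_date - req.start_date).days
    return [req.start_date + timedelta(days=i) for i in range(days + 1)]


def to_json(req: BackfillRequest) -> str:
    """Serialize the request as a client might before submitting it."""
    payload = asdict(req)
    payload["start_date"] = req.start_date.isoformat()
    payload["end_date"] = req.end_date.isoformat()
    return json.dumps(payload, sort_keys=True)
```

In this sketch a three-day range expands to three logical dates. The intent described above is that the Scheduler would treat each of them like any other DagRun, subject to the usual concurrency settings, instead of handing them to a separate BackfillJob.
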
This proposal is purposefully light on exact implementation details but will include at least:

- Making the Airflow Scheduler responsible for scheduling decisions on all DagRuns (instead of the current behavior, where it purposefully ignores backfill runs)
- A new API endpoint to submit a "backfill request".

--
Best regards,
Vikram Koka, Ash Berlin-Taylor, Kaxil Naik, and Constance Martineau