Thank you all for the feedback. Will add more of the technical details and share as a next step.
On Tue, May 28, 2024 at 12:33 AM Amogh Desai <amoghdesai....@gmail.com> wrote:

> Good proposal!
>
> I like the idea here but again talking in terms of timelines, do we make it
> in Airflow 2 if it's that critical or can it wait till Airflow 3? I think
> we should scope this out and add some technical data to back this up before
> making this an AIP.
>
> Thanks & Regards,
> Amogh Desai
>
>
> On Sun, May 26, 2024 at 4:25 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>
> > Yes. Long time awaited - and indeed some implementation details would be
> > needed to get it to AIP. And I also think one important decision to
> > consider - should it be targeting Airflow 2?
> >
> > On Sun, May 26, 2024 at 12:26 PM Elad Kalif <elad...@apache.org> wrote:
> >
> > > > In order for this to become a reality, Backfills need to be handled
> > > > by the Airflow Scheduler as a normal DAG execution
> > >
> > > I think it's a good idea.
> > > It should solve natively problems like
> > > https://github.com/apache/airflow/issues/11302
> > >
> > > On Fri, May 24, 2024 at 10:58 PM Vikram Koka
> > > <vik...@astronomer.io.invalid> wrote:
> > >
> > > > Fellow Airflowers,
> > > >
> > > > I am following up on some of the proposed changes in the Airflow 3
> > > > proposal
> > > > <https://docs.google.com/document/d/1MTr53101EISZaYidCUKcR6mRKshXGzW6DZFXGzetG3E/>,
> > > > where more information was requested by the community.
> > > >
> > > > One specific topic was "Running Backfills at scale". This is not yet
> > > > a full fledged AIP, but a starting point for the discussion leading
> > > > towards an AIP with fully defined technical details.
> > > >
> > > > Backfills at scale
> > > >
> > > > Backfills in Airflow 2.x are treated as an exception and executed by
> > > > an incarnation of the BackfillJob, rather than the regular Airflow
> > > > Scheduler itself. This results in unexpected interactions with the
> > > > other DAGs being run by the main Airflow Scheduler at the same time
> > > > including resource contention and possibly unexpected delays because
> > > > established scalability configuration settings such as Concurrency
> > > > are not consistently applied, and also code-level complexity by
> > > > having two somewhat-similar implementations of scheduling logic.
> > > >
> > > > However, with ML model training, backfills are a common operation
> > > > and need to be treated as a regular Airflow DAG / Task execution
> > > > operation and not treated as an exception. It is also not possible
> > > > to run a backfill unless you have direct access to the Airflow
> > > > database/SSH access to the Airflow server, which is not possible for
> > > > many/most data engineers.
> > > >
> > > > In order for this to become a reality, Backfills need to be handled
> > > > by the Airflow Scheduler as a normal DAG execution, building on the
> > > > Dynamic Task Mapping execution pattern, rather than an exception.
> > > > Additionally, Backfill tasks will now ONLY be executed by the Airflow
> > > > Workers, for obvious reasons including scalability. A less obvious,
> > > > but important reason is Security, since it is ideal to have data
> > > > connections to Enterprise data only happen through Airflow Workers,
> > > > rather than any Airflow system components.
> > > >
> > > > As part of making Backfill support cleaner in Airflow, Backfill DAG
> > > > execution will also be supported in the Airflow REST API.
> > > >
> > > > This proposal is purposefully light on exact implementation details
> > > > but will include at least:
> > > >
> > > > - Making the Airflow Scheduler responsible for scheduling decisions
> > > >   on all DagRuns (instead of the current where it purposefully
> > > >   ignores backfill runs)
> > > > - A new API endpoint to submit a "backfill request".
> > > >
> > > > --
> > > >
> > > > Best regards,
> > > > Vikram Koka, Ash Berlin-Taylor, Kaxil Naik, and Constance Martineau
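
To make the "backfill request" idea in the quoted proposal a bit more concrete, here is a rough sketch of what such a request could carry and how it might expand into ordinary per-date runs for the Scheduler to pick up. This is only an illustration under stated assumptions: the class name BackfillRequest, its fields, the helper names, and the one-run-per-day interval are all hypothetical. None of this is an existing Airflow API or the proposal's actual design.

```python
from dataclasses import dataclass, asdict
from datetime import date, timedelta
import json


@dataclass(frozen=True)
class BackfillRequest:
    """Hypothetical payload for a 'backfill request'; not an Airflow API."""

    dag_id: str
    start_date: date
    end_date: date  # inclusive
    max_active_runs: int = 1  # assumed concurrency cap for the backfill


def expand_to_logical_dates(req: BackfillRequest) -> list[date]:
    """Expand the inclusive range into one logical date per day.

    A daily interval is assumed purely to keep the sketch small.
    """
    if req.end_date < req.start_date:
        raise ValueError("end_date must not be before start_date")
    days = (req.end_date - req.start_date).days
    return [req.start_date + timedelta(days=i) for i in range(days + 1)]


def to_json(req: BackfillRequest) -> str:
    """Serialize the request as it might be sent to an API endpoint."""
    body = asdict(req)
    body["start_date"] = req.start_date.isoformat()
    body["end_date"] = req.end_date.isoformat()
    return json.dumps(body, sort_keys=True)
```

The point of the sketch is that a backfill becomes plain data (a DAG id plus a date range) that can be submitted remotely and then turned into regular runs, rather than a separate job started from a shell on the Airflow server.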