GitHub user potiuk edited a comment on the discussion: Add the ability to backfill a DAG based on past Asset Events
> An API call is made, checking which Asset Events were created in this time
> period and would have triggered the DAG. Let's say 5 are found.

Backfill does not use "when the asset event or dag run was created" - backfill works on "data intervals". For backfill, it does not matter when your dag was triggered; what matters is which "data interval range" it covers. That range is derived from the schedule: each scheduled dag run has an "interval start" and an "interval end", and those intervals are non-overlapping. I.e. if you select backfill for the last 28 days covering 4 weekly dagruns, those 4 dagruns will be backfilled, i.e. the "whole month".

So when you select backfill dates you do not select "when the dagruns were triggered" but "what data you are backfilling". For example, if I backfill May 2025 today, all the dagruns that concern the "May 2025" data interval will be backfilled. Which means that if there were dag runs for May 2025, the new runs will effectively replace those "May 2025" runs, but they will be triggered in December 2025. If I backfill the same May 2025 data a year from now (December 2026), then the backfill dagruns triggered in December 2025 will be replaced with the new dag runs - even though they were TRIGGERED in December 2025 - because those dagruns refer to the "data intervals" of May 2025.

In your case you have no relation between "data intervals" and "when your asset was triggered". An asset does not have a concept of "data interval" at all. An asset triggering a run is triggering a run, but this run is not associated with any data interval. This is very succinctly described in the backfill documentation: https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/backfill.html

> Backfill does not make sense for Dags that don’t have a time-based schedule.

Logically what you are asking for is the ability to "clear" certain, selected dag runs.
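To make the difference between "data interval" selection and "trigger date" selection concrete, here is a minimal sketch in plain Python. It does not import Airflow at all: the `Run` class, the helper functions and the `clear` callable are made-up names for illustration only, not Airflow APIs, and the sample runs are invented data.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Run:
    """Hypothetical stand-in for a dag run (NOT Airflow's DagRun model)."""
    run_id: str
    triggered_at: datetime                     # when the run was created
    interval_start: Optional[datetime] = None  # None for asset-triggered runs
    interval_end: Optional[datetime] = None


def weekly_intervals(start: datetime, end: datetime) -> list[tuple[datetime, datetime]]:
    """Adjacent, non-overlapping weekly data intervals that fit in [start, end]."""
    week = timedelta(days=7)
    intervals = []
    cursor = start
    while cursor + week <= end:
        intervals.append((cursor, cursor + week))
        cursor += week
    return intervals


def select_by_data_interval(runs: Iterable[Run], start: datetime, end: datetime) -> list[Run]:
    """Backfill-style selection: by the data a run covers, not by when it ran."""
    return [
        r for r in runs
        if r.interval_start is not None
        and r.interval_end is not None
        and start <= r.interval_start
        and r.interval_end <= end
    ]


def select_by_trigger_time(runs: Iterable[Run], start: datetime, end: datetime) -> list[Run]:
    """'Clear selected runs' selection: by when the run was triggered."""
    return [r for r in runs if start <= r.triggered_at < end]


def clear_selected(runs: Iterable[Run], clear: Callable[[Run], None]) -> int:
    """Apply a caller-supplied (hypothetical) clear action to each selected run."""
    count = 0
    for run in runs:
        clear(run)
        count += 1
    return count


# Invented sample data: one scheduled run covering a May 2025 week but
# triggered in December 2025 (e.g. by a backfill), and two asset-triggered
# runs that carry no data interval at all.
RUNS = [
    Run("scheduled_may_week", datetime(2025, 12, 10),
        interval_start=datetime(2025, 5, 5), interval_end=datetime(2025, 5, 12)),
    Run("asset_may", datetime(2025, 5, 7)),
    Run("asset_december", datetime(2025, 12, 10)),
]
```

Selecting "May 2025" by data interval picks only the scheduled run, even though it was triggered in December; selecting "May 2025" by trigger time picks only the asset-triggered run from May. How the `clear` callable actually clears a run is deliberately left open here.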
In your case you think that "when the dag was triggered by an asset" is the "trigger date" criterion that you want to use. But this is not a backfill, because backfill operates on "data intervals", not "triggering dates". So what you really want is to:

* have some criteria to select dag runs (in your case the criterion is "triggered between date x and date y")
* select dagRuns that match the criteria
* clear them

This is **not** backfill - at least not as we define the word "backfill" - it's simply "clearing selected dagruns". And given how backfill is implemented, operating on adjacent, non-overlapping data intervals and not on "trigger dates", repurposing backfill to do what you want makes very little sense and is terribly confusing for those who understand backfill as "back-filling data intervals".

So .. if you want to do it now you have two options:

* write your own script in Python where you select dag runs and clear them (you can do it TODAY)
* contribute to https://github.com/apache/airflow/issues/50396, which is a feature request where you should be able to apply various selection criteria on Dag Runs and Task Instances, multi-select them and perform bulk actions on them (for example you could filter them by creation date, select all and apply the CLEAR action to all of them).

You are most welcome to contribute such a feature, and make sure that both the selection criteria and the UI fulfil your request - but that requires someone (you or someone else) to actually work on it and contribute it.

GitHub link: https://github.com/apache/airflow/discussions/59886#discussioncomment-15378113

----
This is an automatically sent email for [email protected].
To unsubscribe, please send an email to: [email protected]
