GitHub user hnomichith added a comment to the discussion: Add the ability to backfill a DAG based on past Asset Events

> Well - what would you use as selection criteria? Time (which we know cannot be used because it's not time-triggered)? Or what would be your criteria there?

I don't get why time can't be used. Here is a scenario:

1. You go to your DAG, click "Trigger" > "Backfill", then select your time period.
2. An API call is made that checks which Asset Events were created in this time period and would have triggered the DAG. Let's say 5 are found.
3. The UI shows "5 runs will be triggered".
4. You launch the backfill, and an API call is made to queue those events for the DAG.

> Same as current time-range backfill. If your time-range Dag is backfilled, it should (I believe) also generate asset events if they produce assets. Unless you are talking about "asset-downstream Dags, not the Dag to backfill -downstream Dags".

I'm not sure we understand each other, so let me describe a scenario:

- One month ago, I wrote `DAG_1`, which produces `asset_a`. It runs every day.
- Today, I write `DAG_2`, triggered when `asset_a` is updated. I'd like to "backfill" the data (maybe "catch-up" is a better term) for the past month.
- A month from now, I write `DAG_3`, also triggered when `asset_a` is updated. Again, I would like to backfill it for the past two months.

However, when backfilling `DAG_3`, I don't want to re-trigger `DAG_2`. Whether I use a script to manually duplicate the Asset Events or clear `DAG_1` to reproduce the events for `asset_a`, both `DAG_2` and `DAG_3` will be triggered.

Now that I think about it, I could modify my script to pause `DAG_2`, produce the events, and then unpause it, but that does not sound like the simplest approach.

GitHub link: https://github.com/apache/airflow/discussions/59886#discussioncomment-15374932
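The proposed flow (pick a time range, find the Asset Events that would have triggered the DAG, show the count, then queue them for that DAG only) and the difference from re-emitting events can be sketched in plain Python. This is a hypothetical model, not Airflow's API: `AssetEvent`, `select_backfill_events`, `queue_for_dag` and `replay_to_consumers` are illustrative names, the dates are made up, and it assumes each matching event maps to one run (a single-asset or OR-style schedule; AND conditions would need more logic).

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class AssetEvent:
    """Hypothetical stand-in for a recorded Asset Event (not Airflow's model)."""
    asset_uri: str
    timestamp: datetime


def select_backfill_events(events, dag_assets, start, end):
    """Steps 1-3: events in [start, end) on assets the DAG is scheduled on.

    Assumption: each matching event would have triggered exactly one run.
    """
    matching = [
        e for e in events
        if e.asset_uri in dag_assets and start <= e.timestamp < end
    ]
    return sorted(matching, key=lambda e: e.timestamp)


def queue_for_dag(queues, dag_id, events):
    """Step 4 as proposed: queue the events for the target DAG only."""
    queues[dag_id].extend(events)
    return len(events)


def replay_to_consumers(queues, consumers, events):
    """Current workaround: re-emitting events reaches every consuming DAG."""
    for e in events:
        for dag_id in consumers.get(e.asset_uri, []):
            queues[dag_id].append(e)


if __name__ == "__main__":
    # Illustrative data: DAG_1 produced asset_a once a day during January 2025.
    events = [AssetEvent("asset_a", datetime(2025, 1, day)) for day in range(1, 32)]
    consumers = {"asset_a": ["DAG_2", "DAG_3"]}

    selected = select_backfill_events(
        events, {"asset_a"}, datetime(2025, 1, 1), datetime(2025, 2, 1)
    )
    print(f"{len(selected)} runs will be triggered")  # 31 runs will be triggered

    targeted = defaultdict(list)
    queue_for_dag(targeted, "DAG_3", selected)
    print(sorted(targeted))  # ['DAG_3']

    replayed = defaultdict(list)
    replay_to_consumers(replayed, consumers, selected)
    print(sorted(replayed))  # ['DAG_2', 'DAG_3']
```

The point of the targeted variant is that the event history is read, not duplicated, so `DAG_2` never sees new events and does not need to be paused.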
