GitHub user hnomichith added a comment to the discussion: Add the ability to 
backfill a DAG based on past Asset Events

> Well - what you would use as selection criteria? Time (which we know it 
> cannot be used because it's not time-triggered)? Or what would be your 
> criteria there?

I don't see why time can't be used. Here is a scenario:
1. You open your DAG, click "Trigger" > "Backfill", then select a time
period.
2. An API call checks which Asset Events were created in that time period and
would have triggered the DAG. Let's say 5 are found.
3. The UI shows "5 runs will be triggered".
4. You confirm, and another API call queues those events for the DAG.
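
A minimal sketch of steps 2 and 3, assuming a DAG scheduled on a single asset. `AssetEvent` and `plan_backfill` are hypothetical names used for illustration only; they are not Airflow's models or API.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class AssetEvent:
    # Hypothetical stand-in for a stored Asset Event, not Airflow's model.
    asset: str
    timestamp: datetime


def plan_backfill(events, watched_assets, start, end):
    """Return past events in [start, end) that would have triggered the DAG."""
    matching = (
        e for e in events
        if e.asset in watched_assets and start <= e.timestamp < end
    )
    # Oldest first, so runs would be queued in the order the events occurred.
    return sorted(matching, key=lambda e: e.timestamp)


# One `asset_a` event per day for a week, plus an unrelated `asset_b` event.
history = [AssetEvent("asset_a", datetime(2025, 1, day)) for day in range(1, 8)]
history.append(AssetEvent("asset_b", datetime(2025, 1, 3)))

planned = plan_backfill(
    history,
    watched_assets={"asset_a"},
    start=datetime(2025, 1, 2),
    end=datetime(2025, 1, 7),
)
print(f"{len(planned)} runs will be triggered")
```

The selection criterion here is the event's creation time, not a schedule interval of the DAG, which is why a time window can still apply to a DAG that is not time-triggered.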

> Same as current time-range backfill. If your time-range Dag is backfilled, it 
> should (I believe) also generate asset events if they produce assets. Unless 
> you are talking about "asset-downstream Dags, not the Dag to backfill 
> -downstream Dags".

I'm not sure we understand each other, so let me describe a scenario.
- One month ago, I wrote `DAG_1`, which produces `asset_a`. It runs every day.
- Today, I write `DAG_2`, which triggers when `asset_a` is updated. I'd like to
"backfill" the data (maybe "catch-up" is a better term) for the past month.
- A month from now, I write `DAG_3`, which also triggers when `asset_a` is
updated. Again, I would like to backfill it, this time for the past two
months. However, when backfilling `DAG_3`, I don't want to re-trigger `DAG_2`.

Whether I use a script to manually duplicate the Asset Events, or clear `DAG_1`
to reproduce the events for `asset_a`, both `DAG_2` and `DAG_3` will be
triggered. Now that I think about it, I could modify my script to pause
`DAG_2`, produce the events, and then unpause it, but that does not sound like
the simplest approach.
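
The difference between the two approaches can be sketched with a toy model. Everything here is hypothetical (the `consumers` mapping, `replay_events`, `backfill_dag`); it only illustrates the fan-out problem and does not use any Airflow API.

```python
from collections import defaultdict

# Hypothetical model: which DAGs are scheduled on which asset.
consumers = {"asset_a": ["DAG_2", "DAG_3"]}

# Past events as (asset, date) pairs, e.g. produced by daily runs of DAG_1.
past_events = [("asset_a", f"2025-01-{day:02d}") for day in range(1, 4)]


def replay_events(events):
    """Re-emitting events fans out to every DAG scheduled on the asset."""
    runs = defaultdict(list)
    for asset, date in events:
        for dag_id in consumers.get(asset, []):
            runs[dag_id].append(date)
    return dict(runs)


def backfill_dag(dag_id, events):
    """A targeted backfill reuses past events but queues runs for one DAG only."""
    dates = [date for asset, date in events if dag_id in consumers.get(asset, [])]
    return {dag_id: dates}


replayed = replay_events(past_events)          # runs for DAG_2 and DAG_3
targeted = backfill_dag("DAG_3", past_events)  # runs for DAG_3 only
print(sorted(replayed), sorted(targeted))
```

Replaying events cannot distinguish between consumers, whereas a backfill scoped to one DAG leaves `DAG_2` untouched without having to pause it.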

GitHub link: 
https://github.com/apache/airflow/discussions/59886#discussioncomment-15374932
