GitHub user peter added a comment to the discussion: Add the ability to 
backfill a DAG based on past Asset Events

I disagree with this statement from the Airflow backfill documentation:

**“Backfill does not make sense for Dags that don’t have a time-based 
schedule.”**

I feel like the Airflow creators conflate two separate, orthogonal things:

* Whether a DAG works on time/date-based (partitioned) data (which our DAG does 
in this case)
* Whether a DAG is triggered on a fixed schedule or by an event

The claim that backfill doesn't make sense just because a DAG is normally 
triggered by an event simply doesn't hold.

My use case is that I have a DAG that populates a BigQuery table and is 
triggered via an asset from an upstream DAG. The data in BigQuery is 
partitioned by date and the reason I need to backfill is that the schema of the 
BigQuery table has changed. The upstream data has not changed.

* I do not want to re-run the upstream DAG because it doesn't need backfilling 
(its data hasn't changed)
* I do not want to create asset events, as those events would indicate that 
something changed in the data generated by the upstream DAG, which is not the 
case, so they would be misleading IMHO
* I do not want to clear out existing DAG runs as this would delete history 
that is potentially valuable

I have a Python script that invokes the Airflow REST API to create DAG runs for 
the date range I am interested in. The script creates one DAG run for each 
`logical_date` in the date range. However, this is difficult because the 
database has a unique constraint on the DAG ID and `logical_date`. For this 
script to work, I would need to fetch a complete list of all historic DAG runs 
and then delete every DAG run that overlaps with the DAG runs I want to create. 
This approach has two problems:

* It makes something that should be easy and straightforward quite complex
* I do not want to delete historic DAG run information (as I mentioned above)
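For illustration, the kind of script described above might look roughly like the following minimal sketch. It is not the author's actual script. Assumptions: one daily run per date with a midnight-UTC `logical_date`; the set of already-existing logical dates has been fetched beforehand; and the REST endpoint is `POST {api_prefix}/dags/{dag_id}/dagRuns`, with `api_prefix` defaulting to `/api/v1` (Airflow 2 stable API; Airflow 3 uses `/api/v2` and may expect additional body fields). The helper names `daily_logical_dates`, `plan_runs`, and `trigger_run` are hypothetical. The sketch shows where the unique constraint bites: any date that already has a run ends up in the conflict list instead of being created.

```python
# Hypothetical sketch of a backfill-by-REST-API script (not the author's code).
# Assumed: daily runs at midnight UTC, and an endpoint of the form
# POST {base_url}{api_prefix}/dags/{dag_id}/dagRuns; adjust for your Airflow version.
import json
import urllib.request
from datetime import date, datetime, time, timedelta, timezone


def daily_logical_dates(start: date, end: date) -> list[datetime]:
    """Return one midnight-UTC logical_date per day in the inclusive range [start, end]."""
    days = (end - start).days
    return [
        datetime.combine(start + timedelta(days=i), time(0), tzinfo=timezone.utc)
        for i in range(days + 1)
    ]


def plan_runs(start: date, end: date, existing: set[datetime]):
    """Split the range into run payloads to create and dates that would collide.

    `existing` holds logical_dates that already have a DAG run. Because of the
    unique constraint on (dag_id, logical_date), those dates cannot be
    triggered again unless the old runs are deleted first.
    """
    to_create, conflicts = [], []
    for ld in daily_logical_dates(start, end):
        if ld in existing:
            conflicts.append(ld)
        else:
            to_create.append({"logical_date": ld.isoformat()})
    return to_create, conflicts


def trigger_run(base_url: str, dag_id: str, payload: dict, auth_header: str,
                api_prefix: str = "/api/v1") -> dict:
    """POST one DAG run to the assumed endpoint and return the decoded JSON response."""
    req = urllib.request.Request(
        f"{base_url}{api_prefix}/dags/{dag_id}/dagRuns",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json", "Authorization": auth_header},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Usage, without touching a real server: planning 1 to 3 January when a run already exists for 2 January yields two payloads to create and one conflict, which is exactly the case the post describes as needing a delete first.

```python
existing = {datetime(2024, 1, 2, tzinfo=timezone.utc)}
runs, clashes = plan_runs(date(2024, 1, 1), date(2024, 1, 3), existing)
print(len(runs), [c.date().isoformat() for c in clashes])  # 2 ['2024-01-02']
```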

I think the fundamental question underlying the discussion is this:

**Is backfilling a one-off thing that Airflow doesn't need to support and 
that you can handle yourself with a custom script, or is it something we should 
expect to happen every now and then and that Airflow should support?**


GitHub link: 
https://github.com/apache/airflow/discussions/59886#discussioncomment-15494701
