yxiao1996 opened a new issue, #29479:
URL: https://github.com/apache/airflow/issues/29479

   ### Description
   
   My team has an airflow stack orchestrating many of our ETL jobs. Recently, 
we noticed a bug in one of our hourly job's code causing a large number of 
hourly datasets to be malformed(more than 500). Thus, after fixing the ETL job, 
we have the need to rerun all the impacted DAG runs to correct the data.
   
   I learnt that it's possible to batch initiate DAG reruns through the 
"Browse" -> "Task Instances" view, where I can filter out the task instances I 
want to rerun through execution date and dag Id and simply clear their states. 
However, since we set a concurrency limit for DAG runs, after we initiate the 
reruns, it appears that airflow is prioritizing reruns over latest runs, so out 
latest DAG runs got delayed. As we have a agreement with our consumers on data 
processing time, we don't want the latest DAG runs to get delayed. 
   
   I'm wondering if it's possible for Airflow to support a more intuitive and 
safe way to rerun DAG's in batch, where reruns only start if they don't impact 
latest runs.
   
   ### Use case/motivation
   
   Backfill/rerun is a common use case for ETL workflows. I feel it could be 
valuable to make it safe and more intuitive.
   
   ### Related issues
   
   Didn't find related issues after some simple searches. Please let me know if 
this is discussed somewhere else before.
   
   ### Are you willing to submit a PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to