NickYadance opened a new issue, #23664:
URL: https://github.com/apache/airflow/issues/23664
### Description
The configuration option `max_active_runs_per_dag` defines the maximum number
of active dagruns for a DAG. By default Airflow spawns up to 16 dagruns when the
schedule calls for them. But once all 16 dagruns have been spawned, they stop
earlier dagruns from being retried, because there is no free slot left.
A simple example: say I define a DAG with `catchup` and
`depends_on_past`.
```python
from datetime import timedelta

import pendulum
from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    'test',
    default_args={
        'depends_on_past': True,
        'retries': 0,
    },
    description='Max dagrun limitation should not stop failed dagrun from retry',
    schedule_interval=timedelta(hours=1),
    start_date=pendulum.datetime(2022, 5, 11, tz='Asia/Singapore'),
    catchup=True,
    tags=['example'],
) as dag:
    # 'error' is not a valid shell command, so this task always fails.
    task_error = BashOperator(task_id='error', bash_command='error')
```
So Airflow spawns all 16 dagruns for me:
<img width="987" alt="image"
src="https://user-images.githubusercontent.com/10060849/168016417-c2cd6185-cc29-43fe-922d-8fcb8e05077d.png">
After the first task fails, the other 16 dagruns just sit there
waiting for the first dagrun to succeed.
But retrying the first dagrun does not work: it stays queued because there is
no free slot for it to run in:
<img width="984" alt="image"
src="https://user-images.githubusercontent.com/10060849/168016932-a32a9e5e-9f9f-416d-95cd-365dfa21feb9.png">
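To make the blocked state concrete, here is a toy model in plain Python. It is only a sketch of my understanding, not Airflow's scheduler code: `ToyRun`, `promote_queued` and the promotion rule (a queued run starts only while fewer than the limit are running) are assumptions made up for illustration.

```python
from dataclasses import dataclass

MAX_ACTIVE_RUNS = 16  # assumed limit, mirroring the default described above


@dataclass
class ToyRun:
    """Hypothetical stand-in for a dagrun; not an Airflow class."""
    run_id: int
    state: str  # "queued", "running", "failed" or "success"


def promote_queued(runs, limit=MAX_ACTIVE_RUNS):
    """Toy rule: move queued runs to running while free slots remain."""
    running = sum(r.state == "running" for r in runs)
    for r in sorted(runs, key=lambda r: r.run_id):
        if r.state == "queued" and running < limit:
            r.state = "running"
            running += 1
    return runs


# Run 0 has failed; runs 1..16 hold every slot, blocked by depends_on_past.
runs = [ToyRun(0, "failed")] + [ToyRun(i, "running") for i in range(1, 17)]

runs[0].state = "queued"  # clearing the failed run puts it back in the queue
promote_queued(runs)

# No slot is free, so the cleared run never leaves the queue.
first_state = runs[0].state
```

In this model the cleared run stays queued no matter how often `promote_queued` is called, because the 16 running runs can only finish after the run they are waiting on succeeds.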
A real-life workaround when the dagrun queue is full looks like this:
1. Mark the latest dagrun as success to free up a slot.
2. Clear the failed dagrun so that it retries.
3. Clear the latest dagrun so that it reruns.
After step 2, it can happen that another dagrun is kicked off and the slots are
full again. Then I have to mark the newest dagrun as success and rerun the
dagrun from step 1. In the worst case, this rerun loop just keeps going and
cannot be stopped.
### Use case/motivation
Maybe a retried dagrun, i.e. one triggered by clearing its state, should not
count towards `max_active_runs_per_dag`. Retried dagruns could instead have
their own maximum number limit.
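A minimal sketch of what I mean, again as a toy model rather than a patch against Airflow. The `cleared` flag, the `retry_limit` parameter and the function name are all hypothetical; nothing here is an existing Airflow option.

```python
from dataclasses import dataclass


@dataclass
class RetryAwareRun:
    """Hypothetical stand-in for a dagrun; not an Airflow class."""
    run_id: int
    state: str  # "queued", "running", "failed" or "success"
    cleared: bool = False  # True if the run was re-queued by clearing its state


def promote_with_retry_budget(runs, limit=16, retry_limit=1):
    """Toy rule: cleared runs use their own budget instead of the normal one."""
    normal_running = sum(r.state == "running" and not r.cleared for r in runs)
    retry_running = sum(r.state == "running" and r.cleared for r in runs)
    for r in sorted(runs, key=lambda r: r.run_id):
        if r.state != "queued":
            continue
        if r.cleared and retry_running < retry_limit:
            r.state = "running"
            retry_running += 1
        elif not r.cleared and normal_running < limit:
            r.state = "running"
            normal_running += 1
    return runs


# Same situation as in the description: 16 blocked runs hold every normal slot.
demo_runs = [RetryAwareRun(0, "failed")]
demo_runs += [RetryAwareRun(i, "running") for i in range(1, 17)]

# Clearing the failed run marks it as a retry and re-queues it.
demo_runs[0].state = "queued"
demo_runs[0].cleared = True
promote_with_retry_budget(demo_runs)

retried_state = demo_runs[0].state
```

With a separate budget the cleared run can start even though the normal slots are all taken, while `retry_limit` still caps how many cleared runs are active at once.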
### Related issues
_No response_
### Are you willing to submit a PR?
- [X] Yes I am willing to submit a PR!
### Code of Conduct
- [X] I agree to follow this project's [Code of
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)