benrifkind opened a new issue, #60145:
URL: https://github.com/apache/airflow/issues/60145

   ### Apache Airflow version
   
   Other Airflow 3 version (please specify below)
   
   ### If "Other Airflow 3 version" selected, which one?
   
   3.1.2
   
   ### What happened?
   
   On upgrade to Airflow 3.1.2, there was a huge increase in the number of DAG 
versions for many of the DAGs. I believe this may have been fixed in Airflow 
3.1.5 (https://github.com/apache/airflow/pull/59091).
   
   Before upgrading to get the fix, I wanted to clean up the database `airflow 
db clean`. This was to clean up around 360K rows. The non batch version ran for 
2+ hours before I killed it. Then I tried
   ```
   airflow db clean -t dag_version --clean-before-timestamp 2025-12-20 
--batch-size 1000
   ```
   This was also extremely slow. Around 6 minutes per batch.
   
   The fix for me was to add (temporary) indices to the tables `task_instance` 
and `dag_run`
   ```
   CREATE INDEX CONCURRENTLY idx_temp_task_instance_dag_version_id 
     ON task_instance(dag_version_id);
   
   CREATE INDEX CONCURRENTLY idx_temp_dag_run_dag_version_id 
     ON dag_run(created_dag_version_id);
   ```
   The `airflow db clean` then ran extremely fast - under 2 minutes.
   
   
   ### What you think should happen instead?
   
   `airflow db clean` should complete relatively quickly. Adding indices to the 
tables `task_instance` and `dag_run` on the DAG version id led to a huge speed 
up so these indices should likely exist.
   
   ### How to reproduce
   
   Create a large number of DAG versions and then attempt to drop them
   
   ### Operating System
   
   Debian GNU/Linux 12 (bookworm)
   
   ### Versions of Apache Airflow Providers
   
   apache-airflow-providers-amazon==9.16.0
   apache-airflow-providers-celery==3.13.0
   apache-airflow-providers-cncf-kubernetes==10.9.0
   apache-airflow-providers-common-compat==1.8.0
   apache-airflow-providers-common-io==1.6.4
   apache-airflow-providers-common-messaging==2.0.0
   apache-airflow-providers-common-sql==1.28.2
   apache-airflow-providers-docker==4.4.4
   apache-airflow-providers-elasticsearch==6.3.4
   apache-airflow-providers-fab==3.0.1
   apache-airflow-providers-ftp==3.13.2
   apache-airflow-providers-git==0.0.9
   apache-airflow-providers-google==18.1.0
   apache-airflow-providers-grpc==3.8.2
   apache-airflow-providers-hashicorp==4.3.3
   apache-airflow-providers-http==5.4.0
   apache-airflow-providers-microsoft-azure==12.8.0
   apache-airflow-providers-mysql==6.3.4
   apache-airflow-providers-odbc==4.10.2
   apache-airflow-providers-openlineage==2.7.3
   apache-airflow-providers-postgres==6.4.0
   apache-airflow-providers-redis==4.3.2
   apache-airflow-providers-sendgrid==4.1.4
   apache-airflow-providers-sftp==5.4.1
   apache-airflow-providers-slack==9.4.0
   apache-airflow-providers-smtp==2.3.1
   apache-airflow-providers-snowflake==6.6.0
   apache-airflow-providers-ssh==4.1.5
   apache-airflow-providers-standard==1.9.1
   
   ### Deployment
   
   Official Apache Airflow Helm Chart
   
   ### Deployment details
   
   helm on AWS EKS
   
   ### Anything else?
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [x] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to