benrifkind opened a new issue, #60145: URL: https://github.com/apache/airflow/issues/60145
### Apache Airflow version Other Airflow 3 version (please specify below) ### If "Other Airflow 3 version" selected, which one? 3.1.2 ### What happened? On upgrade to Airflow 3.1.2, there was a huge increase in the number of DAG versions for many of the DAGs. I believe this may have been fixed in Airflow 3.1.5 (https://github.com/apache/airflow/pull/59091). Before upgrading to get the fix, I wanted to clean up the database `airflow db clean`. This was to clean up around 360K rows. The non batch version ran for 2+ hours before I killed it. Then I tried ``` airflow db clean -t dag_version --clean-before-timestamp 2025-12-20 --batch-size 1000 ``` This was also extremely slow. Around 6 minutes per batch. The fix for me was to add (temporary) indices to the tables `task_instance` and `dag_run` ``` CREATE INDEX CONCURRENTLY idx_temp_task_instance_dag_version_id ON task_instance(dag_version_id); CREATE INDEX CONCURRENTLY idx_temp_dag_run_dag_version_id ON dag_run(created_dag_version_id); ``` The `airflow db clean` then ran extremely fast - under 2 minutes. ### What you think should happen instead? `airflow db clean` should complete relatively quickly. Adding indices to the tables `task_instance` and `dag_run` on the DAG version id led to a huge speed up so these indices should likely exist. ### How to reproduce Create a large number of DAG versions and then attempt to drop them ### Operating System Debian GNU/Linux 12 (bookworm) ### Versions of Apache Airflow Providers apache-airflow-providers-amazon==9.16.0 apache-airflow-providers-celery==3.13.0 apache-airflow-providers-cncf-kubernetes==10.9.0 apache-airflow-providers-common-compat==1.8.0 apache-airflow-providers-common-io==1.6.4 apache-airflow-providers-common-messaging==2.0.0 apache-airflow-providers-common-sql==1.28.2 apache-airflow-providers-docker==4.4.4 apache-airflow-providers-elasticsearch==6.3.4 apache-airflow-providers-fab==3.0.1 apache-airflow-providers-ftp==3.13.2 apache-airflow-providers-git==0.0.9 apache-airflow-providers-google==18.1.0 apache-airflow-providers-grpc==3.8.2 apache-airflow-providers-hashicorp==4.3.3 apache-airflow-providers-http==5.4.0 apache-airflow-providers-microsoft-azure==12.8.0 apache-airflow-providers-mysql==6.3.4 apache-airflow-providers-odbc==4.10.2 apache-airflow-providers-openlineage==2.7.3 apache-airflow-providers-postgres==6.4.0 apache-airflow-providers-redis==4.3.2 apache-airflow-providers-sendgrid==4.1.4 apache-airflow-providers-sftp==5.4.1 apache-airflow-providers-slack==9.4.0 apache-airflow-providers-smtp==2.3.1 apache-airflow-providers-snowflake==6.6.0 apache-airflow-providers-ssh==4.1.5 apache-airflow-providers-standard==1.9.1 ### Deployment Official Apache Airflow Helm Chart ### Deployment details helm on AWS EKS ### Anything else? _No response_ ### Are you willing to submit PR? - [x] Yes I am willing to submit a PR! ### Code of Conduct - [x] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
