arjav1528 opened a new pull request, #60390: URL: https://github.com/apache/airflow/pull/60390
## Problem

When using `airflow db clean`, an error occurs when deleting rows from the `dag_version` table due to a violation of the `task_instance_dag_version_id_fkey` foreign key constraint.

The issue occurs when:

- A `dag_version` row has an old `created_at` timestamp (meets deletion criteria)
- A `task_instance` row has a recent `start_date` timestamp (does NOT meet deletion criteria)
- The `task_instance` row references the old `dag_version` row

## Fixes: #59474

The `db clean` command was removing rows from both tables based solely on their respective recency columns (`created_at` for `dag_version`, `start_date` for `task_instance`) without considering the foreign key relationship.

## Solution

Modified `_build_query()` in `db_cleanup.py` to add special handling for the `dag_version` table. When building the deletion query for `dag_version`, we now:

1. Create a subquery to find `dag_version_id`s that are referenced by `task_instance` rows with `start_date >= clean_before_timestamp` (i.e., rows that are NOT being deleted)
2. Exclude those `dag_version` rows from the deletion query
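For illustration only, the failure and the exclusion subquery can be reproduced outside Airflow with the standard-library `sqlite3` module. This is a minimal sketch, not the code in `db_cleanup.py`: the two-table schema is heavily simplified, the helper names `make_db`, `naive_clean`, and `safe_clean` are hypothetical, and deleting `task_instance` rows before `dag_version` rows is an assumption made to keep the example small. Only the table names, column names, and the `start_date >= clean_before_timestamp` condition come from the description above.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Build a toy schema (hypothetical, far simpler than Airflow's real one)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(
        """
        CREATE TABLE dag_version (
            id INTEGER PRIMARY KEY,
            created_at TEXT NOT NULL
        );
        CREATE TABLE task_instance (
            id INTEGER PRIMARY KEY,
            start_date TEXT NOT NULL,
            dag_version_id INTEGER REFERENCES dag_version(id)
        );
        -- dag_version 1 is old but still referenced by a recent task_instance.
        INSERT INTO dag_version VALUES
            (1, '2024-01-01'), (2, '2024-01-02'), (3, '2025-06-01');
        INSERT INTO task_instance VALUES
            (10, '2025-06-15', 1), (11, '2024-01-03', 2);
        """
    )
    return conn


def naive_clean(conn: sqlite3.Connection, clean_before: str) -> None:
    """Filter each table only by its own recency column (the buggy behaviour)."""
    conn.execute("DELETE FROM task_instance WHERE start_date < ?", (clean_before,))
    # Raises sqlite3.IntegrityError when a surviving task_instance still
    # references one of the dag_version rows selected for deletion.
    conn.execute("DELETE FROM dag_version WHERE created_at < ?", (clean_before,))


def safe_clean(conn: sqlite3.Connection, clean_before: str) -> None:
    """Skip dag_version rows referenced by task_instance rows that are kept."""
    conn.execute("DELETE FROM task_instance WHERE start_date < ?", (clean_before,))
    conn.execute(
        """
        DELETE FROM dag_version
        WHERE created_at < ?
          AND id NOT IN (
              SELECT dag_version_id
              FROM task_instance
              WHERE start_date >= ?
                AND dag_version_id IS NOT NULL  -- a NULL would defeat NOT IN
          )
        """,
        (clean_before, clean_before),
    )


if __name__ == "__main__":
    conn = make_db()
    safe_clean(conn, "2025-01-01")
    print([row[0] for row in conn.execute("SELECT id FROM dag_version ORDER BY id")])
```

With a cutoff of `2025-01-01`, `naive_clean` fails on the foreign key check, while `safe_clean` removes only `dag_version` row 2 and keeps row 1 because `task_instance` row 10 still points at it.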
