arjav1528 opened a new pull request, #60390:
URL: https://github.com/apache/airflow/pull/60390

   ## Problem
   
   Running `airflow db clean` can fail while deleting rows from the 
`dag_version` table, because the deletion violates the 
`task_instance_dag_version_id_fkey` foreign key constraint.
   
   The error occurs when all of the following are true:
   - A `dag_version` row has an old `created_at` timestamp (meets deletion 
criteria)
   - A `task_instance` row has a recent `start_date` timestamp (does NOT meet 
deletion criteria)
   - The `task_instance` row references the old `dag_version` row
   
   Fixes: #59474
   
   The `db clean` command selected rows for deletion in each table based solely 
on that table's own recency column (`created_at` for `dag_version`, `start_date` 
for `task_instance`), ignoring the foreign key between them. As a result, an old 
`dag_version` row could be deleted while a retained `task_instance` row still 
referenced it.
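   The failure can be reproduced outside Airflow with a minimal sketch using Python's stdlib `sqlite3` and a hypothetical schema reduced to the columns named above (the real Airflow tables have many more columns). `naive_clean` is an illustrative helper, not Airflow code: it deletes old rows from each table using only that table's own recency column.

```python
import sqlite3

# Hypothetical, simplified schema: only the columns relevant to the bug.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute("CREATE TABLE dag_version (id INTEGER PRIMARY KEY, created_at TEXT)")
conn.execute(
    "CREATE TABLE task_instance ("
    " id INTEGER PRIMARY KEY,"
    " start_date TEXT,"
    " dag_version_id INTEGER,"
    " CONSTRAINT task_instance_dag_version_id_fkey"
    " FOREIGN KEY (dag_version_id) REFERENCES dag_version (id))"
)

# An old dag_version row that is still referenced by a recent task_instance row.
conn.execute("INSERT INTO dag_version VALUES (1, '2024-01-01')")
conn.execute("INSERT INTO task_instance VALUES (10, '2025-06-01', 1)")

clean_before = "2025-01-01"  # ISO dates compare correctly as strings


def naive_clean(conn, clean_before):
    """Delete old rows per table, each judged only by its own recency column."""
    conn.execute("DELETE FROM task_instance WHERE start_date < ?", (clean_before,))
    conn.execute("DELETE FROM dag_version WHERE created_at < ?", (clean_before,))


try:
    naive_clean(conn, clean_before)
    error = None
except sqlite3.IntegrityError as exc:
    # The recent task_instance row is kept, so deleting dag_version 1 breaks the FK.
    error = str(exc)

print(error)  # FOREIGN KEY constraint failed
```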
   
   ## Solution
   
   Modified `_build_query()` in `db_cleanup.py` to add special handling for the 
`dag_version` table. When building the deletion query for `dag_version`, we now:
   
   1. Create a subquery to find `dag_version_id`s that are referenced by 
`task_instance` rows with `start_date >= clean_before_timestamp` (i.e., rows 
that are NOT being deleted)
   2. Exclude those `dag_version` rows from the deletion query
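   The two steps above can be sketched in raw SQL, again assuming a hypothetical simplified schema and Python's stdlib `sqlite3`. This is not the code of the `_build_query()` change itself; `make_db` and `clean_dag_version` are illustrative names. The sketch also assumes old `task_instance` rows are removed before `dag_version` is cleaned.

```python
import sqlite3


def make_db():
    """Hypothetical, simplified schema with only the columns the fix touches."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE dag_version (id INTEGER PRIMARY KEY, created_at TEXT)")
    conn.execute(
        "CREATE TABLE task_instance (id INTEGER PRIMARY KEY, start_date TEXT,"
        " dag_version_id INTEGER REFERENCES dag_version (id))"
    )
    return conn


def clean_dag_version(conn, clean_before):
    # Step 1: the subquery finds dag_version ids referenced by task_instance
    # rows that are being kept (start_date >= clean_before).
    # Step 2: those ids are excluded from the dag_version deletion.
    conn.execute(
        """
        DELETE FROM dag_version
        WHERE created_at < :ts
          AND id NOT IN (
              SELECT dag_version_id FROM task_instance
              WHERE start_date >= :ts AND dag_version_id IS NOT NULL
          )
        """,
        {"ts": clean_before},
    )


conn = make_db()
conn.executemany("INSERT INTO dag_version VALUES (?, ?)", [
    (1, "2024-01-01"),  # old, referenced by a recent task_instance -> keep
    (2, "2024-01-01"),  # old, referenced only by an old task_instance -> delete
    (3, "2024-01-01"),  # old, unreferenced -> delete
    (4, "2025-03-01"),  # recent -> keep
])
conn.executemany("INSERT INTO task_instance VALUES (?, ?, ?)", [
    (10, "2025-06-01", 1),
    (11, "2024-02-01", 2),
    (12, "2025-06-01", None),  # recent row with no dag_version
])

clean_before = "2025-01-01"
# Assumption of this sketch: old task_instance rows are deleted first.
conn.execute("DELETE FROM task_instance WHERE start_date < ?", (clean_before,))
clean_dag_version(conn, clean_before)

remaining = [row[0] for row in conn.execute("SELECT id FROM dag_version ORDER BY id")]
print(remaining)  # [1, 4]
```

   The `dag_version_id IS NOT NULL` filter matters: if the `NOT IN` subquery returned a NULL, the predicate would evaluate to NULL for every row and nothing would be deleted. A `NOT EXISTS` correlated subquery would sidestep that pitfall as well.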
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]