Arunodoy18 opened a new pull request, #59679:
URL: https://github.com/apache/airflow/pull/59679
Fixes the `airflow db clean` command failing with a foreign key constraint
error when attempting to delete old `dag_version` records that are still
referenced by recent `task_instance` or `dag_run` records.
## Problem
When running `airflow db clean`, the command fails with:
**Root Cause:**
- `dag_version` rows are deleted based on their `created_at` timestamp
- `task_instance` rows are deleted based on their `start_date` timestamp
- A DAG created long ago but executed recently will have:
- Old `dag_version.created_at` → marked for deletion
- Recent `task_instance.start_date` → kept
- Migration `3ac9e5732b1f` changed the FK constraint to `ON DELETE RESTRICT`
- Result: Cannot delete `dag_version` because `task_instance` still
references it
**Reproduction:**
1. Have a DAG that hasn't been updated for a while (old
`dag_version.created_at`)
2. Run the DAG recently (recent `task_instance.start_date`)
3. Execute `airflow db clean --clean-before-timestamp <date>` where date is
between the two timestamps
4. Command fails with the IntegrityError above
## Solution
Modified the `_build_query()` function in `db_cleanup.py` to add special
handling for the `dag_version` table:
- Before attempting to delete a `dag_version` row, check if it's referenced
by any `task_instance` or `dag_run`
- Only delete `dag_version` rows that have NO active references, regardless
of their age
- Uses SQL EXISTS subqueries to efficiently check for references
This ensures:
- ✅ Foreign key constraints are respected
- ✅ No IntegrityError is raised
- ✅ Database integrity is maintained
- ✅ Only truly orphaned `dag_version` records are cleaned up
## Changes
**Modified Files:**
1. `airflow-core/src/airflow/utils/db_cleanup.py`
- Added `not_` import from sqlalchemy
- Modified `_build_query()` to exclude `dag_version` rows with active
references in `task_instance` or `dag_run` tables
2. `airflow-core/tests/unit/utils/test_db_cleanup.py`
- Added `test_dag_version_with_active_references_not_deleted()` test case
- Reproduces the exact scenario from the bug report
- Verifies that old `dag_version` rows with recent references are not
deleted
## Testing
**New Test Case:**
The test creates an old `dag_version` (60 days ago), a recent
`task_instance` (55 days ago) referencing it, runs cleanup with 30-day
threshold, and verifies `dag_version` is NOT deleted despite being old enough.
```bash
# Run the new test
pytest
airflow-core/tests/unit/utils/test_db_cleanup.py::TestDBCleanup::test_dag_version_with_active_references_not_deleted
-v
# Run all db_cleanup tests
pytest -v
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]