steveahnahn opened a new pull request, #68595: URL: https://github.com/apache/airflow/pull/68595
Bounds the scheduler's cleanup of orphaned `asset_state_store` rows so it can no longer issue a single unbounded delete inside the scheduler loop.

### Why

`SchedulerJobRunner._cleanup_orphaned_asset_state_store()` issued one bulk `DELETE` for every `asset_state_store` row whose asset is no longer active. It runs from `_update_asset_orphanage`, a `timers.call_regular_interval(parsing_cleanup_interval, ...)` scheduler callback, so a large orphan backlog (bulk asset/Dag removal, a mass asset-identity change, or the first cleanup after a backlog accumulates) made one tick do unbounded transaction and row-lock work, holding locks for the whole transaction and stalling the scheduler main loop. This is the pattern the contributing guidelines call out: bulk `DELETE`/`UPDATE` in the scheduler loop must be bounded.

### What changed

- Select up to `ORPHANED_ASSET_STATE_STORE_CLEANUP_BATCH_SIZE` (500, mirroring the neighbouring `MAX_PARTITION_DAG_RUNS_PER_LOOP`) distinct orphaned `asset_id`s and delete those assets' rows via a single-column `asset_id IN (...)`. Remaining orphaned assets drain on subsequent orphanage ticks.
- The method keeps its managed session and does not commit internally, so one bounded batch per tick is used rather than an internal loop-with-commits.
- The asset ids are materialised into the `IN` list (not a `LIMIT` subquery, which MySQL rejects); `asset_id` is the leading column of the `asset_state_store` primary key, so the filter is index-backed.

### Tests

Adds `test_cleanup_orphaned_asset_state_store_batches_deletes` (first coverage for this method): with the per-tick cap patched to two, the first cleanup leaves one orphaned asset pending and the second drains it, while the active asset is never touched. Verified the test fails against the previous unbounded delete and passes with the bound in place.

##### Was generative AI tooling used to co-author this PR?
- [X] Yes, Claude Code (Opus 4.8)

Generated-by: Claude Code (Opus 4.8) following [the guidelines](https://github.com/apache/airflow/blob/main/contributing-docs/05_pull_requests.rst#gen-ai-assisted-contributions)
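
For readers who want to see the shape of the bounded cleanup described under "What changed", here is a minimal sketch using only the standard-library `sqlite3` module. It is not the code in this PR: the real method works through the scheduler's SQLAlchemy session, and the `active_asset` table, the simplified `asset_state_store` columns, and the names `cleanup_orphaned_batch` and `build_demo_db` are assumptions made for illustration. Only the two-step pattern (select a capped set of distinct orphaned ids, then delete by a materialised `IN` list, without committing) is taken from the description.

```python
import sqlite3

# The PR names a cap of 500; this constant name is shortened for the sketch.
BATCH_SIZE = 500


def cleanup_orphaned_batch(conn: sqlite3.Connection, batch_size: int = BATCH_SIZE) -> int:
    """Delete state rows for at most ``batch_size`` orphaned assets.

    Returns the number of rows deleted. Does not commit: the caller owns
    the transaction, as the managed session does in the PR description.
    """
    # Step 1: pick a bounded set of distinct orphaned asset ids.
    orphaned_ids = [
        row[0]
        for row in conn.execute(
            """
            SELECT DISTINCT s.asset_id
            FROM asset_state_store AS s
            WHERE s.asset_id NOT IN (SELECT asset_id FROM active_asset)
            ORDER BY s.asset_id
            LIMIT ?
            """,
            (batch_size,),
        )
    ]
    if not orphaned_ids:
        return 0

    # Step 2: delete by a materialised IN list instead of a LIMIT subquery.
    placeholders = ", ".join("?" for _ in orphaned_ids)
    cursor = conn.execute(
        f"DELETE FROM asset_state_store WHERE asset_id IN ({placeholders})",
        orphaned_ids,
    )
    return cursor.rowcount


def build_demo_db() -> sqlite3.Connection:
    """Hypothetical schema: assets 1-3 are orphaned, asset 4 is active."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE active_asset (asset_id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE asset_state_store ("
        "asset_id INTEGER NOT NULL, key TEXT NOT NULL, value TEXT, "
        "PRIMARY KEY (asset_id, key))"
    )
    conn.execute("INSERT INTO active_asset (asset_id) VALUES (4)")
    rows = [(asset_id, key, "v") for asset_id in (1, 2, 3, 4) for key in ("a", "b")]
    conn.executemany("INSERT INTO asset_state_store VALUES (?, ?, ?)", rows)
    return conn
```

Calling `cleanup_orphaned_batch(conn, batch_size=2)` repeatedly on the demo database drains two orphaned assets on the first call and the remaining one on the second, leaving the active asset's rows in place, which mirrors the scenario the new test covers.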
