capytan opened a new pull request, #66762: URL: https://github.com/apache/airflow/pull/66762
related: #66734 After upgrading to 3.2.1 we hit periodic Postgres deadlocks on `task_instance` between two writers that lock the same rows in different orders: - scheduler's `_verify_integrity_if_dag_changed` issues `UPDATE task_instance ... WHERE dag_id=? AND run_id=? AND state IN (...)`, which the planner satisfies via the `(dag_id, run_id)` index - api-server's `ti_update_state` does `SELECT FOR UPDATE` on the same rows by primary key When the row sets overlap, PG aborts one of them with `deadlock detected`. Traceback and pg_stat_activity output are in #66734. This PR replaces the bulk UPDATE with a `SELECT id ORDER BY id LIMIT N FOR UPDATE SKIP LOCKED` + `UPDATE WHERE id IN (...)` batch loop, so the scheduler also locks in PK order. Same approach #65920 takes for `check_trigger_timeouts`; #65818 tracks this family of bugs. One thing that bit me on the way: the inner UPDATE only changes `dag_version_id`, not `state`, so a SELECT filtered solely on `state IN (State.unfinished)` keeps returning the same N ids on every iteration and the loop never terminates once the batch is full. The candidate SELECT also filters on `dag_version_id IS NULL OR != latest_dag_version.id` so each batch advances. The previous single bulk UPDATE never surfaced this; the batched form needs it. --- ##### Was generative AI tooling used to co-author this PR? - [X] Yes — Claude Opus 4.7 (1M context) Generated-by: Claude Opus 4.7 (1M context) following [the guidelines](https://github.com/apache/airflow/blob/main/contributing-docs/05_pull_requests.rst#gen-ai-assisted-contributions) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
