1fanwang opened a new pull request, #66820:
URL: https://github.com/apache/airflow/pull/66820

   ### Problem
   
   `SchedulerJobRunner._do_scheduling()` runs in two phases against the same session. Phase 1 calls `_start_queued_dagruns()` and then `guard.commit()`; phase 2 fetches the running dag runs and calls `_schedule_all_dag_runs()`. After the phase 1 commit, the `DagRun` objects that phase 1 loaded remain in the session's identity map. When `_schedule_all_dag_runs()` in phase 2 triggers a flush or merge, those leftover instances can be re-dirtied and written out by the final `guard.commit()`.
   
   In HA scheduler deployments, several active replicas process different dag runs concurrently. Each replica's final commit then touches not only the rows it intends to update but also a tail of stale rows, locked in an order driven by phase 1 rather than the order the other replicas use for their own work. The result is A-B / B-A deadlocks on the `(dag_run, task_instance)` lock pair (`1213 "Deadlock found when trying to get lock"` on MySQL, `deadlock detected` on PostgreSQL), and the scheduling loop slows down under contention.
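A simplified way to see why the stale tail matters: if each commit is modeled as acquiring row locks in list order and holding them until commit, two replicas can deadlock whenever they share a pair of rows that they lock in opposite orders. The sketch below is a toy model, not Airflow code; the row labels such as `dag_run:2` and the function name `opposite_order_pairs` are hypothetical.

```python
from itertools import combinations


def opposite_order_pairs(order_a, order_b):
    """Return row pairs that two transactions lock in opposite orders.

    Toy model (assumption): each transaction locks rows in list order and
    holds every lock until commit. A pair (x, y) locked x-then-y by one
    transaction and y-then-x by the other is the A-B / B-A pattern that
    can deadlock.
    """
    pos_a = {row: i for i, row in enumerate(order_a)}
    pos_b = {row: i for i, row in enumerate(order_b)}
    shared = sorted(pos_a.keys() & pos_b.keys())
    return [
        (x, y)
        for x, y in combinations(shared, 2)
        if (pos_a[x] < pos_a[y]) != (pos_b[x] < pos_b[y])
    ]


# Hypothetical lock orders for two replicas' final commits.
replica_b = ["task_instance:2a", "dag_run:2"]  # B's intended work
replica_a_clean = ["dag_run:1", "task_instance:1a"]  # A's intended work only
replica_a_stale = replica_a_clean + ["dag_run:2", "task_instance:2a"]  # plus phase-1 tail

print(opposite_order_pairs(replica_a_stale, replica_b))  # [('dag_run:2', 'task_instance:2a')]
print(opposite_order_pairs(replica_a_clean, replica_b))  # []
```

Under this model the clean order shares no rows with replica B, so no opposite-order pair can form. Real databases take additional locks (for example InnoDB index and gap locks), so the sketch captures only the ordering argument.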
   
   ### Fix
   
   Add a single `session.expunge_all()` call immediately after the phase 1 `guard.commit()` and before phase 2's `DagRun.get_running_dag_runs_to_examine(...)`. Phase 2 then reloads its working set fresh, so the final commit touches only the rows phase 2 intentionally pulled in. The existing outer `session.expunge_all()` later in `_do_scheduling()` does the same thing globally; this one closes the gap between the two phases.
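The effect of the extra call can be illustrated with a toy two-phase loop. `ToySession` and `do_scheduling` below are hypothetical stand-ins, not Airflow's implementation, and the step that re-dirties every attached instance is a deliberately crude model of the flush/merge side effects described in the Problem section.

```python
class ToySession:
    """Hypothetical stand-in for a SQLAlchemy Session (not Airflow code).

    As with a real Session, commit() does not detach loaded objects:
    only expunge_all() empties the identity map.
    """

    def __init__(self):
        self.identity_map = {}  # key -> loaded object
        self.dirty = set()  # keys scheduled to be written by the next commit
        self.committed = []  # sorted keys written by each commit, in order

    def load(self, key):
        return self.identity_map.setdefault(key, {"key": key})

    def mark_dirty(self, key):
        # Only attached (identity-mapped) objects can be flushed.
        if key in self.identity_map:
            self.dirty.add(key)

    def commit(self):
        self.committed.append(sorted(self.dirty))
        self.dirty.clear()  # identity_map is deliberately left populated

    def expunge_all(self):
        self.identity_map.clear()
        self.dirty.clear()


def do_scheduling(session, queued, running, expunge_between_phases):
    """Toy two-phase loop; returns the keys written by the final commit."""
    # Phase 1: start queued dag runs, then commit.
    for key in queued:
        session.load(key)
        session.mark_dirty(key)
    session.commit()

    if expunge_between_phases:
        session.expunge_all()  # the single call the fix adds

    # Phase 2: load the running dag runs to examine.
    for key in running:
        session.load(key)
    # Crude model: a flush/merge re-dirties every instance still attached.
    for key in list(session.identity_map):
        session.mark_dirty(key)
    session.commit()
    return session.committed[-1]
```

With `expunge_between_phases=False`, the final commit in this model also writes the phase 1 row; with `True`, it writes only the row phase 2 loaded.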
   
   ### Tests
   
   Added a unit test that patches `_start_queued_dagruns` to seed the identity 
map with a known dag run, patches `DagRun.get_running_dag_runs_to_examine` to 
capture the identity map keys at the start of phase 2, and asserts the captured 
set is empty.
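A self-contained sketch of the same test shape, using `unittest.mock` against hypothetical `FakeSession`, `FakeDagRun` and `ToyRunner` classes rather than the real Airflow objects (the actual test in the PR may differ in fixtures and setup):

```python
import unittest
from unittest import mock


class FakeSession:
    """Hypothetical minimal session: just an identity map and two calls."""

    def __init__(self):
        self.identity_map = {}

    def commit(self):
        pass  # models commit() leaving loaded objects in the identity map

    def expunge_all(self):
        self.identity_map.clear()


class FakeDagRun:
    @staticmethod
    def get_running_dag_runs_to_examine(session):
        return []


class ToyRunner:
    """Hypothetical two-phase runner mirroring the shape of _do_scheduling()."""

    def _start_queued_dagruns(self, session):
        pass

    def _do_scheduling(self, session):
        self._start_queued_dagruns(session)
        session.commit()
        session.expunge_all()  # the fix under test
        return FakeDagRun.get_running_dag_runs_to_examine(session=session)


class TestPhaseIsolation(unittest.TestCase):
    def test_identity_map_empty_at_start_of_phase_two(self):
        session = FakeSession()
        captured = {}

        def seed(sess):
            # Phase 1 leaves a known dag run in the identity map.
            sess.identity_map[("DagRun", 1)] = object()

        def capture(session):
            # Record what is still attached when phase 2 starts.
            captured["keys"] = set(session.identity_map)
            return []

        with mock.patch.object(
            ToyRunner, "_start_queued_dagruns", side_effect=seed
        ) as start_mock, mock.patch.object(
            FakeDagRun, "get_running_dag_runs_to_examine", side_effect=capture
        ):
            ToyRunner()._do_scheduling(session)

        start_mock.assert_called_once_with(session)
        self.assertEqual(captured["keys"], set())
```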
   
   Closes #66817
   