The GitHub Actions job "Tests (AMD)" on airflow.git/expunge-between-phases has 
failed.
Run started by GitHub user 1fanwang (triggered by 1fanwang).

Head commit for run:
0a40cf4e294a541b5c3eb5fffdefd52c8efd2e0b / 1fanwang <[email protected]>
fix(scheduler): clear identity map between _do_scheduling phases to avoid 
cross-replica deadlocks

After phase 1's commit, DagRun objects loaded by `_start_queued_dagruns`
remain in the session's identity map. Phase 2's flush / merge (notably
`session.merge(...)` with the same primary key) can then re-dirty those
instances and include them in the final commit, taking row locks in an
order that differs from the one other scheduler replicas use. The result
is A-B / B-A deadlocks on `dag_run` and `task_instance` in HA scheduler
deployments.
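
The A-B / B-A pattern can be reproduced with a small, database-free model.
This is a hypothetical sketch, not Airflow or SQLAlchemy code: the function
`replicas_deadlock`, its lockstep schedule, and the row keys "A" and "B" are
assumptions standing in for row locks on two `dag_run` rows.

```python
def replicas_deadlock(orders):
    """Toy model (hypothetical) of scheduler replicas taking row locks in lockstep.

    ``orders`` holds one list of distinct row keys per replica, in the order
    that replica locks them. Each round, every unfinished replica tries to lock
    its next row; a replica that holds all of its rows "commits" and releases
    them. Returns True if a round passes in which no replica can make progress
    (a wait-for cycle, which a real database breaks by aborting a transaction),
    False if every replica finishes.
    """
    held = {}                     # row key -> index of the replica holding its lock
    progress = [0] * len(orders)  # index of the next row each replica wants
    while True:
        moved = False
        for r, order in enumerate(orders):
            if progress[r] == len(order):
                continue  # this replica has already committed
            row = order[progress[r]]
            if row in held:
                continue  # blocked on a lock another replica holds
            held[row] = r
            progress[r] += 1
            moved = True
            if progress[r] == len(order):
                # Commit: release every row lock this replica holds.
                for key in [k for k, owner in held.items() if owner == r]:
                    del held[key]
        if all(p == len(o) for p, o in zip(progress, orders)):
            return False
        if not moved:
            return True
```

With `replicas_deadlock([["A", "B"], ["B", "A"]])` each replica grabs its
first row and then waits forever on the other, so the model reports a
deadlock; two replicas that both lock "A" before "B" simply serialize and
finish. The model only shows why mismatched lock orders are dangerous; it
does not describe how the scheduler picks its order.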

Add `session.expunge_all()` between the phase 1 commit and the phase 2
query so phase 2 reloads its working set from the database. `expire_all`
alone is not enough: expired instances stay in the identity map, so
`merge` can still find them by primary key and re-dirty them. The
`session.expunge_all()` already present later in `_do_scheduling` clears
the whole identity map; this new call closes the window between phases.
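
To make the expire-versus-expunge distinction concrete, here is a
deliberately simplified identity-map model. `ToySession` and
`phase2_reuses_phase1_object` are hypothetical, not SQLAlchemy APIs; they
only mimic three documented `Session` behaviours: `merge` first looks for an
instance with the same primary key in the identity map, `expire_all` leaves
instances in the map, and `expunge_all` removes them.

```python
from dataclasses import dataclass


@dataclass
class DagRun:
    id: int
    state: str


class ToySession:
    """Hypothetical stand-in for an ORM session's identity map (not SQLAlchemy's Session)."""

    def __init__(self, db):
        self.db = db             # primary key -> committed state string
        self.identity_map = {}   # primary key -> DagRun instance
        self.expired = set()     # keys whose attributes must be reloaded
        self.dirty = set()       # keys with pending changes

    def get(self, pk):
        obj = self.identity_map.get(pk)
        if obj is None:
            obj = DagRun(pk, self.db[pk])  # identity-map miss: load from the "database"
            self.identity_map[pk] = obj
        elif pk in self.expired:
            obj.state = self.db[pk]        # refresh in place; same Python object
        self.expired.discard(pk)
        return obj

    def merge(self, detached):
        # Like Session.merge: reuse the instance with this primary key if the
        # identity map has one, otherwise load it, then copy state onto it.
        target = self.get(detached.id)
        target.state = detached.state
        self.dirty.add(detached.id)
        return target

    def commit(self):
        for pk in self.dirty:
            self.db[pk] = self.identity_map[pk].state
        self.dirty.clear()

    def expire_all(self):
        # Instances stay in the identity map; only their state is marked stale.
        self.expired.update(self.identity_map)

    def expunge_all(self):
        # Instances leave the session; the next access loads fresh objects.
        self.identity_map.clear()
        self.expired.clear()
        self.dirty.clear()


def phase2_reuses_phase1_object(reset):
    """Return True if a phase 2 merge lands on the object phase 1 loaded."""
    session = ToySession({1: "queued"})
    phase1_dr = session.get(1)   # phase 1 loads the DagRun
    session.commit()             # phase 1 commit
    reset(session)               # ToySession.expire_all or ToySession.expunge_all
    merged = session.merge(DagRun(1, "running"))  # phase 2 merge, same primary key
    return merged is phase1_dr
```

With `ToySession.expire_all` the phase 2 merge returns, and dirties, the very
object phase 1 loaded; with `ToySession.expunge_all` it gets a freshly loaded
one. The model leaves out flush ordering and locking entirely.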

`test_dagrun_callbacks_are_called` / `test_dagrun_plugins_are_notified`
previously relied on the local `dr` / `ti` ORM references being identical
to the instances the scheduler mutated. With the expunge between phases,
phase 2 reloads fresh objects, so the assertions re-query `dr` and `ti`
from the database to capture the post-scheduling state.
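
The shape of the test change can be sketched without Airflow. In this
hypothetical example, stdlib `sqlite3` and a plain dataclass stand in for the
ORM session and the `DagRun` model, and `load_dag_run`, `scheduler_phase2`
and `scenario` are illustrative names, not functions from the Airflow test
suite.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class DagRunSnapshot:
    """Plain object standing in for an ORM-loaded DagRun (hypothetical)."""
    id: int
    state: str


def load_dag_run(conn, pk):
    (state,) = conn.execute("SELECT state FROM dag_run WHERE id = ?", (pk,)).fetchone()
    return DagRunSnapshot(pk, state)


def scheduler_phase2(conn, pk):
    # Like phase 2 after expunge_all: load a fresh object, mutate it, commit.
    dr = load_dag_run(conn, pk)
    dr.state = "success"
    conn.execute("UPDATE dag_run SET state = ? WHERE id = ?", (dr.state, dr.id))
    conn.commit()


def scenario():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dag_run (id INTEGER PRIMARY KEY, state TEXT)")
    conn.execute("INSERT INTO dag_run (id, state) VALUES (1, 'running')")
    conn.commit()

    dr = load_dag_run(conn, 1)   # the test's local reference
    scheduler_phase2(conn, 1)    # the scheduler mutates a different object
    stale_state = dr.state       # the local reference never saw the change
    dr = load_dag_run(conn, 1)   # re-query from the database before asserting
    conn.close()
    return stale_state, dr.state
```

`scenario()` returns the stale state alongside the re-queried one, which is
why asserting on the original local reference stops working once the
scheduler no longer mutates that same instance.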

Closes #66817

Signed-off-by: 1fanwang <[email protected]>

Report URL: https://github.com/apache/airflow/actions/runs/25784724861

With regards,
GitHub Actions via GitBox

