1fanwang commented on PR #66820:
URL: https://github.com/apache/airflow/pull/66820#issuecomment-4688055760

   Closing this — I couldn't reproduce the root cause this PR assumes.
   
   I set up a live multi-scheduler repro on `main` rather than reasoning about 
it from synthesis, and the `dag_run` / `task_instance` deadlock this PR targets 
didn't surface. The fetches in both phases already take their rows with 
`FOR UPDATE SKIP LOCKED`, so the schedulers don't contend on the same dag-run 
rows, and the SQL emitted is identical with and without `expunge_all()`.
   
   <details><summary>Live repro</summary>
   
   ```
   # apache/airflow main, fresh MySQL 8.0 metadata DB (real `airflow db migrate`)
   # 24 catchup DAGs (1-min schedule, fan-out/fan-in tasks), dags unpaused
   # 2–3 real `airflow scheduler` processes against the same DB, concurrently
   
   # after ~3 min under load: ~2,000 dag runs + ~10,000 task instances created
   # → the dag_run / task_instance deadlock this PR targets: did not occur
   # → before/after SQL (with vs without expunge_all between phases): identical
   # → no functional difference: same dag-run/TI progression, no duplicates
   ```
   </details>
   
   I don't want to keep a fix up that I can't back with a real repro, so I'll 
take a step back and benchmark this properly before pursuing it again. 
@ephraimbuddy, you were right to push for a real log on this; thanks for that. 
@potiuk, thanks for the triage.


