msumit commented on PR #60166:
URL: https://github.com/apache/airflow/pull/60166#issuecomment-3717328792

   @wjddn279 I think the main reason is that we are running multiple 
dag-processors, which is also proved by the following LLM analysis on the 
problematic code block. 
   
   > The deadlock you're experiencing in MySQL is caused by the use of a 
subquery within a DELETE statement:
   
delete(model).where(model.dag_id.in_(select(DagModel.dag_id).where(DagModel.is_stale)))
   In MySQL, DELETE ... WHERE ... IN (SELECT ...) can lead to deadlocks because 
the database may acquire shared locks on the subquery table (dag) while 
attempting to get exclusive locks on the target table 
(dag_schedule_asset_alias_reference, etc.). When multiple DAG processor 
instances or other concurrent transactions are performing similar operations, 
they can acquire these locks in different orders, resulting in a deadlock.
   
   
   > The deadlock is happening because this DELETE acquires a shared lock on 
the matching dag rows (because of the subquery) while also taking row locks on 
each dag_schedule_* table; meanwhile the DAG processor is updating the same DAG 
row (is_stale → 0) and inserting/updating the alias reference rows. The lock 
order ends up reversed across the two transactions: the cleanup code holds the 
dag row lock and waits on alias rows, while the processor holds the alias row 
lock and waits on the same dag row, producing the 1213 deadlock.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to