msumit commented on PR #60166: URL: https://github.com/apache/airflow/pull/60166#issuecomment-3717328792
@wjddn279 I think the main reason is that we are running multiple dag-processors, which is also backed up by the following LLM analysis of the problematic code block.

> The deadlock you're experiencing in MySQL is caused by the use of a subquery within a DELETE statement: `delete(model).where(model.dag_id.in_(select(DagModel.dag_id).where(DagModel.is_stale)))`. In MySQL, `DELETE ... WHERE ... IN (SELECT ...)` can lead to deadlocks because the database may acquire shared locks on the subquery table (`dag`) while attempting to get exclusive locks on the target table (`dag_schedule_asset_alias_reference`, etc.). When multiple DAG processor instances or other concurrent transactions are performing similar operations, they can acquire these locks in different orders, resulting in a deadlock.
>
> The deadlock is happening because this DELETE acquires a shared lock on the matching `dag` rows (because of the subquery) while also taking row locks on each `dag_schedule_*` table; meanwhile the DAG processor is updating the same DAG row (`is_stale` → 0) and inserting/updating the alias reference rows. The lock order ends up reversed across the two transactions: the cleanup code holds the `dag` row lock and waits on the alias rows, while the processor holds the alias row lock and waits on the same `dag` row, producing the 1213 deadlock.
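
To make the lock-ordering point concrete, here is a minimal sketch (not the change in this PR) of one common way to break the coupling: materialize the stale dag_ids with a plain SELECT first, then DELETE by literal values, so the DELETE no longer takes shared locks on `dag` while holding row locks on the reference tables. `session`, `DagModel`, and the reference `model` are assumed to come from the surrounding Airflow code.

```python
from sqlalchemy import delete, select

# Hypothetical sketch: read the stale dag_ids in a separate statement,
# so the subsequent DELETE only locks rows in the reference table and
# never touches the `dag` table's rows.
stale_dag_ids = session.scalars(
    select(DagModel.dag_id).where(DagModel.is_stale)
).all()

if stale_dag_ids:
    session.execute(
        delete(model).where(model.dag_id.in_(stale_dag_ids))
    )
```

The trade-off is that the id list can be slightly stale between the SELECT and the DELETE, so this only works where that race is acceptable or covered by retries.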
