potiuk commented on pull request #15714: URL: https://github.com/apache/airflow/pull/15714#issuecomment-841816345
I think what also matters is how SQLAlchemy works and how we are using it. I am not too concerned about changing the default to READ_COMMITTED for all transactions, because:

a) Our transactions are usually very short, and we usually work on the same set of rows/tables that we retrieve in the first query of the transaction.
b) SQLAlchemy retrieves the rows we are working on and stores them as objects in memory; only when we flush them or commit the transaction does SQLAlchemy merge the changes back.
c) The scheduler (in 2.0) works in small, "tight" loops. Basically, it retrieves N records (say, the first 100 matching the criteria) and locks them, and then only that thread of that scheduler performs any changes to those rows and related data (merging them back). Then it commits, goes back, and retrieves the next 100 matching rows.

So contention and parallel access to the same rows is not really possible (under normal circumstances). Also, the scheduler (which is the important one) does indeed use SKIP LOCKED (but I believe different schedulers locking the same gaps might cause deadlocks in some scenarios even if SKIP LOCKED is used).

@ashb -> I might not have the whole picture, so maybe you can comment here.

BTW, I think it might be an interesting topic for your talk at the Summit :) https://twitter.com/AshBerlin/status/1393861492282429443
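To make the loop from point c) concrete, here is a rough sketch of the "fetch a batch, change it, commit, repeat" pattern. It is not the actual scheduler code: the `task` table, its `state` column, and the `process_in_batches` / `demo` helpers are all made up for illustration, and it runs against an in-memory SQLite database, which has no row-level locks. The `LOCKING_QUERY` string only shows what the locking variant of the SELECT might look like on PostgreSQL or MySQL 8+; it is never executed here.

```python
import sqlite3

# Hypothetical shape of the locking query on PostgreSQL / MySQL 8+.
# Shown for illustration only; SQLite cannot run it, so it is not executed.
LOCKING_QUERY = (
    "SELECT id FROM task WHERE state = 'queued' "
    "ORDER BY id LIMIT %(batch)s FOR UPDATE SKIP LOCKED"
)


def process_in_batches(conn, batch_size):
    """Update queued rows in short transactions, one batch per transaction."""
    batches = []
    while True:
        # Retrieve the next N rows matching the criteria.
        rows = conn.execute(
            "SELECT id FROM task WHERE state = 'queued' ORDER BY id LIMIT ?",
            (batch_size,),
        ).fetchall()
        if not rows:
            break
        ids = [row[0] for row in rows]
        # Only this worker touches the rows it just retrieved.
        conn.executemany(
            "UPDATE task SET state = 'scheduled' WHERE id = ?",
            [(i,) for i in ids],
        )
        # Commit right away so the transaction stays short.
        conn.commit()
        batches.append(ids)
    return batches


def demo():
    """Build a 7-row table and process it in batches of 3."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE task (id INTEGER PRIMARY KEY, state TEXT)")
    conn.executemany(
        "INSERT INTO task (id, state) VALUES (?, 'queued')",
        [(i,) for i in range(1, 8)],
    )
    conn.commit()
    return process_in_batches(conn, batch_size=3)
```

With SQLAlchemy, the locking SELECT would typically be built with `with_for_update(skip_locked=True)` on the query rather than written by hand. Note that this sketch says nothing about the gap-lock deadlocks mentioned above; it only illustrates why each transaction touches a small, private set of rows.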
