dstandish commented on issue #58919: URL: https://github.com/apache/airflow/issues/58919#issuecomment-3602306230
Possible solutions:

1. Use locking when running get-or-create-apdr. We would somehow need to use locking so that two writers cannot both see no apdr record and then both create one at the same time. The most extreme option (probably not a good one) would be to take a lock on the dag table, but that could hurt performance and interfere with other things that try to lock the dag table. There are other ideas. For example, we could create an apdr mutex table (or perhaps a partition state table...?) with the grain dag_id, partition_key (i.e. a uniqueness constraint on those two columns). That row could then be locked during the get-or-create operation, which would make concurrent get-or-creates run one after another. Maybe the "apdr mutex table" is just a parent table with grain dag_id, partition_key, and apdr has a foreign key to it.
2. We could get rid of the apdr table and change the querying and logic to sift through the pile of records in the key log table to figure out which runs need to be created, etc. The querying would be more expensive, but we would not be subject to the race condition.
3. Still use locking somehow, but avoid an external mutex table by doing the locking within apdr itself, e.g. perhaps implement some is_latest flag with a partial unique constraint, or something different.
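Option 3 (an is_latest flag with a partial unique constraint) can be sketched as follows. SQLite is used purely for illustration because it supports partial indexes. The `apdr` schema, the `apdr_one_latest` index and `get_or_create_latest_apdr` are a hypothetical simplification. Instead of check-then-insert, a writer inserts first. If the constraint rejects a second "latest" row for the same (dag_id, partition_key), the writer reads back the row that won. The database, not the application, then guarantees that there is at most one latest apdr per key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified apdr schema with an is_latest flag.
conn.execute(
    """
    CREATE TABLE apdr (
        id INTEGER PRIMARY KEY,
        dag_id TEXT NOT NULL,
        partition_key TEXT NOT NULL,
        is_latest INTEGER NOT NULL DEFAULT 1
    )
    """
)
# Partial unique index: at most one is_latest = 1 row per (dag_id, partition_key);
# retired rows (is_latest = 0) are not constrained.
conn.execute(
    """
    CREATE UNIQUE INDEX apdr_one_latest
    ON apdr (dag_id, partition_key)
    WHERE is_latest = 1
    """
)


def get_or_create_latest_apdr(
    conn: sqlite3.Connection, dag_id: str, partition_key: str
) -> int:
    """Insert a latest apdr; if another writer already has one, return its id."""
    try:
        with conn:  # commits on success, rolls back and re-raises on error
            cur = conn.execute(
                "INSERT INTO apdr (dag_id, partition_key) VALUES (?, ?)",
                (dag_id, partition_key),
            )
        return cur.lastrowid
    except sqlite3.IntegrityError:
        # The partial unique index rejected a second latest row: read the winner.
        row = conn.execute(
            "SELECT id FROM apdr"
            " WHERE dag_id = ? AND partition_key = ? AND is_latest = 1",
            (dag_id, partition_key),
        ).fetchone()
        return row[0]


first = get_or_create_latest_apdr(conn, "example_dag", "2024-01-01")
second = get_or_create_latest_apdr(conn, "example_dag", "2024-01-01")
print(first == second)  # True: the second insert was rejected

# Retiring the current row frees the slot for a new latest apdr.
with conn:
    conn.execute("UPDATE apdr SET is_latest = 0 WHERE id = ?", (first,))
third = get_or_create_latest_apdr(conn, "example_dag", "2024-01-01")
print(conn.execute("SELECT COUNT(*) FROM apdr").fetchone()[0])  # 2
```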
