dstandish commented on issue #58919:
URL: https://github.com/apache/airflow/issues/58919#issuecomment-3602306230

   Possible solutions:
   
   1. Use locking when running get-or-create-apdr.
   
   Somehow we would need to use locking so that two concurrent writers cannot both 
see no apdr record and then both create one at the same time. The most extreme 
(and probably not good) option would be to take a lock on the dag table, but this 
could hurt performance and interfere with other things that try to lock the dag 
table. There are other ideas. E.g. create an apdr mutex table (or perhaps a 
partition state table...?) with the grain of dag_id and partition_key (i.e. a 
uniqueness constraint on that pair). That row could then be locked during the 
get-or-create operation, ensuring the operations run sequentially. Maybe the 
"apdr mutex table" is just a parent table with grain dag_id, partition_key, and 
apdr references it via a foreign key.
   
   2. We could get rid of the apdr table and change the querying and logic to 
just sift through the records in the key log table to figure out which runs 
need to be created, etc. The queries would be more expensive, but we would not 
be subject to the race condition.
   
   3. Still use locking somehow, but avoid an external mutex table by locking 
within apdr itself, e.g. perhaps by implementing an is_latest flag with a 
partial unique constraint, or something different.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
