potiuk edited a comment on pull request #20894:
URL: https://github.com/apache/airflow/pull/20894#issuecomment-1016322047


   I am rather sceptical if it changes anything (unless I missed something).
   
   What this change effectively does is attempt to run:
   
   ```
   UPDATE task_instance SET state=%(state)s
   WHERE task_instance.dag_id = %(dag_id_1)s
   AND task_instance.run_id = %(run_id_1)s
   AND task_instance.task_id IN () FOR UPDATE;
   ```
   
   Which I think is not happening anyway, because it makes no sense. The FOR 
UPDATE clause only makes sense in SELECT queries. A row lock is always taken 
when you run an UPDATE query, so I think SQLAlchemy will simply silently ignore 
the with_for_update() clause.
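
A minimal sketch of the point about FOR UPDATE, using SQLAlchemy Core constructs rather than the ORM query path touched by the PR (so it illustrates the SQL-level distinction, not the exact code under review). The table definition is a cut-down stand-in for the real `task_instance` model, and the `render` helper is hypothetical:

```python
from sqlalchemy import Column, MetaData, String, Table
from sqlalchemy.dialects import postgresql

metadata = MetaData()

# Cut-down stand-in for the real task_instance table (assumption).
task_instance = Table(
    "task_instance",
    metadata,
    Column("dag_id", String, primary_key=True),
    Column("run_id", String, primary_key=True),
    Column("task_id", String, primary_key=True),
    Column("state", String),
)


def render(stmt):
    """Hypothetical helper: compile a statement to PostgreSQL-flavoured SQL."""
    return str(stmt.compile(dialect=postgresql.dialect()))


# A SELECT can carry a row-locking clause.
select_stmt = (
    task_instance.select()
    .where(task_instance.c.dag_id == "example_dag")
    .with_for_update()
)
select_sql = render(select_stmt)

# An UPDATE construct offers no such option; the UPDATE itself locks
# the rows it modifies.
update_stmt = (
    task_instance.update()
    .where(task_instance.c.dag_id == "example_dag")
    .values(state="scheduled")
)
update_sql = render(update_stmt)

print(select_sql)
print(update_sql)
print(hasattr(update_stmt, "with_for_update"))
```

The compiled SELECT ends in `FOR UPDATE`, while the UPDATE construct has no `with_for_update()` method at all, which is consistent with the clause being dropped when it is attached to a query that is then turned into an UPDATE.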
   
   What I think we really need to fix is the OTHER query that causes it (but 
we need the logs from the server to find out what the other query was). As I 
understand it, this piece of code should be protected by the earlier locking of 
the "DAG_RUN" row, and there is another query that locks TASK_INSTANCE rows for 
that DAG_RUN when it should not.
   
   There are two reasons it could happen:
   
   1) The other query does not lock the DAG_RUN row "deliberately" 
   2) We have a commit somewhere that releases the DAG_RUN lock
   
   I suspected initially that it's the mini-scheduler, but I think it must be 
something else (the mini-scheduler correctly locks the DAG_RUN and I could not 
find any place where it would release it).
   
   Another possibility is an API call or UI action that updates the 
task_instances.
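
The locking order described above (take the DAG_RUN row lock first, then touch the TASK_INSTANCE rows in the same transaction) could look roughly like the sketch below. This is not Airflow's actual code: the tables are cut-down stand-ins, `set_task_states` is a hypothetical helper, and the demo runs on in-memory SQLite, which ignores FOR UPDATE (on PostgreSQL or MySQL the first statement would be rendered as SELECT ... FOR UPDATE).

```python
from sqlalchemy import Column, MetaData, String, Table, create_engine

metadata = MetaData()

# Cut-down stand-ins for the real tables (assumption).
dag_run = Table(
    "dag_run",
    metadata,
    Column("dag_id", String, primary_key=True),
    Column("run_id", String, primary_key=True),
)
task_instance = Table(
    "task_instance",
    metadata,
    Column("dag_id", String, primary_key=True),
    Column("run_id", String, primary_key=True),
    Column("task_id", String, primary_key=True),
    Column("state", String),
)


def set_task_states(conn, dag_id, run_id, task_ids, state):
    """Hypothetical helper: lock the parent dag_run row, then update its task instances."""
    if not task_ids:
        # Nothing to update, so never emit "task_id IN ()".
        return 0
    # Step 1: lock the parent row. Every writer that follows this order
    # queues on the same row instead of locking task_instance rows directly.
    conn.execute(
        dag_run.select()
        .where(dag_run.c.dag_id == dag_id)
        .where(dag_run.c.run_id == run_id)
        .with_for_update()
    ).fetchall()
    # Step 2: a plain UPDATE; it takes its own row locks on task_instance.
    result = conn.execute(
        task_instance.update()
        .where(task_instance.c.dag_id == dag_id)
        .where(task_instance.c.run_id == run_id)
        .where(task_instance.c.task_id.in_(task_ids))
        .values(state=state)
    )
    return result.rowcount


engine = create_engine("sqlite://")
metadata.create_all(engine)

with engine.begin() as conn:
    conn.execute(dag_run.insert(), [{"dag_id": "d", "run_id": "r"}])
    conn.execute(
        task_instance.insert(),
        [
            {"dag_id": "d", "run_id": "r", "task_id": "t1", "state": None},
            {"dag_id": "d", "run_id": "r", "task_id": "t2", "state": None},
            {"dag_id": "d", "run_id": "r", "task_id": "t3", "state": None},
        ],
    )
    updated = set_task_states(conn, "d", "r", ["t1", "t2"], "scheduled")
    skipped = set_task_states(conn, "d", "r", [], "scheduled")
    rows = conn.execute(task_instance.select()).fetchall()
    # Columns are (dag_id, run_id, task_id, state).
    states = {row[2]: row[3] for row in rows}

print(updated, skipped, states)
```

The protection only holds if every code path that modifies task instances for a run takes the parent lock first and keeps the transaction open until it is done; a path that skips step 1, or commits in between, is exactly the kind of "other query" suspected here.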
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.


