ashb commented on code in PR #60330:
URL: https://github.com/apache/airflow/pull/60330#discussion_r2676649066


##########
airflow-core/src/airflow/models/dagrun.py:
##########
@@ -2063,25 +2063,30 @@ def schedule_tis(
                 empty_ti_ids.append(ti.id)
 
         count = 0
-
+        # Don't only check if the TI.id is in id_chunk
+        # but also check if the TI.state is in the schedulable states.
+        # Plus, a scheduled empty operator should not be scheduled again.
+        non_null_schedulable_states = tuple(s for s in SCHEDULEABLE_STATES if 
s is not None)
+        schedulable_state_clause = or_(
+            TI.state.is_(None),
+            TI.state.in_(non_null_schedulable_states),
+        )

Review Comment:
   We should do some benchmarking/checking of indexes on this query if you 
haven't already. TI state is almost certainly indexed, but we should check this 
with some EXPLAIN ANALYZE in a moderatly sized DB if we have one (something to 
the region of 1-2m TI rows?)
   
   Might also be worth checking if a `schedulable_state_clause = TI.state != 
TaskInstanceState.SCHEDULED` is enough and more-performant?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to