ephraimbuddy opened a new pull request, #60330:
URL: https://github.com/apache/airflow/pull/60330

   In an HA deployment, two scheduler processes can race to schedule the same 
TaskInstance. Previously, DagRun.schedule_tis() updated rows by ti.id alone, so 
a scheduler could increment try_number and transition state even after another 
scheduler had already advanced the TI (e.g. to SCHEDULED/QUEUED), resulting in 
duplicate attempts being queued.
   
   This change makes scheduling idempotent under HA races by:
   - Guarding schedule_tis() DB updates to only apply when the TI is still in 
schedulable states (derived from SCHEDULEABLE_STATES, handling NULL explicitly).
   
   - Using a single CASE (next_try_number) so reschedules (UP_FOR_RESCHEDULE) 
do not start a new try, and applying this consistently to both normal 
scheduling and the EmptyOperator fast-path.
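
   The two points above can be illustrated with a minimal sketch. It is not the 
code from this PR: it uses the standard-library sqlite3 module and a 
hypothetical, cut-down task_instance table, and the state strings are assumed 
to match Airflow's lowercase values. The WHERE clause is the guard (note that 
NULL needs IS NULL, because IN never matches NULL), and the CASE expression 
leaves try_number alone for reschedules.

```python
import sqlite3

# Hypothetical, simplified stand-in for Airflow's task_instance table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE task_instance ("
    "id TEXT PRIMARY KEY, state TEXT, try_number INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO task_instance VALUES (?, ?, ?)",
    [
        ("fresh", None, 0),                   # never scheduled: state is NULL
        ("resched", "up_for_reschedule", 1),  # sensor-style reschedule
        ("taken", "queued", 1),               # already advanced by another scheduler
    ],
)

# Guarded update: only rows still in a schedulable state are touched.
# The right-hand sides of SET see the row's old values, so the CASE
# inspects the state the TI had before this statement ran.
GUARDED_UPDATE = """
UPDATE task_instance
SET state = 'scheduled',
    try_number = CASE
        WHEN state = 'up_for_reschedule' THEN try_number
        ELSE try_number + 1
    END
WHERE id = ?
  AND (state IS NULL OR state IN ('up_for_retry', 'up_for_reschedule'))
"""


def schedule_ti(conn, ti_id):
    """Return True if this caller's update applied, False if the guard rejected it."""
    return conn.execute(GUARDED_UPDATE, (ti_id,)).rowcount == 1


results = {ti_id: schedule_ti(conn, ti_id) for ti_id in ("fresh", "resched", "taken")}
rows = {
    row[0]: (row[1], row[2])
    for row in conn.execute("SELECT id, state, try_number FROM task_instance")
}
```

   In this sketch the "fresh" row starts its first try, the "resched" row is 
scheduled without a new try, and the "taken" row is left untouched because its 
state is no longer schedulable.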
   
   Adds regression tests covering:
   - TI already queued by another scheduler.
   - EmptyOperator fast-path blocked when TI is already QUEUED/RUNNING.
   - UP_FOR_RESCHEDULE scheduling keeps try_number unchanged.
   - Only one “scheduler” update succeeds when competing.
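
   The last case can be sketched outside Airflow as follows. This is an 
illustrative assumption rather than the PR's actual test: it uses sqlite3, a 
hypothetical one-row table, and two sequential calls standing in for two 
schedulers acting on the same stale read. The CASE handling of try_number is 
left out to keep the contrast between the unguarded and guarded update small.

```python
import sqlite3

# Old behaviour (simplified): match on id alone.
UNGUARDED = (
    "UPDATE task_instance SET state = 'scheduled', "
    "try_number = try_number + 1 WHERE id = ?"
)
# New behaviour (simplified): also require a still-schedulable state.
GUARDED = UNGUARDED + (
    " AND (state IS NULL OR state IN ('up_for_retry', 'up_for_reschedule'))"
)


def race(update_sql):
    """Apply the same update twice and report (successful updates, final try_number)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE task_instance (id TEXT PRIMARY KEY, state TEXT, try_number INTEGER)"
    )
    conn.execute("INSERT INTO task_instance VALUES ('ti-1', NULL, 0)")
    # Each iteration plays the role of one "scheduler" issuing its update.
    winners = sum(conn.execute(update_sql, ("ti-1",)).rowcount for _ in range(2))
    (try_number,) = conn.execute(
        "SELECT try_number FROM task_instance WHERE id = 'ti-1'"
    ).fetchone()
    return winners, try_number


unguarded_result = race(UNGUARDED)
guarded_result = race(GUARDED)
```

   With the unguarded statement both updates apply and try_number is 
incremented twice; with the guard only the first update applies, and the second 
caller can detect from its row count of zero that it lost the race.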
   
   Closes: https://github.com/apache/airflow/issues/57618
   
   Note: This issue was reproduced using unit tests.

