ephraimbuddy opened a new pull request, #60330: URL: https://github.com/apache/airflow/pull/60330
In HA, two scheduler processes can race to schedule the same TaskInstance. Previously DagRun.schedule_tis() updated rows by ti.id alone, so a scheduler could increment try_number and transition state even after another scheduler had already advanced the TI (e.g. to SCHEDULED/QUEUED), resulting in duplicate attempts being queued.

This change makes scheduling idempotent under HA races by:

- Guarding schedule_tis() DB updates to only apply when the TI is still in schedulable states (derived from SCHEDULEABLE_STATES, handling NULL explicitly).
- Using a single CASE (next_try_number) so reschedules (UP_FOR_RESCHEDULE) do not start a new try, and applying this consistently to both normal scheduling and the EmptyOperator fast-path.

Adds regression tests covering:

- TI already queued by another scheduler.
- EmptyOperator fast-path blocked when TI is already QUEUED/RUNNING.
- UP_FOR_RESCHEDULE scheduling keeps try_number unchanged.
- Only one “scheduler” update succeeds when competing.

Closes: https://github.com/apache/airflow/issues/57618

Note: The reproduction of this issue was based on unit tests.
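For readers who want to see the shape of the guard, here is a minimal sketch of a state-guarded UPDATE with a CASE for the try number. It is not the code in this PR: it uses the standard-library sqlite3 module instead of Airflow's SQLAlchemy models, and the table layout, the schedule_ti() helper, and the lowercase state strings are all assumptions made for illustration.

```python
import sqlite3

# Hypothetical stand-in for the non-NULL schedulable states; NULL is
# handled explicitly in the WHERE clause because `NULL IN (...)` is never true.
SCHEDULEABLE_STATES = ("up_for_retry", "up_for_reschedule")


def make_db():
    """Create an in-memory table that loosely mimics task_instance."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE task_instance ("
        "id TEXT PRIMARY KEY, state TEXT, try_number INTEGER NOT NULL)"
    )
    return conn


def schedule_ti(conn, ti_id):
    """Move one TI to 'scheduled' only if it is still schedulable.

    Returns True when this caller's UPDATE matched the row, False when
    another caller had already advanced it.
    """
    placeholders = ", ".join("?" for _ in SCHEDULEABLE_STATES)
    cur = conn.execute(
        f"""
        UPDATE task_instance
        SET try_number = CASE
                -- a reschedule continues the current try
                WHEN state = 'up_for_reschedule' THEN try_number
                ELSE try_number + 1
            END,
            state = 'scheduled'
        WHERE id = ?
          AND (state IS NULL OR state IN ({placeholders}))
        """,
        (ti_id, *SCHEDULEABLE_STATES),
    )
    conn.commit()
    return cur.rowcount == 1


conn = make_db()
conn.executemany(
    "INSERT INTO task_instance VALUES (?, ?, ?)",
    [
        ("ti_new", None, 0),                      # never scheduled
        ("ti_resched", "up_for_reschedule", 1),   # sensor-style reschedule
        ("ti_queued", "queued", 1),               # already taken by a peer
    ],
)

first = schedule_ti(conn, "ti_new")       # first "scheduler" attempts the row
second = schedule_ti(conn, "ti_new")      # second "scheduler" tries the same row
resched = schedule_ti(conn, "ti_resched")
queued = schedule_ti(conn, "ti_queued")

rows = {
    row[0]: (row[1], row[2])
    for row in conn.execute("SELECT id, state, try_number FROM task_instance")
}
print(first, second, resched, queued)
print(rows)
```

The point of the sketch is that the state check lives in the WHERE clause of the same statement that changes the row, so the loser of a race matches zero rows rather than overwriting the winner's update. Assigning try_number before state in the SET list keeps the CASE reading the pre-update state even on databases that evaluate assignments left to right.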
