The GitHub Actions job "Tests" on airflow.git/fix/issue-62050 has succeeded.
Run started by GitHub user YoannAbriel (triggered by YoannAbriel).

Head commit for run:
7fd5ab00cb03d7ede901c6e09599e1ac763316b1 / Yoann Abriel <[email protected]>
fix: skip scheduling instead of bulk-failing when serialized DAG is transiently 
missing

When the scheduler cannot find a DAG in the serialized_dag table while checking
task concurrency limits, it previously set all SCHEDULED task instances for that
DAG to FAILED via a bulk UPDATE. This caused intermittent mass task failures on
multi-scheduler setups (e.g. MWAA) where the serialized DAG may be transiently
absent during a DAG file parse cycle.

Instead, treat the missing serialized DAG as a transient condition: log a 
warning,
add the dag_id to starved_dags so subsequent tasks for the same DAG are also
skipped, and let the scheduler retry on the next iteration.

The existing test 'test_queued_task_instances_fails_with_missing_dag' has been
updated to reflect the new expected behavior (tasks remain SCHEDULED). A new
regression test 'test_missing_serialized_dag_does_not_bulk_fail_tasks' 
explicitly
verifies no bulk-failure occurs.

Closes #62050

Report URL: https://github.com/apache/airflow/actions/runs/24182017934

With regards,
GitHub Actions via GitBox


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to