The GitHub Actions job "Tests" on 
airflow.git/fix/scheduler-performance-with-completed-dagruns has failed.
Run started by GitHub user Arunodoy18 (triggered by Arunodoy18).

Head commit for run:
74678c06b5bde43010662fb2d94c11f8b2ee2521 / Arunodoy18 <[email protected]>
Fix scheduler slowdown with large numbers of completed dag runs

The scheduler was experiencing significant performance degradation when
there were many completed dag runs (100k+) and task instances (3M+).
Each scheduler loop was taking 15+ seconds instead of the normal ~1s.

Root cause:
- DagRun.get_running_dag_runs_to_examine() was eagerly loading ALL task
  instances for ALL running dag runs using joinedload()
- This created massive joins with millions of rows even though only
  unfinished task instances were actually needed
- The eager loading was only used in one code path
  (_verify_integrity_if_dag_changed) and only for unfinished TIs

Solution:
1. Remove the joinedload(cls.task_instances) from the query to avoid
   loading task instances upfront
2. Explicitly query only unfinished task instances when they're needed
   in _verify_integrity_if_dag_changed
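
The difference between the two loading strategies can be sketched with the standard-library sqlite3 module. This is only an illustration under stated assumptions: the two-table schema, the set of "finished" states, and the helper names load_runs_with_all_tis, load_running_runs and load_unfinished_tis are hypothetical and are not Airflow's real models, queries, or API.

```python
import sqlite3

# Hypothetical, simplified schema; not Airflow's real tables or ORM models.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dag_run (id INTEGER PRIMARY KEY, state TEXT);
    CREATE TABLE task_instance (
        id INTEGER PRIMARY KEY,
        dag_run_id INTEGER REFERENCES dag_run(id),
        state TEXT
    );
    """
)
conn.execute("INSERT INTO dag_run (id, state) VALUES (1, 'running')")
# One running dag run with 1000 finished and 2 unfinished task instances.
conn.executemany(
    "INSERT INTO task_instance (dag_run_id, state) VALUES (?, ?)",
    [(1, "success")] * 1000 + [(1, "queued"), (1, "scheduled")],
)


def load_runs_with_all_tis(conn):
    """Before: a single join that pulls every task instance of each running run."""
    return conn.execute(
        "SELECT dr.id, ti.id, ti.state FROM dag_run dr "
        "LEFT JOIN task_instance ti ON ti.dag_run_id = dr.id "
        "WHERE dr.state = 'running'"
    ).fetchall()


def load_running_runs(conn):
    """After, step 1: fetch the running dag runs without their task instances."""
    return conn.execute(
        "SELECT id FROM dag_run WHERE state = 'running'"
    ).fetchall()


def load_unfinished_tis(conn, dag_run_id):
    """After, step 2: fetch only unfinished task instances, and only on demand."""
    # The list of finished states here is an assumption made for this sketch.
    return conn.execute(
        "SELECT id, state FROM task_instance "
        "WHERE dag_run_id = ? "
        "AND (state IS NULL OR state NOT IN ('success', 'failed', 'skipped'))",
        (dag_run_id,),
    ).fetchall()


eager_rows = load_runs_with_all_tis(conn)
runs = load_running_runs(conn)
unfinished = load_unfinished_tis(conn, runs[0][0])
print(len(eager_rows), len(runs), len(unfinished))  # 1002 1 2
```

In this toy data set the joined query returns 1002 rows for a single dag run, while the two-step approach returns 1 row up front and 2 rows only when the unfinished task instances are actually needed.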

This change significantly improves scheduler loop performance when there
are many completed dag runs and task instances, bringing the loop time
back to normal levels (~1s) without requiring frequent db clean operations.

Fixes #54283

Report URL: https://github.com/apache/airflow/actions/runs/20724141289

With regards,
GitHub Actions via GitBox


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]