Arunodoy18 opened a new pull request, #60136:
URL: https://github.com/apache/airflow/pull/60136
Fix scheduler slowdown with large numbers of completed dag runs

Description

This PR fixes a critical performance issue where the scheduler degrades significantly when the database contains many completed dag runs and task instances.

Fixes #54283

Problem

The scheduler slowed down dramatically over time:

- With ~100k completed dag runs and ~3M task instances, each scheduler loop took ~15 seconds instead of ~1 second.
- Performance returned to normal only after manually running `airflow db clean` to trim old data.
- This issue did not exist in Airflow 2.x.

Root Cause

The issue was in `DagRun.get_running_dag_runs_to_examine()`, whose query included:

`.options(joinedload(cls.task_instances))`

This eagerly loaded all task instances for all running dag runs in every scheduler loop, producing large database joins over millions of rows even though:

- only unfinished task instances were actually needed, and
- the task instances were used in only one code path, `_verify_integrity_if_dag_changed()`.

Changes

- Removed eager loading in `dagrun.py`: dropped `.options(joinedload(cls.task_instances))` from the query in `get_running_dag_runs_to_examine()`.
- Optimized task instance loading in `scheduler_job_runner.py`: `_verify_integrity_if_dag_changed()` now explicitly queries only unfinished task instances when needed, instead of lazy-loading every task instance of the dag run.

Impact

✅ Scheduler loop time reduced from ~15s back to ~1s with large numbers of completed dag runs
✅ Significantly smaller and simpler database queries
✅ No breaking changes; backward compatibility is maintained
✅ Frequent `airflow db clean` runs are no longer needed to keep the scheduler fast

Testing

- Existing unit tests pass
- Code syntax validated

-- This is an automated message from the Apache Git Service.
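The Root Cause and Changes sections contrast two query shapes: an eager join that returns one row per task instance of every running dag run, and an explicit query filtered to unfinished states. The sketch below illustrates that contrast with Python's stdlib `sqlite3` rather than SQLAlchemy or Airflow itself. The table layouts, the `UNFINISHED` tuple, and the helpers `build_db`, `eager_join_rows`, and `unfinished_tis` are hypothetical simplifications, not Airflow's schema or API. The first query only approximates the LEFT OUTER JOIN that `joinedload` emits by default.

```python
import sqlite3

# Hypothetical, illustrative subset of "unfinished" task-instance states;
# not Airflow's exact set. A NULL state (not yet scheduled) also counts.
UNFINISHED = ("scheduled", "queued", "running", "up_for_retry", "deferred")


def build_db() -> sqlite3.Connection:
    """Toy schema (assumption): one running dag run with many finished TIs."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dag_run (dag_id TEXT, run_id TEXT, state TEXT);
        CREATE TABLE task_instance (
            dag_id TEXT, run_id TEXT, task_id TEXT, state TEXT
        );
        """
    )
    conn.execute("INSERT INTO dag_run VALUES ('d', 'r1', 'running')")
    # 1000 finished TIs plus 2 unfinished ones (one queued, one with no state).
    rows = [("d", "r1", f"t{i}", "success") for i in range(1000)]
    rows += [("d", "r1", "t_queued", "queued"), ("d", "r1", "t_new", None)]
    conn.executemany("INSERT INTO task_instance VALUES (?, ?, ?, ?)", rows)
    return conn


def eager_join_rows(conn: sqlite3.Connection) -> list:
    """Roughly the joinedload shape: one result row per task instance."""
    return conn.execute(
        """
        SELECT dr.dag_id, dr.run_id, ti.task_id, ti.state
        FROM dag_run AS dr
        LEFT OUTER JOIN task_instance AS ti
          ON ti.dag_id = dr.dag_id AND ti.run_id = dr.run_id
        WHERE dr.state = 'running'
        """
    ).fetchall()


def unfinished_tis(conn: sqlite3.Connection, dag_id: str, run_id: str) -> list:
    """Explicit, filtered query: only unfinished TIs for one dag run."""
    placeholders = ", ".join("?" for _ in UNFINISHED)
    return conn.execute(
        f"""
        SELECT task_id, state FROM task_instance
        WHERE dag_id = ? AND run_id = ?
          AND (state IS NULL OR state IN ({placeholders}))
        ORDER BY task_id
        """,
        (dag_id, run_id, *UNFINISHED),
    ).fetchall()


conn = build_db()
print(len(eager_join_rows(conn)))       # → 1002: every TI comes back
print(unfinished_tis(conn, "d", "r1"))  # → [('t_new', None), ('t_queued', 'queued')]
```

Note the explicit `state IS NULL` clause: a SQL `IN (...)` test never matches NULL, so task instances that have not been assigned a state yet would otherwise be silently dropped from the filtered query.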
