Arunodoy18 opened a new pull request, #60136:
URL: https://github.com/apache/airflow/pull/60136

   
   Fix scheduler slowdown with large numbers of completed dag runs
   Description
   This PR fixes a critical performance issue where the scheduler experiences 
significant degradation when there are many completed dag runs and task 
instances in the database.
   
   Fixes #54283
   
   Problem
   The scheduler was slowing down dramatically over time:
   
   With ~100k completed dag runs and ~3M task instances, each scheduler loop 
took ~15 seconds instead of ~1 second
   Performance only returned to normal after manually running 
`airflow db clean` to trim old data
   This issue did not exist in Airflow 2.x
   Root Cause
   The issue was in `DagRun.get_running_dag_runs_to_examine()`, which had:
   .options(joinedload(cls.task_instances))
   This was eagerly loading ALL task instances for ALL running dag runs in 
every scheduler loop, creating massive database joins with millions of rows 
even though:
   
   Only unfinished task instances were actually needed
   The task instances were only used in one specific code path 
(`_verify_integrity_if_dag_changed`)
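To make the cost of the joined eager load concrete, here is a minimal sketch using only Python's standard `sqlite3` module. The two-table schema, the row counts, and the state values are illustrative assumptions, not Airflow's real models or the exact SQL that SQLAlchemy emits. It only shows the general effect: joining dag runs to their task instances returns one row per task instance, finished or not.

```python
import sqlite3

# Toy schema (an assumption for illustration, not Airflow's real tables).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dag_run (id INTEGER PRIMARY KEY, state TEXT);
    CREATE TABLE task_instance (
        id INTEGER PRIMARY KEY,
        dag_run_id INTEGER REFERENCES dag_run(id),
        state TEXT
    );
""")

# Two running dag runs, each with 30 task instances, 28 of them finished.
for run_id in (1, 2):
    conn.execute("INSERT INTO dag_run (id, state) VALUES (?, 'running')", (run_id,))
    states = ["success"] * 28 + ["running", None]
    conn.executemany(
        "INSERT INTO task_instance (dag_run_id, state) VALUES (?, ?)",
        [(run_id, state) for state in states],
    )

# Roughly the shape of a joined eager load: every task instance of every
# selected dag run comes back, whatever its state.
joined_rows = conn.execute("""
    SELECT dr.id, ti.id, ti.state
    FROM dag_run AS dr
    LEFT OUTER JOIN task_instance AS ti ON ti.dag_run_id = dr.id
    WHERE dr.state = 'running'
""").fetchall()

# Selecting the dag runs alone returns one row per dag run.
run_rows = conn.execute(
    "SELECT id FROM dag_run WHERE state = 'running'"
).fetchall()

print(len(joined_rows), len(run_rows))
```

In this toy data set the join returns 60 rows for 2 dag runs, although only 4 task instances are unfinished.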
   Changes
   Removed eager loading in `dagrun.py`:
   
   Removed `.options(joinedload(cls.task_instances))` from the query in 
`get_running_dag_runs_to_examine()`
   Optimized task instance loading in `scheduler_job_runner.py`:
   
   Modified `_verify_integrity_if_dag_changed()` to explicitly query only 
unfinished task instances when needed
   This avoids lazy-loading all task instances and loads only what's required
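The idea behind the targeted query can be sketched as follows, again with plain `sqlite3` rather than the PR's actual SQLAlchemy code. The helper name `unfinished_task_instances`, the table layout, and the `UNFINISHED_STATES` tuple are hypothetical simplifications. Airflow defines its own set of unfinished states, which this sketch does not reproduce.

```python
import sqlite3

# Simplified stand-in for Airflow's unfinished states (an assumption).
UNFINISHED_STATES = ("scheduled", "queued", "running")


def unfinished_task_instances(conn, dag_run_id):
    """Return (id, state) rows for one dag run's unfinished task instances."""
    placeholders = ",".join("?" for _ in UNFINISHED_STATES)
    query = f"""
        SELECT id, state FROM task_instance
        WHERE dag_run_id = ?
          AND (state IS NULL OR state IN ({placeholders}))
        ORDER BY id
    """
    return conn.execute(query, (dag_run_id, *UNFINISHED_STATES)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE task_instance ("
    "id INTEGER PRIMARY KEY, dag_run_id INTEGER, state TEXT)"
)
conn.executemany(
    "INSERT INTO task_instance (dag_run_id, state) VALUES (?, ?)",
    [(1, "success"), (1, "failed"), (1, "running"), (1, None), (2, "queued")],
)

# Only called for the dag run that actually needs the integrity check.
rows = unfinished_task_instances(conn, 1)
print(rows)
```

The query runs once per dag run that reaches this code path, and the filter on state keeps finished task instances out of the result.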
   Impact
   ✅ Scheduler loop time reduced from ~15s back to ~1s with large numbers of 
completed dag runs
   ✅ Significantly reduced database query size and complexity
   ✅ No breaking changes - maintains backward compatibility
   ✅ No longer requires frequent `airflow db clean` operations for performance
   Testing
   Existing unit tests pass
   Code syntax validated
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
