aksdevs opened a new pull request, #62152:
URL: https://github.com/apache/airflow/pull/62152

   Two critical optimizations to reduce database queries and enable query 
optimizer:
   
   1. Combine DagRun queries from 2 to 1
      - Before: 2 separate queries (one for run_type, one for state) with 
identical WHERE clauses
      - After: 1 query grouped by (run_type, state), pivoted in Python to 
separate counts
      - Impact: Eliminates 1 DB round trip from 4 to 3 total queries
   
   2. Add explicit TaskInstance.dag_id filter before JOIN
      - Before: TI query only filtered dag_id indirectly via JOIN with DagRun
      - After: Added explicit TaskInstance.dag_id.in_(permitted_dag_ids) filter 
before JOIN
      - Impact: Enables database optimizer to use ti_dag_run index (dag_id, 
run_id) before the join
      - Critical: With millions of TaskInstances, this dramatically reduces 
query execution time
   
   These optimizations specifically address the slow retrieval of historical 
metrics on large installations where TaskInstance counts are in the millions.
   
   Updated test assertion from 4 to 3 queries to reflect the optimization.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to