aksdevs opened a new pull request, #62152:
URL: https://github.com/apache/airflow/pull/62152
Two critical optimizations to reduce database queries and enable query
optimizer:
1. Combine DagRun queries from 2 to 1
- Before: 2 separate queries (one for run_type, one for state) with
identical WHERE clauses
- After: 1 query grouped by (run_type, state), pivoted in Python to
separate counts
- Impact: Eliminates 1 DB round trip from 4 to 3 total queries
2. Add explicit TaskInstance.dag_id filter before JOIN
- Before: TI query only filtered dag_id indirectly via JOIN with DagRun
- After: Added explicit TaskInstance.dag_id.in_(permitted_dag_ids) filter
before JOIN
- Impact: Enables database optimizer to use ti_dag_run index (dag_id,
run_id) before the join
- Critical: With millions of TaskInstances, this dramatically reduces
query execution time
These optimizations specifically address the slow retrieval of historical
metrics on large installations where TaskInstance counts are in the millions.
Updated test assertion from 4 to 3 queries to reflect the optimization.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]