github-actions[bot] opened a new pull request, #62996: URL: https://github.com/apache/airflow/pull/62996
* perf: use load_only() in eager_load_dag_run_for_validation to reduce data fetched

  The get_dag_runs API endpoint was slow on large deployments because
  eager_load_dag_run_for_validation() used selectinload on task_instances
  and task_instances_histories without restricting which columns were
  fetched. This caused SQLAlchemy to load all heavyweight columns
  (executor_config with pickled data, hostname, rendered fields, etc.)
  for every task instance across every DAG run in the result page, even
  though only dag_version_id is needed to traverse the association proxy
  to DagVersion.

  Add load_only(TaskInstance.dag_version_id) and
  load_only(TaskInstanceHistory.dag_version_id) to the selectinload chains
  so the SELECT for task instances fetches only the identity columns and
  the FK needed to resolve the dag_version relationship, significantly
  reducing the volume of data transferred from the database on busy
  deployments.

  Fixes #62025

* Fix static checks

---------

(cherry picked from commit 13af96b80868ef91ca623d35afcd76003bfbda90)

Co-authored-by: Lakshmi Sravya <[email protected]>
Co-authored-by: pierrejeambrun <[email protected]>
