LakshmiSravyaVedantham opened a new pull request, #62482:
URL: https://github.com/apache/airflow/pull/62482

   ## Summary
   
   Closes #62025
   
   The `get_dag_runs` REST API endpoint experiences slow response times on 
large Airflow deployments.
   
   ## Root Cause
   
   `eager_load_dag_run_for_validation()` used `selectinload` on 
`DagRun.task_instances` and `DagRun.task_instances_histories` without 
restricting which columns were fetched. This caused SQLAlchemy to load **all** 
heavyweight `TaskInstance` columns — including `executor_config` (a pickled 
binary blob), `hostname`, `rendered_map_index`, `context_carrier`, and many 
others — for every task instance across every DAG run in the result page.
   
   In practice, only `dag_version_id` (the foreign key) is needed to traverse 
the `TaskInstance.dag_version` association proxy used by the 
`DagRun.dag_versions` property, which drives `DAGRunResponse.dag_versions` 
serialization.
   
   ## Fix
   
   Add `load_only(TaskInstance.dag_version_id)` and 
`load_only(TaskInstanceHistory.dag_version_id)` to the respective 
`selectinload` chains. SQLAlchemy will then issue a `SELECT` for task instances 
that fetches only the identity columns and `dag_version_id`, rather than the 
full row. The subsequent 
`.joinedload(TaskInstance.dag_version).joinedload(DagVersion.bundle)` chain 
continues to work correctly.
   
   ```python
   # Before
   selectinload(DagRun.task_instances)
       .joinedload(TaskInstance.dag_version)
       .joinedload(DagVersion.bundle),
   
   # After
   selectinload(DagRun.task_instances)
       .load_only(TaskInstance.dag_version_id)
       .joinedload(TaskInstance.dag_version)
       .joinedload(DagVersion.bundle),
   ```
   
   The same pattern is applied to `task_instances_histories`.
   
   ## Impact
   
   On deployments with many task instances per DAG run, this change 
significantly reduces:
   - The volume of data transferred from the database per request
   - Memory pressure from deserializing large pickled `executor_config` values
   - Response latency for `GET /dags/{dag_id}/dagRuns` and the batch `POST 
/dags/~/dagRuns/list` endpoints
   
   ## Testing
   
   - Existing tests in 
`tests/unit/api_fastapi/core_api/routes/public/test_dag_run.py` continue to pass
   - The `dag_versions` property continues to work correctly since 
`load_only()` restricts deferred columns, not the relationship traversal
   
   ## References
   
   - [SQLAlchemy `load_only` 
docs](https://docs.sqlalchemy.org/en/20/orm/queryguide/columns.html#column-deferral-using-load-only)
   - [SQLAlchemy loading 
strategies](https://docs.sqlalchemy.org/en/20/orm/queryguide/relationships.html)
   - Fixes: #62025


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to