tirkarthi opened a new issue, #57418:
URL: https://github.com/apache/airflow/issues/57418

   ### Apache Airflow version
   
   main (development)
   
   ### If "Other Airflow 2/3 version" selected, which one?
   
   _No response_
   
   ### What happened?
   
   As noted in 
https://github.com/apache/airflow/issues/56635#issuecomment-3456040061, when we 
enabled SQLAlchemy query logging we saw a lot of N+1 queries in our 
testing environment. DagRun has relationships to task_instances, 
task_instance_histories, deadlines and dag_run_note. It seems that using 
model_validate in pydantic reads every attribute, so each relationship access 
issues a separate query even though the result is not used anywhere in the UI.
   
   Since these queries are issued per dag_run, on a page with 15 dags and 2135 
dagruns, with each dag having 3 dagruns, this causes 45 (15 * 3) queries for 
each of task_instances, task_instance_histories, deadlines and dag_run_note, 
resulting in around 200 queries for the page load. SQLAlchemy has a noload 
option to skip loading a relationship. It was deprecated recently but is still 
useful here since the fields are not used. An improved patch would be to use 
DAGRunLightResponse, which would also reduce the fields fetched from the 
database and the size of the response payload.
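
To illustrate the arithmetic above, here is a minimal, self-contained sketch in plain Python (no Airflow, SQLAlchemy or pydantic involved). `LazyDagRun`, `serialize` and `queries_for_page` are hypothetical names invented for this example; the class only imitates a lazily loaded ORM row by counting one "query" the first time a relationship attribute is read.

```python
# Relationship names taken from the issue text; everything else is illustrative.
RELATIONSHIPS = ("task_instances", "task_instance_histories", "deadlines", "dag_run_note")
FULL_FIELDS = ("run_id", "state") + RELATIONSHIPS  # serializer that touches everything
LIGHT_FIELDS = ("run_id", "state")                 # serializer that reads plain columns only


class QueryCounter:
    """Counts simulated database round trips."""

    def __init__(self):
        self.count = 0


class LazyDagRun:
    """Stand-in for an ORM row whose relationships load lazily on first access."""

    def __init__(self, run_id, counter):
        self.run_id = run_id
        self.state = "success"
        self._counter = counter

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, i.e. for unloaded relationships.
        if name in RELATIONSHIPS:
            self._counter.count += 1      # one simulated SELECT per relationship
            self.__dict__[name] = []      # cache so a second access is free
            return self.__dict__[name]
        raise AttributeError(name)


def serialize(run, fields):
    """Read each requested attribute, as an attribute-based validator would."""
    return {name: getattr(run, name) for name in fields}


def queries_for_page(fields, dags=15, runs_per_dag=3):
    counter = QueryCounter()
    for dag in range(dags):
        for run in range(runs_per_dag):
            serialize(LazyDagRun(f"dag_{dag}_run_{run}", counter), fields)
    return counter.count


full_queries = queries_for_page(FULL_FIELDS)
light_queries = queries_for_page(LIGHT_FIELDS)
print(full_queries, light_queries)
```

In this simulation the relationship lookups alone account for 15 * 3 * 4 queries, while a serializer restricted to plain columns triggers none, which is the effect a lighter response model is expected to have.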
   
   https://github.com/pydantic/pydantic/issues/8192
   https://github.com/sqlalchemy/sqlalchemy/discussions/10120
   
https://docs.sqlalchemy.org/en/20/orm/queryguide/relationships.html#sqlalchemy.orm.noload
   
   ### What you think should happen instead?
   
   The fields not required should not be loaded at all in the query.
   
   ### How to reproduce
   
   1. Enable SQLAlchemy query logging by setting 
`sql_alchemy_engine_args = {"echo": true}` in the `[database]` section.
   2. Load the dags list page with each dag having 10-15 dagruns.
   3. Notice n+1 queries in the api-server logs.
   
   ```
   [2025-10-28T11:24:03.322253Z] {base.py:1577} INFO - [cached since 13.18s 
ago] {'param_1': 'dag_110', 'param_2': 'scheduled__2025-01-01T00:00:00+00:00'}
   [2025-10-28T11:24:03.324796Z] {base.py:1577} INFO - SELECT 
task_instance.rendered_map_index AS task_instance_rendered_map_index, 
task_instance.task_display_name AS task_instance_task_display_name, 
task_instance.id AS task_instance_id, task_instance.task_id AS 
task_instance_task_id, task_instance.dag_id AS task_instance_dag_id, 
task_instance.run_id AS task_instance_run_id, task_instance.map_index AS 
task_instance_map_index, task_instance.start_date AS task_instance_start_date, 
task_instance.end_date AS task_instance_end_date, task_instance.duration AS 
task_instance_duration, task_instance.state AS task_instance_state, 
task_instance.try_number AS task_instance_try_number, task_instance.max_tries 
AS task_instance_max_tries, task_instance.hostname AS task_instance_hostname, 
task_instance.unixname AS task_instance_unixname, task_instance.pool AS 
task_instance_pool, task_instance.pool_slots AS task_instance_pool_slots, 
task_instance.queue AS task_instance_queue, task_instance.priority_weight AS 
task_instance_priority_weight, task_instance.operator AS 
task_instance_operator, task_instance.custom_operator_name AS 
task_instance_custom_operator_name, task_instance.queued_dttm AS 
task_instance_queued_dttm, task_instance.scheduled_dttm AS 
task_instance_scheduled_dttm, task_instance.queued_by_job_id AS 
task_instance_queued_by_job_id, task_instance.last_heartbeat_at AS 
task_instance_last_heartbeat_at, task_instance.pid AS task_instance_pid, 
task_instance.executor AS task_instance_executor, task_instance.executor_config 
AS task_instance_executor_config, task_instance.updated_at AS 
task_instance_updated_at, task_instance.context_carrier AS 
task_instance_context_carrier, task_instance.span_status AS 
task_instance_span_status, task_instance.external_executor_id AS 
task_instance_external_executor_id, task_instance.trigger_id AS 
task_instance_trigger_id, task_instance.trigger_timeout AS 
task_instance_trigger_timeout, task_instance.next_method AS 
task_instance_next_method, task_instance.next_kwargs AS 
task_instance_next_kwargs, 
task_instance.dag_version_id AS task_instance_dag_version_id, dag_run_1.state 
AS dag_run_1_state, dag_run_1.id AS dag_run_1_id, dag_run_1.dag_id AS 
dag_run_1_dag_id, dag_run_1.queued_at AS dag_run_1_queued_at, 
dag_run_1.logical_date AS dag_run_1_logical_date, dag_run_1.start_date AS 
dag_run_1_start_date, dag_run_1.end_date AS dag_run_1_end_date, 
dag_run_1.run_id AS dag_run_1_run_id, dag_run_1.creating_job_id AS 
dag_run_1_creating_job_id, dag_run_1.run_type AS dag_run_1_run_type, 
dag_run_1.triggered_by AS dag_run_1_triggered_by, 
dag_run_1.triggering_user_name AS dag_run_1_triggering_user_name, 
dag_run_1.conf AS dag_run_1_conf, dag_run_1.data_interval_start AS 
dag_run_1_data_interval_start, dag_run_1.data_interval_end AS 
dag_run_1_data_interval_end, dag_run_1.run_after AS dag_run_1_run_after, 
dag_run_1.last_scheduling_decision AS dag_run_1_last_scheduling_decision, 
dag_run_1.log_template_id AS dag_run_1_log_template_id, dag_run_1.updated_at 
AS dag_run_1_updated_at, dag_run_1.clear_number AS 
dag_run_1_clear_number, dag_run_1.backfill_id AS dag_run_1_backfill_id, 
dag_run_1.bundle_version AS dag_run_1_bundle_version, 
dag_run_1.scheduled_by_job_id AS dag_run_1_scheduled_by_job_id, 
dag_run_1.context_carrier AS dag_run_1_context_carrier, dag_run_1.span_status 
AS dag_run_1_span_status, dag_run_1.created_dag_version_id AS 
dag_run_1_created_dag_version_id 
   FROM task_instance INNER JOIN dag_run AS dag_run_1 ON dag_run_1.dag_id = 
task_instance.dag_id AND dag_run_1.run_id = task_instance.run_id 
   WHERE %(param_1)s = task_instance.dag_id AND %(param_2)s = 
task_instance.run_id
   [2025-10-28T11:24:03.325142Z] {base.py:1577} INFO - [cached since 13.18s 
ago] {'param_1': 'dag_110', 'param_2': 'scheduled__2025-01-01T00:00:00+00:00'}
   [2025-10-28T11:24:03.329438Z] {base.py:1577} INFO - SELECT 
dag_run_note.user_id AS dag_run_note_user_id, dag_run_note.dag_run_id AS 
dag_run_note_dag_run_id, dag_run_note.content AS dag_run_note_content, 
dag_run_note.created_at AS dag_run_note_created_at, dag_run_note.updated_at AS 
dag_run_note_updated_at 
   FROM dag_run_note 
   WHERE dag_run_note.dag_run_id = %(pk_1)s
   [2025-10-28T11:24:03.329716Z] {base.py:1577} INFO - [cached since 13.2s ago] 
{'pk_1': 421}
   [2025-10-28T11:24:03.331467Z] {base.py:1577} INFO - SELECT 
task_instance_history.task_instance_id AS 
task_instance_history_task_instance_id, task_instance_history.task_id AS 
task_instance_history_task_id, task_instance_history.dag_id AS 
task_instance_history_dag_id, task_instance_history.run_id AS 
task_instance_history_run_id, task_instance_history.map_index AS 
task_instance_history_map_index, task_instance_history.try_number AS 
task_instance_history_try_number, task_instance_history.start_date AS 
task_instance_history_start_date, task_instance_history.end_date AS 
task_instance_history_end_date, task_instance_history.duration AS 
task_instance_history_duration, task_instance_history.state AS 
task_instance_history_state, task_instance_history.max_tries AS 
task_instance_history_max_tries, task_instance_history.hostname AS 
task_instance_history_hostname, task_instance_history.unixname AS 
task_instance_history_unixname, task_instance_history.pool AS 
task_instance_history_pool, task_instance_history.pool_slots AS 
task_instance_history_pool_slots, 
task_instance_history.queue AS task_instance_history_queue, 
task_instance_history.priority_weight AS task_instance_history_priority_weight, 
task_instance_history.operator AS task_instance_history_operator, 
task_instance_history.custom_operator_name AS 
task_instance_history_custom_operator_name, task_instance_history.queued_dttm 
AS task_instance_history_queued_dttm, task_instance_history.scheduled_dttm AS 
task_instance_history_scheduled_dttm, task_instance_history.queued_by_job_id AS 
task_instance_history_queued_by_job_id, task_instance_history.pid AS 
task_instance_history_pid, task_instance_history.executor AS 
task_instance_history_executor, task_instance_history.executor_config AS 
task_instance_history_executor_config, task_instance_history.updated_at AS 
task_instance_history_updated_at, task_instance_history.rendered_map_index AS 
task_instance_history_rendered_map_index, task_instance_history.context_carrier 
AS task_instance_history_context_carrier, task_instance_history.span_status AS 
task_instance_history_span_status, task_instance_history.external_executor_id 
AS task_instance_history_external_executor_id, task_instance_history.trigger_id 
AS task_instance_history_trigger_id, task_instance_history.trigger_timeout AS 
task_instance_history_trigger_timeout, task_instance_history.next_method AS 
task_instance_history_next_method, task_instance_history.next_kwargs AS 
task_instance_history_next_kwargs, task_instance_history.task_display_name AS 
task_instance_history_task_display_name, task_instance_history.dag_version_id 
AS task_instance_history_dag_version_id 
   FROM task_instance_history 
   WHERE %(param_1)s = task_instance_history.dag_id AND %(param_2)s = 
task_instance_history.run_id ORDER BY task_instance_history.dag_version_id
   [2025-10-28T11:24:03.331751Z] {base.py:1577} INFO - [cached since 13.19s 
ago] {'param_1': 'dag_101', 'param_2': 'scheduled__2025-01-01T00:00:00+00:00'}
   [2025-10-28T11:24:03.333946Z] {base.py:1577} INFO - SELECT 
task_instance.rendered_map_index AS task_instance_rendered_map_index, 
task_instance.task_display_name AS task_instance_task_display_name, 
task_instance.id AS task_instance_id, task_instance.task_id AS 
task_instance_task_id, task_instance.dag_id AS task_instance_dag_id, 
task_instance.run_id AS task_instance_run_id, task_instance.map_index AS 
task_instance_map_index, task_instance.start_date AS task_instance_start_date, 
task_instance.end_date AS task_instance_end_date, task_instance.duration AS 
task_instance_duration, task_instance.state AS task_instance_state, 
task_instance.try_number AS task_instance_try_number, task_instance.max_tries 
AS task_instance_max_tries, task_instance.hostname AS task_instance_hostname, 
task_instance.unixname AS task_instance_unixname, task_instance.pool AS 
task_instance_pool, task_instance.pool_slots AS task_instance_pool_slots, 
task_instance.queue AS task_instance_queue, task_instance.priority_weight AS 
task_instance_priority_weight, task_instance.operator AS 
task_instance_operator, task_instance.custom_operator_name AS 
task_instance_custom_operator_name, task_instance.queued_dttm AS 
task_instance_queued_dttm, task_instance.scheduled_dttm AS 
task_instance_scheduled_dttm, task_instance.queued_by_job_id AS 
task_instance_queued_by_job_id, task_instance.last_heartbeat_at AS 
task_instance_last_heartbeat_at, task_instance.pid AS task_instance_pid, 
task_instance.executor AS task_instance_executor, task_instance.executor_config 
AS task_instance_executor_config, task_instance.updated_at AS 
task_instance_updated_at, task_instance.context_carrier AS 
task_instance_context_carrier, task_instance.span_status AS 
task_instance_span_status, task_instance.external_executor_id AS 
task_instance_external_executor_id, task_instance.trigger_id AS 
task_instance_trigger_id, task_instance.trigger_timeout AS 
task_instance_trigger_timeout, task_instance.next_method AS 
task_instance_next_method, task_instance.next_kwargs AS 
task_instance_next_kwargs, 
task_instance.dag_version_id AS task_instance_dag_version_id, dag_run_1.state 
AS dag_run_1_state, dag_run_1.id AS dag_run_1_id, dag_run_1.dag_id AS 
dag_run_1_dag_id, dag_run_1.queued_at AS dag_run_1_queued_at, 
dag_run_1.logical_date AS dag_run_1_logical_date, dag_run_1.start_date AS 
dag_run_1_start_date, dag_run_1.end_date AS dag_run_1_end_date, 
dag_run_1.run_id AS dag_run_1_run_id, dag_run_1.creating_job_id AS 
dag_run_1_creating_job_id, dag_run_1.run_type AS dag_run_1_run_type, 
dag_run_1.triggered_by AS dag_run_1_triggered_by, 
dag_run_1.triggering_user_name AS dag_run_1_triggering_user_name, 
dag_run_1.conf AS dag_run_1_conf, dag_run_1.data_interval_start AS 
dag_run_1_data_interval_start, dag_run_1.data_interval_end AS 
dag_run_1_data_interval_end, dag_run_1.run_after AS dag_run_1_run_after, 
dag_run_1.last_scheduling_decision AS dag_run_1_last_scheduling_decision, 
dag_run_1.log_template_id AS dag_run_1_log_template_id, dag_run_1.updated_at 
AS dag_run_1_updated_at, dag_run_1.clear_number AS 
dag_run_1_clear_number, dag_run_1.backfill_id AS dag_run_1_backfill_id, 
dag_run_1.bundle_version AS dag_run_1_bundle_version, 
dag_run_1.scheduled_by_job_id AS dag_run_1_scheduled_by_job_id, 
dag_run_1.context_carrier AS dag_run_1_context_carrier, dag_run_1.span_status 
AS dag_run_1_span_status, dag_run_1.created_dag_version_id AS 
dag_run_1_created_dag_version_id 
   FROM task_instance INNER JOIN dag_run AS dag_run_1 ON dag_run_1.dag_id = 
task_instance.dag_id AND dag_run_1.run_id = task_instance.run_id 
   WHERE %(param_1)s = task_instance.dag_id AND %(param_2)s = 
task_instance.run_id
   [2025-10-28T11:24:03.334290Z] {base.py:1577} INFO - [cached since 13.18s 
ago] {'param_1': 'dag_101', 'param_2': 'scheduled__2025-01-01T00:00:00+00:00'}
   [2025-10-28T11:24:03.337545Z] {base.py:1577} INFO - SELECT 
dag_run_note.user_id AS dag_run_note_user_id, dag_run_note.dag_run_id AS 
dag_run_note_dag_run_id, dag_run_note.content AS dag_run_note_content, 
dag_run_note.created_at AS dag_run_note_created_at, dag_run_note.updated_at AS 
dag_run_note_updated_at 
   FROM dag_run_note 
   WHERE dag_run_note.dag_run_id = %(pk_1)s
   
   
   
   [2025-10-28T11:24:03.413314Z] {base.py:1577} INFO - [cached since 12.74s 
ago] {'name_1': 'dags-folder-1'}
   [2025-10-28T11:24:03.414408Z] {base.py:2630} INFO - COMMIT
   [2025-10-28T11:24:03.415589Z] {base.py:2624} INFO - BEGIN (implicit)
   [2025-10-28T11:24:03.415903Z] {base.py:1577} INFO - SELECT dag_bundle.name, 
dag_bundle.active, dag_bundle.version, dag_bundle.last_refreshed, 
dag_bundle.signed_url_template, dag_bundle.template_params 
   FROM dag_bundle 
   WHERE dag_bundle.name = %(name_1)s
   [2025-10-28T11:24:03.416012Z] {base.py:1577} INFO - [cached since 12.74s 
ago] {'name_1': 'dags-folder-1'}
   [2025-10-28T11:24:03.417021Z] {base.py:2630} INFO - COMMIT
   [2025-10-28T11:24:03.418249Z] {base.py:2624} INFO - BEGIN (implicit)
   [2025-10-28T11:24:03.418574Z] {base.py:1577} INFO - SELECT dag_bundle.name, 
dag_bundle.active, dag_bundle.version, dag_bundle.last_refreshed, 
dag_bundle.signed_url_template, dag_bundle.template_params 
   FROM dag_bundle 
   WHERE dag_bundle.name = %(name_1)s
   [2025-10-28T11:24:03.418695Z] {base.py:1577} INFO - [cached since 12.74s 
ago] {'name_1': 'dags-folder-1'}
   [2025-10-28T11:24:03.419683Z] {base.py:2630} INFO - COMMIT
   [2025-10-28T11:24:03.421075Z] {base.py:2624} INFO - BEGIN (implicit)
   [2025-10-28T11:24:03.421401Z] {base.py:1577} INFO - SELECT dag_bundle.name, 
dag_bundle.active, dag_bundle.version, dag_bundle.last_refreshed, 
dag_bundle.signed_url_template, dag_bundle.template_params 
   FROM dag_bundle 
   WHERE dag_bundle.name = %(name_1)s
   ```
   
   ### Operating System
   
   Ubuntu 20.04
   
   ### Versions of Apache Airflow Providers
   
   _No response_
   
   ### Deployment
   
   Virtualenv installation
   
   ### Deployment details
   
   _No response_
   
   ### Anything else?
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [x] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


