hctrd opened a new issue, #56088:
URL: https://github.com/apache/airflow/issues/56088

   ### Apache Airflow version
   
   3.0.6
   
   ### If "Other Airflow 2 version" selected, which one?
   
   _No response_
   
   ### What happened?
   
   Seeing long running active sessions in the postgres database and causing 
high CPU utilization.
   
   Caused by this query: 
   
   ```
   SELECT serialized_dag.data, serialized_dag.data_compressed, 
serialized_dag.id, serialized_dag.dag_id, serialized_dag.created_at, 
serialized_dag.last_updated, serialized_dag.dag_hash, 
serialized_dag.dag_version_id
   FROM serialized_dag
   WHERE serialized_dag.dag_version_id IN (SELECT task_instance.dag_version_id
   FROM task_instance JOIN dag_run ON dag_run.dag_id = task_instance.dag_id AND 
dag_run.run_id = task_instance.run_id
   WHERE dag_run.id IN (103279, 103260, 103240, 103221, 103202, 103183, 103164, 
103143, 103125, 103104, 103086, 103068, 103047, 103037, 103009, 102990, 102971, 
102952, 102933, 102916, 102895, 102874, 102856, 102837, 102817, 102800, 102777, 
102762, 102740, 102723, 102702, 102686, 102663, 102645, 102624, 102606, 102589, 
102573, 102553, 102533, 102511, 102495, 102476, 102458, 102436, 102415, 102398, 
102376, 102358, 102344, 102322, 102297, 102282, 102261, 102243, 102223, 102202, 
102181, 102162, 102146, 102124, 102106, 102090, 102068, 102047, 102031, 102012, 
101992, 101976, 101953, 101932, 101912, 101894, 101877, 101859, 101836, 101819, 
101797, 101780, 101764, 101743, 101722, 101704, 101685, 101668, 101646, 101634, 
101609, 101592, 101571, 101550, 101533, 101516, 101497, 101479, 101460, 101441, 
101423, 101406, 101386, 101365, 101346, 101326, 101308, 101286, 101266, 101246, 
101224, 101208, 101191, 101172, 101152, 101128, 101113, 101091, 101070, 101053, 
101033, 101014, 100993, 100975, 100956
 , 100938, 100916, 100896, 100877, 100860, 100841, 100821, 100805, 100783, 
100766, 100745, 100727, 100711, 100686, 100666, 100648, 100631, 100609, 100589, 
100574, 100551, 100535, 100515, 100493, 100473, 100458, 100436, 100417, 100400, 
100381, 100363, 100341, 100324, 100308, 100286, 100271, 100249, 100232, 100211, 
100200, 100195, 100182, 100167, 100162, 100152, 100140, 100130, 100122, 100109, 
100098, 100092, 100082, 100070, 100064, 100056, 100041, 100035, 100025, 100018, 
100010, 100001, 99991, 99978, 99970, 99960, 99954, 99942, 99933, 99925, 99915, 
99906, 99897, 99885, 99878, 99867, 99856, 99850, 99839, 99830, 99820, 99808, 
99799, 99791, 99783, 99775, 99764, 99751, 99744, 99736, 99727, 99712, 99707, 
99695, 99687, 99678, 99667, 99655, 99646, 99637, 99629, 99618, 99613, 99597, 
99589, 99578, 99572, 99560, 99551, 99542, 99533, 99525, 99515, 99506, 99496, 
99488, 99476, 99468, 99459, 99450, 99443, 99433, 99421, 99412, 99403, 99395, 
99386, 99377, 99361, 99352, 99345, 99334, 99324, 99319, 993
 05, 99294, 99289, 99277, 99270, 99256, 99249, 99237, 99229, 99223, 99210, 
99201, 99192, 99183, 99174, 99162, 99156, 99143, 99133, 99125, 99114, 99107, 
99097, 99085, 99077, 99066, 99058, 99050, 99040, 99032, 99021, 99008, 99001, 
98991, 98983, 98976, 98963, 98956, 98947, 98936, 98927, 98915, 98908, 98897, 
98890, 98879, 98871, 98859, 98851, 98843, 98832, 98825, 98812, 98805, 98795, 
98783, 98776, 98766, 98758, 98749, 98737, 98730, 98722, 98712, 98702, 98693, 
98682, 98672, 98662, 98650, 98644, 98633, 98622, 98612, 98602, 98593, 98584, 
98573, 98564, 98554, 98546, 98538, 98529, 98520, 98505, 98496, 98487, 98481, 
98468, 98459, 98450, 98440, 98430, 98421, 98413, 98402, 98392, 98384, 98375, 
98365, 98359, 98351, 98339, 98328, 98318, 98308, 98300, 98290, 98281, 98272) 
AND serialized_dag.id != '01997ae2-22b1-7959-8639-dcf8ab870d45'::uuid);
   ```
   
   
   looping (i.e correlation) and and sequential scan due to serialized_dag.id 
present inside the inner sql block is causing the query to run for long..
   
   ### What you think should happen instead?
   
   Take out "AND serialized_dag.id != 
'01997ae2-22b1-7959-8639-dcf8ab870d45'::uuid)” from inner SQL and add this 
condition to outer/main/parent sql. This will help to reduce looping (i.e avoid 
correlation) and avoid sequential scan and use index scan which helps to 
seed-up up the execution.
   
   We have seen significant improvement from 4+ hours execution time to 35ms 
with same result set as per our analysis/observation.
   
   
   ```
   SELECT serialized_dag.data, serialized_dag.data_compressed, 
serialized_dag.id, serialized_dag.dag_id, serialized_dag.created_at, 
serialized_dag.last_updated, serialized_dag.dag_hash, 
serialized_dag.dag_version_id
   FROM serialized_dag
   WHERE serialized_dag.dag_version_id IN (SELECT task_instance.dag_version_id
   FROM task_instance JOIN dag_run ON dag_run.dag_id = task_instance.dag_id AND 
dag_run.run_id = task_instance.run_id
   WHERE dag_run.id IN (103279, 103260, 103240, 103221, 103202, 103183, 103164, 
103143, 103125, 103104, 103086, 103068, 103047, 103037, 103009, 102990, 102971, 
102952, 102933, 102916, 102895, 102874, 102856, 102837, 102817, 102800, 102777, 
102762, 102740, 102723, 102702, 102686, 102663, 102645, 102624, 102606, 102589, 
102573, 102553, 102533, 102511, 102495, 102476, 102458, 102436, 102415, 102398, 
102376, 102358, 102344, 102322, 102297, 102282, 102261, 102243, 102223, 102202, 
102181, 102162, 102146, 102124, 102106, 102090, 102068, 102047, 102031, 102012, 
101992, 101976, 101953, 101932, 101912, 101894, 101877, 101859, 101836, 101819, 
101797, 101780, 101764, 101743, 101722, 101704, 101685, 101668, 101646, 101634, 
101609, 101592, 101571, 101550, 101533, 101516, 101497, 101479, 101460, 101441, 
101423, 101406, 101386, 101365, 101346, 101326, 101308, 101286, 101266, 101246, 
101224, 101208, 101191, 101172, 101152, 101128, 101113, 101091, 101070, 101053, 
101033, 101014, 100993, 100975, 100956
 , 100938, 100916, 100896, 100877, 100860, 100841, 100821, 100805, 100783, 
100766, 100745, 100727, 100711, 100686, 100666, 100648, 100631, 100609, 100589, 
100574, 100551, 100535, 100515, 100493, 100473, 100458, 100436, 100417, 100400, 
100381, 100363, 100341, 100324, 100308, 100286, 100271, 100249, 100232, 100211, 
100200, 100195, 100182, 100167, 100162, 100152, 100140, 100130, 100122, 100109, 
100098, 100092, 100082, 100070, 100064, 100056, 100041, 100035, 100025, 100018, 
100010, 100001, 99991, 99978, 99970, 99960, 99954, 99942, 99933, 99925, 99915, 
99906, 99897, 99885, 99878, 99867, 99856, 99850, 99839, 99830, 99820, 99808, 
99799, 99791, 99783, 99775, 99764, 99751, 99744, 99736, 99727, 99712, 99707, 
99695, 99687, 99678, 99667, 99655, 99646, 99637, 99629, 99618, 99613, 99597, 
99589, 99578, 99572, 99560, 99551, 99542, 99533, 99525, 99515, 99506, 99496, 
99488, 99476, 99468, 99459, 99450, 99443, 99433, 99421, 99412, 99403, 99395, 
99386, 99377, 99361, 99352, 99345, 99334, 99324, 99319, 993
 05, 99294, 99289, 99277, 99270, 99256, 99249, 99237, 99229, 99223, 99210, 
99201, 99192, 99183, 99174, 99162, 99156, 99143, 99133, 99125, 99114, 99107, 
99097, 99085, 99077, 99066, 99058, 99050, 99040, 99032, 99021, 99008, 99001, 
98991, 98983, 98976, 98963, 98956, 98947, 98936, 98927, 98915, 98908, 98897, 
98890, 98879, 98871, 98859, 98851, 98843, 98832, 98825, 98812, 98805, 98795, 
98783, 98776, 98766, 98758, 98749, 98737, 98730, 98722, 98712, 98702, 98693, 
98682, 98672, 98662, 98650, 98644, 98633, 98622, 98612, 98602, 98593, 98584, 
98573, 98564, 98554, 98546, 98538, 98529, 98520, 98505, 98496, 98487, 98481, 
98468, 98459, 98450, 98440, 98430, 98421, 98413, 98402, 98392, 98384, 98375, 
98365, 98359, 98351, 98339, 98328, 98318, 98308, 98300, 98290, 98281, 98272)) 
AND serialized_dag.id != '01997ae2-22b1-7959-8639-dcf8ab870d45'::uuid;
   ```
   
   ### How to reproduce
   
   Connect airflow to postgres and have it query multiple dag ids and observe 
postgres metrics.
   
   ### Operating System
   
   linux
   
   ### Versions of Apache Airflow Providers
   
   apache-airflow-providers-common-sql==1.27.3
   apache-airflow-providers-fab==2.3.0
   apache-airflow-providers-mysql==6.3.2
   apache-airflow-providers-odbc==4.10.1
   apache-airflow-providers-postgres==6.2.1
   apache-airflow-providers-sendgrid==4.1.2
   apache-airflow-providers-standard==1.4.1
   
   ### Deployment
   
   Official Apache Airflow Helm Chart
   
   ### Deployment details
   
   Airflow: 3.0.3
   
   ### Anything else?
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to