hctrd opened a new issue, #56088: URL: https://github.com/apache/airflow/issues/56088
### Apache Airflow version 3.0.6 ### If "Other Airflow 2 version" selected, which one? _No response_ ### What happened? Seeing long running active sessions in the postgres database and causing high CPU utilization. Caused by this query: ``` SELECT serialized_dag.data, serialized_dag.data_compressed, serialized_dag.id, serialized_dag.dag_id, serialized_dag.created_at, serialized_dag.last_updated, serialized_dag.dag_hash, serialized_dag.dag_version_id FROM serialized_dag WHERE serialized_dag.dag_version_id IN (SELECT task_instance.dag_version_id FROM task_instance JOIN dag_run ON dag_run.dag_id = task_instance.dag_id AND dag_run.run_id = task_instance.run_id WHERE dag_run.id IN (103279, 103260, 103240, 103221, 103202, 103183, 103164, 103143, 103125, 103104, 103086, 103068, 103047, 103037, 103009, 102990, 102971, 102952, 102933, 102916, 102895, 102874, 102856, 102837, 102817, 102800, 102777, 102762, 102740, 102723, 102702, 102686, 102663, 102645, 102624, 102606, 102589, 102573, 102553, 102533, 102511, 102495, 102476, 102458, 102436, 102415, 102398, 102376, 102358, 102344, 102322, 102297, 102282, 102261, 102243, 102223, 102202, 102181, 102162, 102146, 102124, 102106, 102090, 102068, 102047, 102031, 102012, 101992, 101976, 101953, 101932, 101912, 101894, 101877, 101859, 101836, 101819, 101797, 101780, 101764, 101743, 101722, 101704, 101685, 101668, 101646, 101634, 101609, 101592, 101571, 101550, 101533, 101516, 101497, 101479, 101460, 101441, 101423, 101406, 101386, 101365, 101346, 101326, 101308, 101286, 101266, 101246, 101224, 101208, 101191, 101172, 101152, 101128, 101113, 101091, 101070, 101053, 101033, 101014, 100993, 100975, 100956 , 100938, 100916, 100896, 100877, 100860, 100841, 100821, 100805, 100783, 100766, 100745, 100727, 100711, 100686, 100666, 100648, 100631, 100609, 100589, 100574, 100551, 100535, 100515, 100493, 100473, 100458, 100436, 100417, 100400, 100381, 100363, 100341, 100324, 100308, 100286, 100271, 100249, 100232, 100211, 100200, 100195, 100182, 100167, 100162, 100152, 100140, 100130, 100122, 100109, 100098, 100092, 100082, 100070, 100064, 100056, 100041, 100035, 100025, 100018, 100010, 100001, 99991, 99978, 99970, 99960, 99954, 99942, 99933, 99925, 99915, 99906, 99897, 99885, 99878, 99867, 99856, 99850, 99839, 99830, 99820, 99808, 99799, 99791, 99783, 99775, 99764, 99751, 99744, 99736, 99727, 99712, 99707, 99695, 99687, 99678, 99667, 99655, 99646, 99637, 99629, 99618, 99613, 99597, 99589, 99578, 99572, 99560, 99551, 99542, 99533, 99525, 99515, 99506, 99496, 99488, 99476, 99468, 99459, 99450, 99443, 99433, 99421, 99412, 99403, 99395, 99386, 99377, 99361, 99352, 99345, 99334, 99324, 99319, 993 05, 99294, 99289, 99277, 99270, 99256, 99249, 99237, 99229, 99223, 99210, 99201, 99192, 99183, 99174, 99162, 99156, 99143, 99133, 99125, 99114, 99107, 99097, 99085, 99077, 99066, 99058, 99050, 99040, 99032, 99021, 99008, 99001, 98991, 98983, 98976, 98963, 98956, 98947, 98936, 98927, 98915, 98908, 98897, 98890, 98879, 98871, 98859, 98851, 98843, 98832, 98825, 98812, 98805, 98795, 98783, 98776, 98766, 98758, 98749, 98737, 98730, 98722, 98712, 98702, 98693, 98682, 98672, 98662, 98650, 98644, 98633, 98622, 98612, 98602, 98593, 98584, 98573, 98564, 98554, 98546, 98538, 98529, 98520, 98505, 98496, 98487, 98481, 98468, 98459, 98450, 98440, 98430, 98421, 98413, 98402, 98392, 98384, 98375, 98365, 98359, 98351, 98339, 98328, 98318, 98308, 98300, 98290, 98281, 98272) AND serialized_dag.id != '01997ae2-22b1-7959-8639-dcf8ab870d45'::uuid); ``` looping (i.e correlation) and and sequential scan due to serialized_dag.id present inside the inner sql block is causing the query to run for long.. ### What you think should happen instead? Take out "AND serialized_dag.id != '01997ae2-22b1-7959-8639-dcf8ab870d45'::uuid)” from inner SQL and add this condition to outer/main/parent sql. This will help to reduce looping (i.e avoid correlation) and avoid sequential scan and use index scan which helps to seed-up up the execution. We have seen significant improvement from 4+ hours execution time to 35ms with same result set as per our analysis/observation. ``` SELECT serialized_dag.data, serialized_dag.data_compressed, serialized_dag.id, serialized_dag.dag_id, serialized_dag.created_at, serialized_dag.last_updated, serialized_dag.dag_hash, serialized_dag.dag_version_id FROM serialized_dag WHERE serialized_dag.dag_version_id IN (SELECT task_instance.dag_version_id FROM task_instance JOIN dag_run ON dag_run.dag_id = task_instance.dag_id AND dag_run.run_id = task_instance.run_id WHERE dag_run.id IN (103279, 103260, 103240, 103221, 103202, 103183, 103164, 103143, 103125, 103104, 103086, 103068, 103047, 103037, 103009, 102990, 102971, 102952, 102933, 102916, 102895, 102874, 102856, 102837, 102817, 102800, 102777, 102762, 102740, 102723, 102702, 102686, 102663, 102645, 102624, 102606, 102589, 102573, 102553, 102533, 102511, 102495, 102476, 102458, 102436, 102415, 102398, 102376, 102358, 102344, 102322, 102297, 102282, 102261, 102243, 102223, 102202, 102181, 102162, 102146, 102124, 102106, 102090, 102068, 102047, 102031, 102012, 101992, 101976, 101953, 101932, 101912, 101894, 101877, 101859, 101836, 101819, 101797, 101780, 101764, 101743, 101722, 101704, 101685, 101668, 101646, 101634, 101609, 101592, 101571, 101550, 101533, 101516, 101497, 101479, 101460, 101441, 101423, 101406, 101386, 101365, 101346, 101326, 101308, 101286, 101266, 101246, 101224, 101208, 101191, 101172, 101152, 101128, 101113, 101091, 101070, 101053, 101033, 101014, 100993, 100975, 100956 , 100938, 100916, 100896, 100877, 100860, 100841, 100821, 100805, 100783, 100766, 100745, 100727, 100711, 100686, 100666, 100648, 100631, 100609, 100589, 100574, 100551, 100535, 100515, 100493, 100473, 100458, 100436, 100417, 100400, 100381, 100363, 100341, 100324, 100308, 100286, 100271, 100249, 100232, 100211, 100200, 100195, 100182, 100167, 100162, 100152, 100140, 100130, 100122, 100109, 100098, 100092, 100082, 100070, 100064, 100056, 100041, 100035, 100025, 100018, 100010, 100001, 99991, 99978, 99970, 99960, 99954, 99942, 99933, 99925, 99915, 99906, 99897, 99885, 99878, 99867, 99856, 99850, 99839, 99830, 99820, 99808, 99799, 99791, 99783, 99775, 99764, 99751, 99744, 99736, 99727, 99712, 99707, 99695, 99687, 99678, 99667, 99655, 99646, 99637, 99629, 99618, 99613, 99597, 99589, 99578, 99572, 99560, 99551, 99542, 99533, 99525, 99515, 99506, 99496, 99488, 99476, 99468, 99459, 99450, 99443, 99433, 99421, 99412, 99403, 99395, 99386, 99377, 99361, 99352, 99345, 99334, 99324, 99319, 993 05, 99294, 99289, 99277, 99270, 99256, 99249, 99237, 99229, 99223, 99210, 99201, 99192, 99183, 99174, 99162, 99156, 99143, 99133, 99125, 99114, 99107, 99097, 99085, 99077, 99066, 99058, 99050, 99040, 99032, 99021, 99008, 99001, 98991, 98983, 98976, 98963, 98956, 98947, 98936, 98927, 98915, 98908, 98897, 98890, 98879, 98871, 98859, 98851, 98843, 98832, 98825, 98812, 98805, 98795, 98783, 98776, 98766, 98758, 98749, 98737, 98730, 98722, 98712, 98702, 98693, 98682, 98672, 98662, 98650, 98644, 98633, 98622, 98612, 98602, 98593, 98584, 98573, 98564, 98554, 98546, 98538, 98529, 98520, 98505, 98496, 98487, 98481, 98468, 98459, 98450, 98440, 98430, 98421, 98413, 98402, 98392, 98384, 98375, 98365, 98359, 98351, 98339, 98328, 98318, 98308, 98300, 98290, 98281, 98272)) AND serialized_dag.id != '01997ae2-22b1-7959-8639-dcf8ab870d45'::uuid; ``` ### How to reproduce Connect airflow to postgres and have it query multiple dag ids and observe postgres metrics. ### Operating System linux ### Versions of Apache Airflow Providers apache-airflow-providers-common-sql==1.27.3 apache-airflow-providers-fab==2.3.0 apache-airflow-providers-mysql==6.3.2 apache-airflow-providers-odbc==4.10.1 apache-airflow-providers-postgres==6.2.1 apache-airflow-providers-sendgrid==4.1.2 apache-airflow-providers-standard==1.4.1 ### Deployment Official Apache Airflow Helm Chart ### Deployment details Airflow: 3.0.3 ### Anything else? _No response_ ### Are you willing to submit PR? - [ ] Yes I am willing to submit a PR! ### Code of Conduct - [x] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
