Taragolis commented on issue #33688: URL: https://github.com/apache/airflow/issues/33688#issuecomment-1698823134
> here is the performance activity for the last 5 hours. The peak at 3 AM we see represents also a peak in activity in terms of DAGs running. Anything particular you want to check ?

According to Average Active Sessions (AAS), the most time-consuming operations are locks (`transactionid` and `tuple` in the legend). I'm not sure there is anything serious on the DB backend, although all these metrics are averaged over the period, so an actual spike could be missed.

You can click `transactionid` in the legend to keep only the top 10 queries that contribute to this event in the selected period. I guess it would be `SELECT ... FROM dag_run WHERE dag_run.dag_id = '{DAG_ID}' AND dag_run.run_id = '{RUN_ID}' FOR UPDATE;`

---

In addition, I locally ran a simple DAG on Airflow 2.6.3 and Airflow 2.7.0.

> [!IMPORTANT]
> I ran my "performance comparison" on a local machine, so latency to the DB was almost zero, the executor was Local, the DAG/task was pretty simple, and nothing else was running at that moment. So the result might be far, far away from the actual problem.

```python
from airflow.decorators import task
from airflow import DAG
import pendulum

# catchup=True with a start date far in the past produces many daily DAG runs,
# and max_active_runs=64 lets up to 64 of them run concurrently.
with DAG(
    dag_id="performance-check",
    start_date=pendulum.datetime(2021, 1, 1, tz='UTC'),
    end_date=None,
    schedule="@daily",
    catchup=True,
    tags=["performance"],
    max_active_runs=64,
) as dag:

    @task
    def sample_task(test_data=None, ti=None):
        # `ti` (the TaskInstance) is injected by Airflow from the task context.
        print(f"{ti.dag_id}-{ti.run_id}-{ti.task_id}[{ti.map_index}]: {test_data}")

    sample_task()
```

I also turned on most of the logging in Postgres, cleaned the Postgres log file before enabling the DAG, and, after all 970 DAG runs completed, ran pgBadger on the Postgres log: [pgbadger-report-airflow-2-results.zip](https://github.com/apache/airflow/files/12473834/pgbadger-report-airflow-2-.zip)

The main difference was in obtaining information about the previous DAG run.

**Airflow 2.7.0**: first, with a total cumulative execution time of 24s820ms for 9700 queries

![image](https://github.com/apache/airflow/assets/3998685/8f6836e0-3a21-42f0-9535-44b0d5d81be0)
**Airflow 2.6.3**: second (let's ignore COMMIT), with a total cumulative execution time of 6s525ms for 9700 queries

![image](https://github.com/apache/airflow/assets/3998685/1552f334-974a-44db-81ba-2b66a71e967d)

This [behaviour is fixed](https://github.com/apache/airflow/pull/33672) and the fix should be part of 2.7.1. In some circumstances it could be a cause of performance degradation, namely if the DB backend is far away from Airflow (high latency) and quite a few previous DAG runs exist. In addition, it could also be a reason why RAM/CPU usage increased. However, in a deployment that is far from production usage, the impact was only about an additional 100ms-1s.
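
As a rough sanity check on the pgBadger totals quoted above, here is a minimal sketch that turns them into per-query averages and a slowdown ratio. The variable and function names are made up for illustration, and it assumes both versions were measured over the same 970 DAG runs and 9700 queries. It works out to roughly 2.56 ms vs 0.67 ms per query, i.e. about 3.8x slower on 2.7.0 in this local setup.

```python
# Totals as reported in the pgBadger comparison (cumulative execution time).
# Names below are hypothetical; only the numbers come from the report.
QUERIES = 9700   # queries of this kind per test, for each version
DAG_RUNS = 970   # DAG runs completed in the test (assumed equal for both versions)
TOTAL_MS = {"2.7.0": 24_820, "2.6.3": 6_525}  # 24s820ms and 6s525ms


def avg_ms_per_query(total_ms: int, queries: int = QUERIES) -> float:
    """Average execution time of a single query, in milliseconds."""
    return total_ms / queries


avg = {version: avg_ms_per_query(total) for version, total in TOTAL_MS.items()}

# How many times slower the query was on 2.7.0 compared to 2.6.3.
slowdown = TOTAL_MS["2.7.0"] / TOTAL_MS["2.6.3"]

# Extra DB execution time per DAG run attributable to this query alone.
extra_ms_per_dag_run = (TOTAL_MS["2.7.0"] - TOTAL_MS["2.6.3"]) / DAG_RUNS

for version, value in avg.items():
    print(f"Airflow {version}: {value:.2f} ms per query")
print(f"Slowdown: {slowdown:.1f}x, extra per DAG run: {extra_ms_per_dag_run:.1f} ms")
```

These figures only cover query execution time on the server. They do not include network round trips, so on a deployment with a remote DB the absolute numbers would look different.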