nick-msk-ai commented on issue #33688: URL: https://github.com/apache/airflow/issues/33688#issuecomment-1699510571
Like other users above, after upgrading I noticed a major increase in the time taken to execute a dag_run that previously took < 1 minute (the DAG is scheduled every minute). This led to a cascade of unfinished dag_runs for this DAG, and subsequently all my other DAGs were affected.

Like @raphaelsimeon, I saw in the performance insights from my RDS instance that the query to get the DAG history was taking a long time to return. I also noticed that each running dag_run triggers a query to get this data (all entries before the current dag_run), and re-issues the query at multiple points while the DAG is running. This seemed to be the root of the issue: even when the first query does not return in a timely manner, the scheduler automatically triggers a subsequent one, and so on, until there is a logjam of pending queries (until max_active_runs is reached), which affects database performance for every other process and DAG.

For context, I run `airflow db clean` as part of a daily DAG to ensure that only the last 21 days of metadata is kept. So for my DAG on a 1 minute interval, I had ~28000 rows in the dag_run table. For another of my DAGs that runs every 10 minutes, there are consistently ~2900 rows.

**Experiment 1:** If I turned off the DAG on the 1 minute interval (also setting all dag runs of that DAG to a non-running state), then the rest of my DAGs would execute well within my expected intervals.

**Experiment 2:** If I turned on the DAG with the 1 minute interval after deleting all relevant entries from the dag_run table, the DAG executed within the interval.

So I have found a workaround for my particular case: deleting rows from the dag_run metadata table. I would be interested to know what state the dag_run tables are in for the users who have posted above, and whether they are performing any scheduled maintenance on those tables.

--
This is an automated message from the Apache Git Service.
