jcborras opened a new issue, #25502:
URL: https://github.com/apache/airflow/issues/25502

   ### Apache Airflow version
   
   2.1.2
   
   ### What happened
   
   When loading AirFlow v2 home page subsequent requests include `GET 
/last_dagruns`, `GET /dag_stats` and `GET /tasks_stats` which in turn query the 
database (PostgreSQL in this case). These DB queries (most likely generated by 
an ORM) have the form:
   
   ```{sql}
   SELECT ...
   FROM ..
   WHERE dag_id IN ('active_dag1', 'active_dag2, ...., 'active_dag_n`)
   ```
   
   Which have one issue: unless the postgresql query optimizer figures out that 
those strings are actually PKs in some table the `IN` operation is carried out 
by doing a linear search which when the list of DAGs and the number of rows in 
the tables is high (ie. 300 actives dags and 10M rows) results in response 
times in the order of minutes not seconds.
   
   This situation has unintended consequences:
   - if the number of HTTP workers is low (ie. 3) it will prevent other users 
from loading any page.
   - if your monitoring is based on the classic frequent `GET /health` request 
you are like to get timeout hence triggering alarms.
   
   ### What you think should happen instead
   
   I've made the queries run faster by creating a tmp table storing the values 
in the `IN` operator and then `JOIN`ing. That is:
   
   ```{sql}
   SELECT ..
   FROM ..
   JOIN tmp_dag_id USING (dag_id)
   ```
   Then the implicit indexes take care of it all. But that of course is a lot 
to ask to an ORM.
   
   ### How to reproduce
   
   You'll need a setup with a few hundred DAGs. If you run them often the db 
will pretty soon contain tables with millions of entries thus making the issue 
arise.
   
   ### Operating System
   
   Linux
   
   ### Versions of Apache Airflow Providers
   
   _No response_
   
   ### Deployment
   
   Other Docker-based deployment
   
   ### Deployment details
   
   Docker images which operations managed under AWS ECS and Fargate.
   
   ### Anything else
   
   N/A
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to