[ https://issues.apache.org/jira/browse/AIRFLOW-246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kengo Seki updated AIRFLOW-246: ------------------------------- External issue URL: https://github.com/apache/incubator-airflow/pull/1610 > dag_stats endpoint has a terrible query > --------------------------------------- > > Key: AIRFLOW-246 > URL: https://issues.apache.org/jira/browse/AIRFLOW-246 > Project: Apache Airflow > Issue Type: Bug > Components: webserver > Affects Versions: Airflow 1.7.1 > Environment: MySQL Backend through sqlalchemy > Reporter: Neil Hanlon > Assignee: Kengo Seki > > Hitting this endpoint creates a series of queries on the database which take > over 20 seconds to run, causing the page to not load for that entire time. > Luckily the main page (which includes this under "Recent Statuses") loads > this synchronously, but still... waiting almost half a minute (at times more) > to see the statuses for dags is really not fun. > We have less than a million rows in the task_instance table--so it's not even > a problem with that. > Here's a query profile for the query: > https://gist.github.com/NeilHanlon/613f12724e802bc51c23fca7d46d28bf > We've done some optimizations on the database, but to no avail. > The query: > {code:sql} > SELECT task_instance.dag_id AS task_instance_dag_id, task_instance.state AS > task_instance_state, count(task_instance.task_id) AS count_1 FROM > task_instance LEFT OUTER JOIN (SELECT dag_run.dag_id AS dag_id, > dag_run.execution_date AS execution_date FROM dag_run WHERE dag_run.state = > 'running') AS running_dag_run ON running_dag_run.dag_id = > task_instance.dag_id AND running_dag_run.execution_date = > task_instance.execution_date LEFT OUTER JOIN (SELECT dag_run.dag_id AS > dag_id, max(dag_run.execution_date) AS execution_date FROM dag_run GROUP BY > dag_run.dag_id) AS last_dag_run ON last_dag_run.dag_id = task_instance.dag_id > AND last_dag_run.execution_date = task_instance.execution_date WHERE > task_instance.task_id IN ... AND (running_dag_run.dag_id IS NOT NULL OR > last_dag_run.dag_id IS NOT NULL) GROUP BY task_instance.dag_id, > task_instance.state; > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)