[ 
https://issues.apache.org/jira/browse/AIRFLOW-246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kengo Seki updated AIRFLOW-246:
-------------------------------
    External issue URL: https://github.com/apache/incubator-airflow/pull/1610

> dag_stats endpoint has a terrible query
> ---------------------------------------
>
>                 Key: AIRFLOW-246
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-246
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: webserver
>    Affects Versions: Airflow 1.7.1
>         Environment: MySQL Backend through sqlalchemy
>            Reporter: Neil Hanlon
>            Assignee: Kengo Seki
>
> Hitting this endpoint creates a series of queries on the database which take 
> over 20 seconds to run, causing the page to not load for that entire time. 
> Luckily the main page (which includes this under "Recent Statuses") loads 
> this synchronously, but still... waiting almost half a minute (at times more) 
> to see the statuses for dags is really not fun.
> We have less than a million rows in the task_instance table--so it's not even 
> a problem with that.
> Here's a query profile for the query:
> https://gist.github.com/NeilHanlon/613f12724e802bc51c23fca7d46d28bf
> We've done some optimizations on the database, but to no avail.
> The query:
> {code:sql}
> SELECT task_instance.dag_id AS task_instance_dag_id, task_instance.state AS 
> task_instance_state, count(task_instance.task_id) AS count_1 FROM 
> task_instance LEFT OUTER JOIN (SELECT dag_run.dag_id AS dag_id, 
> dag_run.execution_date AS execution_date FROM dag_run WHERE dag_run.state = 
> 'running') AS running_dag_run ON running_dag_run.dag_id = 
> task_instance.dag_id AND running_dag_run.execution_date = 
> task_instance.execution_date LEFT OUTER JOIN (SELECT dag_run.dag_id AS 
> dag_id, max(dag_run.execution_date) AS execution_date FROM dag_run GROUP BY 
> dag_run.dag_id) AS last_dag_run ON last_dag_run.dag_id = task_instance.dag_id 
> AND last_dag_run.execution_date = task_instance.execution_date WHERE 
> task_instance.task_id IN ... AND (running_dag_run.dag_id IS NOT NULL OR 
> last_dag_run.dag_id IS NOT NULL) GROUP BY task_instance.dag_id, 
> task_instance.state;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to