Hi,

I'm using Airflow 1.7  for some time now (~half a year in prod) and lately
I stated to notice that the load time of the "recent statuses" part on the
UI main page is starting to take a lot of time. Got to 10-13 seconds.

>From looking in Mysql I found that this is the query that takes the whole
time:

SELECT task_instance.dag_id AS task_instance_dag_id, task_instance.state AS
task_instance_state, count(task_instance.task_id) AS count_1
FROM task_instance LEFT OUTER JOIN dag_run ON dag_run.dag_id =
task_instance.dag_id AND dag_run.execution_date =
task_instance.execution_date AND dag_run.state = 'running' LEFT OUTER JOIN
(SELECT dag_run.dag_id AS dag_id, max(dag_run.execution_date) AS
execution_date
FROM dag_run GROUP BY dag_run.dag_id) AS last_dag_run ON
last_dag_run.dag_id = task_instance.dag_id AND last_dag_run.execution_date
= task_instance.execution_date
WHERE task_instance.task_id IN ( <big list of all tasks> ) AND
(dag_run.dag_id IS NOT NULL OR last_dag_run.dag_id IS NOT NULL) GROUP BY
task_instance.dag_id, task_instance.state

First, I think the second left outer join might be a bit redundant and can
be replaced with a simple join. When I changed that, I got the same results
in less than 2 secs. (down from 13 seconds)

1. Does that change make sense?
2. Is that something that was resolved in recent version? (Couldn't find
any issue)
3. Is that something I can change by simply changing some sql resource
files or is it generated by some kind of an ORM?

Thanks!

Reply via email to