[ 
https://issues.apache.org/jira/browse/AIRFLOW-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edgar Rodriguez updated AIRFLOW-1495:
-------------------------------------
    Description: 
Using unindexed columns for ordering query results triggers sorting in the 
database, which makes for slower queries.
Solution is to use an indexed column or add an index on the desired columns if 
it doesn't exist, see: 
https://github.com/apache/incubator-airflow/blob/master/airflow/www/views.py#L2489

When checking for indices on task_instance, none is found for {{job_id}}:
{code}
mysql> show index from task_instance;
+---------------+------------+--------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table         | Non_unique | Key_name     | Seq_in_index | Column_name     | 
Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | 
Index_comment |
+---------------+------------+--------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| task_instance |          0 | PRIMARY      |            1 | task_id         | 
A         |          71 |     NULL | NULL   |      | BTREE      |         |     
          |
| task_instance |          0 | PRIMARY      |            2 | dag_id          | 
A         |          80 |     NULL | NULL   |      | BTREE      |         |     
          |
| task_instance |          0 | PRIMARY      |            3 | execution_date  | 
A         |       30056 |     NULL | NULL   |      | BTREE      |         |     
          |
| task_instance |          1 | ti_dag_state |            1 | dag_id          | 
A         |          18 |     NULL | NULL   |      | BTREE      |         |     
          |
| task_instance |          1 | ti_dag_state |            2 | state           | 
A         |          28 |     NULL | NULL   | YES  | BTREE      |         |     
          |
| task_instance |          1 | ti_pool      |            1 | pool            | 
A         |           1 |     NULL | NULL   | YES  | BTREE      |         |     
          |
| task_instance |          1 | ti_pool      |            2 | state           | 
A         |           3 |     NULL | NULL   | YES  | BTREE      |         |     
          |
| task_instance |          1 | ti_pool      |            3 | priority_weight | 
A         |          16 |     NULL | NULL   | YES  | BTREE      |         |     
          |
| task_instance |          1 | ti_state_lkp |            1 | dag_id          | 
A         |          18 |     NULL | NULL   |      | BTREE      |         |     
          |
| task_instance |          1 | ti_state_lkp |            2 | task_id         | 
A         |          82 |     NULL | NULL   |      | BTREE      |         |     
          |
| task_instance |          1 | ti_state_lkp |            3 | execution_date  | 
A         |       28037 |     NULL | NULL   |      | BTREE      |         |     
          |
| task_instance |          1 | ti_state_lkp |            4 | state           | 
A         |       29354 |     NULL | NULL   | YES  | BTREE      |         |     
          |
| task_instance |          1 | ti_state     |            1 | state           | 
A         |           3 |     NULL | NULL   | YES  | BTREE      |         |     
          |
+---------------+------------+--------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
{code}

So, when executing a query similar to the one for the view paging, the query 
explain shows the required filesort:
{code}
mysql> explain select * from task_instance order by job_id limit 500 offset 500;
+----+-------------+---------------+------------+------+---------------+------+---------+------+-------+----------+----------------+
| id | select_type | table         | partitions | type | possible_keys | key  | 
key_len | ref  | rows  | filtered | Extra          |
+----+-------------+---------------+------------+------+---------------+------+---------+------+-------+----------+----------------+
|  1 | SIMPLE      | task_instance | NULL       | ALL  | NULL          | NULL | 
NULL    | NULL | 30119 |   100.00 | Using filesort |
+----+-------------+---------------+------------+------+---------------+------+---------+------+-------+----------+----------------+
{code}

  was:
Using unindexed columns for ordering query results triggers sorting in the 
database, which makes for slower queries.
Solution is to use an indexed column or add an index on the desired columns if 
it doesn't exist, see: 
https://github.com/apache/incubator-airflow/blob/master/airflow/www/views.py#L2489

When checking for indices on task_instance, none is found for job_id:
{code}
mysql> show index from task_instance;
+---------------+------------+--------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table         | Non_unique | Key_name     | Seq_in_index | Column_name     | 
Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | 
Index_comment |
+---------------+------------+--------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| task_instance |          0 | PRIMARY      |            1 | task_id         | 
A         |          71 |     NULL | NULL   |      | BTREE      |         |     
          |
| task_instance |          0 | PRIMARY      |            2 | dag_id          | 
A         |          80 |     NULL | NULL   |      | BTREE      |         |     
          |
| task_instance |          0 | PRIMARY      |            3 | execution_date  | 
A         |       30056 |     NULL | NULL   |      | BTREE      |         |     
          |
| task_instance |          1 | ti_dag_state |            1 | dag_id          | 
A         |          18 |     NULL | NULL   |      | BTREE      |         |     
          |
| task_instance |          1 | ti_dag_state |            2 | state           | 
A         |          28 |     NULL | NULL   | YES  | BTREE      |         |     
          |
| task_instance |          1 | ti_pool      |            1 | pool            | 
A         |           1 |     NULL | NULL   | YES  | BTREE      |         |     
          |
| task_instance |          1 | ti_pool      |            2 | state           | 
A         |           3 |     NULL | NULL   | YES  | BTREE      |         |     
          |
| task_instance |          1 | ti_pool      |            3 | priority_weight | 
A         |          16 |     NULL | NULL   | YES  | BTREE      |         |     
          |
| task_instance |          1 | ti_state_lkp |            1 | dag_id          | 
A         |          18 |     NULL | NULL   |      | BTREE      |         |     
          |
| task_instance |          1 | ti_state_lkp |            2 | task_id         | 
A         |          82 |     NULL | NULL   |      | BTREE      |         |     
          |
| task_instance |          1 | ti_state_lkp |            3 | execution_date  | 
A         |       28037 |     NULL | NULL   |      | BTREE      |         |     
          |
| task_instance |          1 | ti_state_lkp |            4 | state           | 
A         |       29354 |     NULL | NULL   | YES  | BTREE      |         |     
          |
| task_instance |          1 | ti_state     |            1 | state           | 
A         |           3 |     NULL | NULL   | YES  | BTREE      |         |     
          |
+---------------+------------+--------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
{code}

So, when executing a query similar to the one for the view paging, the query 
explain shows the required filesort:
{code}
mysql> explain select * from task_instance order by job_id limit 500 offset 500;
+----+-------------+---------------+------------+------+---------------+------+---------+------+-------+----------+----------------+
| id | select_type | table         | partitions | type | possible_keys | key  | 
key_len | ref  | rows  | filtered | Extra          |
+----+-------------+---------------+------------+------+---------------+------+---------+------+-------+----------+----------------+
|  1 | SIMPLE      | task_instance | NULL       | ALL  | NULL          | NULL | 
NULL    | NULL | 30119 |   100.00 | Using filesort |
+----+-------------+---------------+------------+------+---------------+------+---------+------+-------+----------+----------------+
{code}


> TaskInstanceModelView using unindexed column for order_by on default
> --------------------------------------------------------------------
>
>                 Key: AIRFLOW-1495
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-1495
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: webserver
>            Reporter: Edgar Rodriguez
>
> Using unindexed columns for ordering query results triggers sorting in the 
> database, which makes for slower queries.
> Solution is to use an indexed column or add an index on the desired columns 
> if it doesn't exist, see: 
> https://github.com/apache/incubator-airflow/blob/master/airflow/www/views.py#L2489
> When checking for indices on task_instance, none is found for {{job_id}}:
> {code}
> mysql> show index from task_instance;
> +---------------+------------+--------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
> | Table         | Non_unique | Key_name     | Seq_in_index | Column_name     
> | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | 
> Index_comment |
> +---------------+------------+--------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
> | task_instance |          0 | PRIMARY      |            1 | task_id         
> | A         |          71 |     NULL | NULL   |      | BTREE      |         | 
>               |
> | task_instance |          0 | PRIMARY      |            2 | dag_id          
> | A         |          80 |     NULL | NULL   |      | BTREE      |         | 
>               |
> | task_instance |          0 | PRIMARY      |            3 | execution_date  
> | A         |       30056 |     NULL | NULL   |      | BTREE      |         | 
>               |
> | task_instance |          1 | ti_dag_state |            1 | dag_id          
> | A         |          18 |     NULL | NULL   |      | BTREE      |         | 
>               |
> | task_instance |          1 | ti_dag_state |            2 | state           
> | A         |          28 |     NULL | NULL   | YES  | BTREE      |         | 
>               |
> | task_instance |          1 | ti_pool      |            1 | pool            
> | A         |           1 |     NULL | NULL   | YES  | BTREE      |         | 
>               |
> | task_instance |          1 | ti_pool      |            2 | state           
> | A         |           3 |     NULL | NULL   | YES  | BTREE      |         | 
>               |
> | task_instance |          1 | ti_pool      |            3 | priority_weight 
> | A         |          16 |     NULL | NULL   | YES  | BTREE      |         | 
>               |
> | task_instance |          1 | ti_state_lkp |            1 | dag_id          
> | A         |          18 |     NULL | NULL   |      | BTREE      |         | 
>               |
> | task_instance |          1 | ti_state_lkp |            2 | task_id         
> | A         |          82 |     NULL | NULL   |      | BTREE      |         | 
>               |
> | task_instance |          1 | ti_state_lkp |            3 | execution_date  
> | A         |       28037 |     NULL | NULL   |      | BTREE      |         | 
>               |
> | task_instance |          1 | ti_state_lkp |            4 | state           
> | A         |       29354 |     NULL | NULL   | YES  | BTREE      |         | 
>               |
> | task_instance |          1 | ti_state     |            1 | state           
> | A         |           3 |     NULL | NULL   | YES  | BTREE      |         | 
>               |
> +---------------+------------+--------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
> {code}
> So, when executing a query similar to the one for the view paging, the query 
> explain shows the required filesort:
> {code}
> mysql> explain select * from task_instance order by job_id limit 500 offset 
> 500;
> +----+-------------+---------------+------------+------+---------------+------+---------+------+-------+----------+----------------+
> | id | select_type | table         | partitions | type | possible_keys | key  
> | key_len | ref  | rows  | filtered | Extra          |
> +----+-------------+---------------+------------+------+---------------+------+---------+------+-------+----------+----------------+
> |  1 | SIMPLE      | task_instance | NULL       | ALL  | NULL          | NULL 
> | NULL    | NULL | 30119 |   100.00 | Using filesort |
> +----+-------------+---------------+------------+------+---------------+------+---------+------+-------+----------+----------------+
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to