There was no index on task_instance composed of (dag_id, execution_date). So
when the scheduler fetches all task instances (TIs) of a DagRun with a query like
`select * from task_instance where dag_id = '~' and execution_date = '~'`, the
query uses the ti_dag_state index instead (I tested this in MySQL Workbench).
This is probably not a problem when the DAG has few DagRuns (under 1000), but
I experienced slow allocation of TIs once a DAG had accumulated 1000+ DagRuns.
So I am now running Airflow with a new index on (dag_id, execution_date) added
to the task_instance table. I have attached the results of my test:
![image](https://user-images.githubusercontent.com/6738941/45191171-bc525000-b27c-11e8-9762-bfd18cf99011.png)
![image](https://user-images.githubusercontent.com/6738941/45191184-d2f8a700-b27c-11e8-8739-fda9742985ff.png)
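For readers without a MySQL instance at hand, the planner behavior can be roughly illustrated with Python's built-in sqlite3 as a stand-in. This is a sketch under assumptions: SQLite's query planner is not MySQL's, the table is a simplified, hypothetical subset of the task_instance columns, and the index name `ti_dag_date` is only illustrative. It compares the `EXPLAIN QUERY PLAN` output for the same lookup before and after the composite index is created.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN rows."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]


conn = sqlite3.connect(":memory:")

# Hypothetical, simplified subset of the task_instance table.
conn.execute(
    """
    CREATE TABLE task_instance (
        task_id TEXT NOT NULL,
        dag_id TEXT NOT NULL,
        execution_date TEXT NOT NULL,
        state TEXT,
        PRIMARY KEY (task_id, dag_id, execution_date)
    )
    """
)
# Existing index: it only narrows the lookup by dag_id.
conn.execute("CREATE INDEX ti_dag_state ON task_instance (dag_id, state)")

sql = "SELECT * FROM task_instance WHERE dag_id = ? AND execution_date = ?"
params = ("example_dag", "2018-09-01T00:00:00")

# Without a (dag_id, execution_date) index the planner falls back to ti_dag_state.
before = query_plan(conn, sql, params)

# Proposed composite index (the name is illustrative).
conn.execute(
    "CREATE INDEX ti_dag_date ON task_instance (dag_id, execution_date)"
)

# Now both equality predicates can be satisfied from a single index.
after = query_plan(conn, sql, params)

print("before:", before)
print("after: ", after)
```

The exact wording of the plan lines differs between SQLite versions, so compare which index name appears rather than the full text. On MySQL, running `EXPLAIN` on the same query before and after adding the index is the equivalent check.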


[ Full content available at: 
https://github.com/apache/incubator-airflow/pull/3840 ]