There is no index on (dag_id, execution_date). So when the scheduler looks up 
all task instances (TIs) of a DAG run with a query like `select * from 
task_instance where dag_id = 'some_id' and execution_date = '2018-09-01 ...'`, 
the query uses the ti_dag_state index (I checked this in MySQL Workbench). This 
is probably not a problem while the range of execution_date values is small 
(under 1000 DAG runs), but I saw slow allocation of TIs once a DAG had 
accumulated 1000+ DAG runs. I am now running Airflow with a new index on 
(dag_id, execution_date) added to the task_instance table. The results of my 
test are attached:
![image](https://user-images.githubusercontent.com/6738941/45191171-bc525000-b27c-11e8-9762-bfd18cf99011.png)
![image](https://user-images.githubusercontent.com/6738941/45191184-d2f8a700-b27c-11e8-8739-fda9742985ff.png)
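
For anyone who wants to see the effect without a MySQL instance, here is a minimal sketch that uses Python's built-in `sqlite3` module instead. It is not the test behind the screenshots above. The table is a simplified stand-in for `task_instance`, the column list of `ti_dag_state` is assumed to be (dag_id, state), and the name `ti_dag_date` for the new index is only illustrative. SQLite's planner differs from MySQL's, so this shows the general idea rather than the exact MySQL behaviour.

```python
import sqlite3

# The lookup described above, with placeholders instead of literal values.
QUERY = (
    "SELECT * FROM task_instance "
    "WHERE dag_id = ? AND execution_date = ?"
)


def query_plan(conn):
    """Return SQLite's query plan for QUERY as a single string."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN " + QUERY,
        ("some_id", "2018-09-01 00:00:00"),
    ).fetchall()
    # Each row is (id, parent, notused, detail); the detail column names the index.
    return " | ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")

# Simplified stand-in for the real task_instance table (assumption).
conn.execute(
    "CREATE TABLE task_instance ("
    "task_id TEXT, dag_id TEXT, execution_date TEXT, state TEXT)"
)

# Existing index, assumed here to cover (dag_id, state).
conn.execute("CREATE INDEX ti_dag_state ON task_instance (dag_id, state)")
plan_before = query_plan(conn)

# Proposed composite index; the name ti_dag_date is illustrative.
conn.execute(
    "CREATE INDEX ti_dag_date ON task_instance (dag_id, execution_date)"
)
plan_after = query_plan(conn)

print("before:", plan_before)
print("after: ", plan_after)
```

In this sketch the first plan can only narrow the search by dag_id through ti_dag_state, while the second plan matches both columns through the composite index. On MySQL the equivalent check is to run `EXPLAIN` on the same query before and after adding the index.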


[ Full content available at: 
https://github.com/apache/incubator-airflow/pull/3840 ]