Sorry for recreate PR. (I had ruined master of my fork. It will be never
occured again.)
[ Description ]
There was no index composed of dag_id and execution_date. So, when scheduler
find all tis of dagrun like this "select * from task_instance where dag_id =
'some_id' and execution_date = '2018-09-01 ...'", this query will be using
ti_dag_state index (I was testing it in mysql workbench. I was expecting
'ti_state_lkp' but, it was not that case). Perhaps there's no problem when
range of execution_date is small (under 1000 dagrun), but I had experienced
slow allocation of tis when the dag had 1000+ accumulative dagrun. So, now I
was using airflow with adding new index ti_dag_date (dag_id, execution_date) on
task_instance table. I have attached result of my test :)
[ Test ] I have tested using 1.10 version
1. just running scheduler with past start_date and high concurrency. (3 years
ago, 10 minute interval)
2. scheduler may be executing backfill and "select tis" query (like below
sequence)
models.py > DAG.run
jobs.py > BaseJob.run
jobs.py > BackfillJob._execute
jobs.py > BackfillJob._execute_for_run_dates
jobs.py > BackfillJob._task_instances_for_dag_run
models.py > DagRun.get_task_instances
tis = session.query(TI).filter(
TI.dag_id == self.dag_id,
TI.execution_date == self.execution_date,
)
3. wait until enough dagruns will be accumlate.
I can find that many slow query logs get to occur from mysql log file. (query
like below sample)
"select * from task_instance where dag_id = 'some_id' and execution_date =
'2018-09-01 ...'"
[ASIS] current

[TOBE] after adding new index

### Jira
- [ ] My PR addresses the following [Airflow
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
- https://issues.apache.org/jira/browse/AIRFLOW-XXX
- In case you are fixing a typo in the documentation you can prepend your
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
### Description
- [ ] Here are some details about my PR, including screenshots of any UI
changes:
### Tests
- [ ] My PR adds the following unit tests __OR__ does not need testing for this
extremely good reason:
### Commits
- [ ] My commits all reference Jira issues in their subject lines, and I have
squashed multiple commits if they address the same issue. In addition, my
commits follow the guidelines from "[How to write a good git commit
message](http://chris.beams.io/posts/git-commit/)":
1. Subject is separated from body by a blank line
1. Subject is limited to 50 characters (not including Jira issue reference)
1. Subject does not end with a period
1. Subject uses the imperative mood ("add", not "adding")
1. Body wraps at 72 characters
1. Body explains "what" and "why", not "how"
### Documentation
- [ ] In case of new functionality, my PR adds documentation that describes how
to use it.
- When adding new operators/hooks/sensors, the autoclass documentation
generation needs to be added.
### Code Quality
- [ ] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
[ Full content available at:
https://github.com/apache/incubator-airflow/pull/3885 ]
This message was relayed via gitbox.apache.org for [email protected]