[
https://issues.apache.org/jira/browse/AIRFLOW-6965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Kamil Bregula updated AIRFLOW-6965:
-----------------------------------
Summary: The get_task_instances method is performed three times during one
creation of the DAGRun file. (was: The method is performed playthree times
during one creation of the DAGRun file.)
> The get_task_instances method is performed three times during one creation of
> the DAGRun file.
> ----------------------------------------------------------------------------------------------
>
> Key: AIRFLOW-6965
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6965
> Project: Apache Airflow
> Issue Type: Improvement
> Components: scheduler
> Affects Versions: 1.10.9
> Reporter: Kamil Bregula
> Priority: Major
>
> Hello,
> The get_task_instances query is executed three times while a single DAG run
> is created and processed. This is redundant. Reducing the number of these
> queries should improve scheduler performance.
> First query:
> perform_file:
> [https://github.com/apache/airflow/blob/cc562dd/airflow/jobs/scheduler_job.py#L792]
> process_dags:
> [https://github.com/apache/airflow/blob/cc562dd/airflow/jobs/scheduler_job.py#L853]
> create_dag_run:
> [https://github.com/apache/airflow/blob/cc562ddfc7a53932d89c92ee1fb8f780c1fb38e3/airflow/jobs/scheduler_job.py#L726]
> create_dagrun:
> [https://github.com/apache/airflow/blob/cc562ddfc7a53932d89c92ee1fb8f780c1fb38e3/airflow/jobs/scheduler_job.py#L638]
> verify_integrity:
> [https://github.com/apache/airflow/blob/cc562ddfc7a53932d89c92ee1fb8f780c1fb38e3/airflow/models/dag.py#L1454]
> get_task_instances:
> [https://github.com/apache/airflow/blob/cc562ddfc7a53932d89c92ee1fb8f780c1fb38e3/airflow/models/dagrun.py#L436]
> Second query:
> perform_file:
> [https://github.com/apache/airflow/blob/cc562dd/airflow/jobs/scheduler_job.py#L792]
> process_dags:
> [https://github.com/apache/airflow/blob/cc562dd/airflow/jobs/scheduler_job.py#L853]
> _process_task_instances:
> [https://github.com/apache/airflow/blob/cc562dd/airflow/jobs/scheduler_job.py#L738]
> update_state:
> [https://github.com/apache/airflow/blob/cc562ddfc7a53932d89c92ee1fb8f780c1fb38e3/airflow/jobs/scheduler_job.py#L685]
> get_task_instances:
> [https://github.com/apache/airflow/blob/cc562ddfc7a53932d89c92ee1fb8f780c1fb38e3/airflow/models/dagrun.py#L292]
> Third query:
> perform_file:
> [https://github.com/apache/airflow/blob/cc562dd/airflow/jobs/scheduler_job.py#L792]
> process_dags:
> [https://github.com/apache/airflow/blob/cc562dd/airflow/jobs/scheduler_job.py#L853]
> _process_task_instances:
> [https://github.com/apache/airflow/blob/cc562dd/airflow/jobs/scheduler_job.py#L738]
> verify_integrity:
> [https://github.com/apache/airflow/blob/cc562ddfc7a53932d89c92ee1fb8f780c1fb38e3/airflow/jobs/scheduler_job.py#L684]
> get_task_instances:
> [https://github.com/apache/airflow/blob/cc562ddfc7a53932d89c92ee1fb8f780c1fb38e3/airflow/models/dagrun.py#L436]
>
>
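The redundancy described in the issue can be illustrated with a small, self-contained sketch. It does not use Airflow: `FakeDagRun`, its `query_count` counter, the optional `tis` parameter, and the two `process_*` helpers are hypothetical names invented for this illustration, and the sketch is not the fix that was applied to the scheduler. It only shows the general idea of fetching the task instances once and passing them to the callers that need them.

```python
from typing import List, Optional


class FakeDagRun:
    """Hypothetical stand-in for a DAG run; this is NOT Airflow's DagRun."""

    def __init__(self, task_ids: List[str]) -> None:
        self._task_ids = task_ids
        self.query_count = 0  # number of simulated database queries

    def get_task_instances(self) -> List[str]:
        # Stands in for the task-instance query named in the issue.
        self.query_count += 1
        return list(self._task_ids)

    def verify_integrity(self, tis: Optional[List[str]] = None) -> int:
        # Assumed optional parameter: reuse rows the caller already fetched.
        tis = self.get_task_instances() if tis is None else tis
        return len(tis)

    def update_state(self, tis: Optional[List[str]] = None) -> int:
        # Same assumption as above: query only when nothing was passed in.
        tis = self.get_task_instances() if tis is None else tis
        return len(tis)


def process_naive(run: FakeDagRun) -> None:
    """Mirrors the three call chains in the issue: each one queries again."""
    run.verify_integrity()  # first query: while the run is created
    run.update_state()      # second query: while task instances are processed
    run.verify_integrity()  # third query: integrity check during processing


def process_shared(run: FakeDagRun) -> None:
    """Fetches once and hands the same list to every consumer."""
    tis = run.get_task_instances()
    run.verify_integrity(tis)
    run.update_state(tis)
    run.verify_integrity(tis)


if __name__ == "__main__":
    naive = FakeDagRun(["extract", "load"])
    process_naive(naive)
    print("naive queries:", naive.query_count)

    shared = FakeDagRun(["extract", "load"])
    process_shared(shared)
    print("shared queries:", shared.query_count)
```

One caveat with this approach: if a step such as the integrity check creates or removes task instances, a list fetched earlier may be stale, so any real change would need to decide where a fresh query is still required.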
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
