[ https://issues.apache.org/jira/browse/AIRFLOW-4173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16805836#comment-16805836 ]
ASF subversion and git services commented on AIRFLOW-4173:
----------------------------------------------------------
Commit d10ffe7d820e42896043ca0dbebf6e3e93776985 in airflow's branch
refs/heads/v1-10-stable from Xiaodong
[ https://gitbox.apache.org/repos/asf?p=airflow.git;h=d10ffe7 ]
[AIRFLOW-4173] Improve SchedulerJob.process_file() (#4993)
By avoiding the processing of paused DAGs.
The action we avoid here is mainly the dagbag.get_dag() call on paused DAGs.
DagBag.get_dag() itself is relatively expensive, so this change brings a
considerable performance improvement.
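
As a rough illustration of the idea in this commit message (not the actual
patch), the sketch below checks the cheap "is paused" condition before
calling get_dag(). It does not import Airflow: Dag, CountingDagBag,
dags_to_schedule and paused_dag_ids are hypothetical stand-ins, and the
counter only mimics the cost of the real DagBag.get_dag().

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Dag:
    """Hypothetical minimal DAG object; not Airflow's DAG class."""
    dag_id: str


@dataclass
class CountingDagBag:
    """Hypothetical stand-in for DagBag that counts get_dag() calls."""
    dags: Dict[str, Dag]
    get_dag_calls: int = 0

    def get_dag(self, dag_id: str) -> Dag:
        # The counter stands in for the relatively expensive real lookup.
        self.get_dag_calls += 1
        return self.dags[dag_id]


def dags_to_schedule(dagbag: CountingDagBag, paused_dag_ids: Set[str]) -> List[Dag]:
    """Skip paused DAG ids first, and only then call get_dag()."""
    result = []
    for dag_id in dagbag.dags:
        if dag_id in paused_dag_ids:
            continue  # paused: never pay for get_dag()
        result.append(dagbag.get_dag(dag_id))
    return result


bag = CountingDagBag({name: Dag(name) for name in ("etl", "reports", "cleanup")})
active = dags_to_schedule(bag, paused_dag_ids={"reports"})
print([d.dag_id for d in active], bag.get_dag_calls)  # ['etl', 'cleanup'] 2
```

With one of three DAGs paused, get_dag() is called twice instead of three
times; the saving grows with the share of paused DAGs in a file.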
> Improve scheduler performance by avoiding unnecessary actions in
> SchedulerJob.process_file()
> -----------------------------------------------------------------------------------------
>
> Key: AIRFLOW-4173
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4173
> Project: Apache Airflow
> Issue Type: Improvement
> Components: scheduler
> Affects Versions: 1.10.2
> Reporter: Xiaodong DENG
> Assignee: Xiaodong DENG
> Priority: Critical
> Fix For: 1.10.3
>
>
> In the current implementation of *SchedulerJob.process_file()*
> (https://github.com/apache/airflow/blob/068ded96cd279dcd51f5b6d1e96f09205ecf40c8/airflow/jobs.py#L1722-L1734),
> '*dag = dagbag.get_dag(dag_id)*' is executed regardless of whether dag_id
> points to a paused DAG. However, the result is never used if that DAG is
> paused, which is inefficient.
> We can do the `if DAG is paused` check first, before invoking '*dag =
> dagbag.get_dag(dag_id)*'. This may bring a considerable improvement.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)