[ https://issues.apache.org/jira/browse/AIRFLOW-4173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16805005#comment-16805005 ]
ASF subversion and git services commented on AIRFLOW-4173:
----------------------------------------------------------

Commit c576f393bff433c345f99bfcab811bbc2ac0d37e in airflow's branch refs/heads/master from Xiaodong
[ https://gitbox.apache.org/repos/asf?p=airflow.git;h=c576f39 ]

[AIRFLOW-4173] Improve SchedulerJob.process_file() (#4993)

By avoid processing paused DAGs.

The actions we avoid here is mainly the dagbag.get_dag() on paused DAGs.
DagBag.get_dag() itself is relatively expensive, so this change brings
considerable performance improvement.

> Improve scheduler performance by avoid Unnecessary actions in
> SchedulerJob.process_file()
> -----------------------------------------------------------------------------------------
>
>                 Key: AIRFLOW-4173
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-4173
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: scheduler
>    Affects Versions: 1.10.2
>            Reporter: Xiaodong DENG
>            Assignee: Xiaodong DENG
>            Priority: Critical
>
> In current implementation of *SchedulerJob.process_file()*
> ([https://github.com/apache/airflow/blob/068ded96cd279dcd51f5b6d1e96f09205ecf40c8/airflow/jobs.py#L1722-L1734]),
> action '*dag = dagbag.get_dag(dag_id)*' is to be done no matter if dag_id is
> pointing to a paused DAG. However, the result will not be used later if that
> DAG is paused.
> This is causing inefficiency.
> We can do the `if DAG is paused` check first, before we invoke '*dag =
> dagbag.get_dag(dag_id)*'. This may bring considerable improvement.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
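
For illustration only, the reordering described in the issue (check whether a DAG is paused before paying for the expensive lookup) might look roughly like the sketch below. This is not the code from the commit: `collect_active_dags`, `paused_dag_ids`, `load_dag` and `fake_load_dag` are hypothetical names, and `load_dag` merely stands in for a costly call such as dagbag.get_dag(dag_id).

```python
from typing import Callable, Iterable, List, Set


def collect_active_dags(
    dag_ids: Iterable[str],
    paused_dag_ids: Set[str],
    load_dag: Callable[[str], object],
) -> List[object]:
    """Return loaded DAGs, skipping paused ones before the expensive load.

    Hypothetical helper: `load_dag` stands in for an expensive lookup
    such as dagbag.get_dag(dag_id).
    """
    active = []
    for dag_id in dag_ids:
        # Cheap membership test first, so paused DAGs never reach load_dag.
        if dag_id in paused_dag_ids:
            continue
        active.append(load_dag(dag_id))
    return active


# Record every call to the stand-in loader to show which DAGs were loaded.
calls = []


def fake_load_dag(dag_id):
    calls.append(dag_id)
    return {"dag_id": dag_id}


result = collect_active_dags(["a", "b", "c"], {"b"}, fake_load_dag)
```

In this sketch the paused DAG "b" never reaches the loader, which is the saving the issue is after; how the set of paused DAG ids is obtained is left out here as an assumption.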