[ 
https://issues.apache.org/jira/browse/AIRFLOW-4173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16805005#comment-16805005
 ] 

ASF subversion and git services commented on AIRFLOW-4173:
----------------------------------------------------------

Commit c576f393bff433c345f99bfcab811bbc2ac0d37e in airflow's branch 
refs/heads/master from Xiaodong
[ https://gitbox.apache.org/repos/asf?p=airflow.git;h=c576f39 ]

[AIRFLOW-4173] Improve SchedulerJob.process_file() (#4993)

By avoiding the processing of paused DAGs.

The action we avoid here is mainly the dagbag.get_dag() call on paused DAGs.
DagBag.get_dag() itself is relatively expensive, so this change brings
a considerable performance improvement.
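
The idea can be illustrated with a minimal, self-contained sketch. This is
not the code from the commit: ToyDagBag, collect_unpaused_dags, and
paused_dag_ids are hypothetical names used only for illustration, and the
toy get_dag() merely counts calls to stand in for the expensive lookup.

```python
from typing import Dict, Iterable, List, Set


class ToyDagBag:
    """Hypothetical stand-in for a DagBag; it only counts get_dag() calls."""

    def __init__(self, dags: Dict[str, str]) -> None:
        self._dags = dags
        self.get_dag_calls = 0

    def get_dag(self, dag_id: str) -> str:
        # Assumed to be the expensive step in the real scheduler.
        self.get_dag_calls += 1
        return self._dags[dag_id]


def collect_unpaused_dags(
    dagbag: ToyDagBag, dag_ids: Iterable[str], paused_dag_ids: Set[str]
) -> List[str]:
    """Check the paused state first, and load only DAGs that are not paused."""
    dags = []
    for dag_id in dag_ids:
        if dag_id in paused_dag_ids:
            # Skip before calling get_dag(), so no lookup cost is paid.
            continue
        dags.append(dagbag.get_dag(dag_id))
    return dags


bag = ToyDagBag({"a": "dag-a", "b": "dag-b", "c": "dag-c"})
active = collect_unpaused_dags(bag, ["a", "b", "c"], paused_dag_ids={"b"})
print(active, bag.get_dag_calls)
```

With one of three DAGs paused, the sketch performs two lookups instead of
three; the saving grows with the share of paused DAGs in a file.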

> Improve scheduler performance by avoiding unnecessary actions in 
> SchedulerJob.process_file()
> -----------------------------------------------------------------------------------------
>
>                 Key: AIRFLOW-4173
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-4173
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: scheduler
>    Affects Versions: 1.10.2
>            Reporter: Xiaodong DENG
>            Assignee: Xiaodong DENG
>            Priority: Critical
>
> In the current implementation of *SchedulerJob.process_file()* 
> (https://github.com/apache/airflow/blob/068ded96cd279dcd51f5b6d1e96f09205ecf40c8/airflow/jobs.py#L1722-L1734),
> the call '*dag = dagbag.get_dag(dag_id)*' is made regardless of whether dag_id 
> points to a paused DAG. However, the result is not used later if that 
> DAG is paused.
> This is inefficient.
> We can do the `if DAG is paused` check first, before we invoke 
> '*dag = dagbag.get_dag(dag_id)*'. This may bring a considerable improvement.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)