[ 
https://issues.apache.org/jira/browse/AIRFLOW-4173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16805836#comment-16805836
 ] 

ASF subversion and git services commented on AIRFLOW-4173:
----------------------------------------------------------

Commit d10ffe7d820e42896043ca0dbebf6e3e93776985 in airflow's branch 
refs/heads/v1-10-stable from Xiaodong
[ https://gitbox.apache.org/repos/asf?p=airflow.git;h=d10ffe7 ]

[AIRFLOW-4173] Improve SchedulerJob.process_file() (#4993)

By avoiding the processing of paused DAGs.

The action we avoid here is mainly the dagbag.get_dag() call on paused DAGs.
DagBag.get_dag() itself is relatively expensive, so this change brings a
considerable performance improvement.

> Improve scheduler performance by avoiding unnecessary actions in 
> SchedulerJob.process_file()
> -----------------------------------------------------------------------------------------
>
>                 Key: AIRFLOW-4173
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-4173
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: scheduler
>    Affects Versions: 1.10.2
>            Reporter: Xiaodong DENG
>            Assignee: Xiaodong DENG
>            Priority: Critical
>             Fix For: 1.10.3
>
>
> In the current implementation of *SchedulerJob.process_file()* 
> (https://github.com/apache/airflow/blob/068ded96cd279dcd51f5b6d1e96f09205ecf40c8/airflow/jobs.py#L1722-L1734),
> the call '*dag = dagbag.get_dag(dag_id)*' is made whether or not dag_id 
> points to a paused DAG. However, the result is never used if that 
> DAG is paused, so the call is wasted work.
> We can do the `if DAG is paused` check first, before we invoke '*dag = 
> dagbag.get_dag(dag_id)*'. This may bring a considerable improvement.
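
The reordering described in the issue can be illustrated with a minimal,
self-contained sketch. It does not use Airflow's real classes: StubDagBag,
collect_fetch_first() and collect_check_first() are hypothetical names
invented for this illustration, and the set of paused DAG ids is assumed to
be already known to the caller. It only shows that checking the paused state
first reduces the number of get_dag() calls, not how the actual commit is
written.

```python
class StubDagBag:
    """Hypothetical stand-in for a DagBag; counts get_dag() calls."""

    def __init__(self, dag_ids):
        self.dags = {dag_id: object() for dag_id in dag_ids}
        self.get_dag_calls = 0

    def get_dag(self, dag_id):
        # In this sketch the call is cheap; the issue describes the real
        # DagBag.get_dag() as relatively expensive.
        self.get_dag_calls += 1
        return self.dags.get(dag_id)


def collect_fetch_first(dagbag, paused_dag_ids):
    """Old order: fetch every DAG, then discard the paused ones."""
    dags = []
    for dag_id in dagbag.dags:
        dag = dagbag.get_dag(dag_id)
        if dag_id not in paused_dag_ids:
            dags.append(dag)
    return dags


def collect_check_first(dagbag, paused_dag_ids):
    """Proposed order: skip paused DAGs before calling get_dag()."""
    dags = []
    for dag_id in dagbag.dags:
        if dag_id in paused_dag_ids:
            continue
        dags.append(dagbag.get_dag(dag_id))
    return dags
```

Both functions return the same unpaused DAGs; the second one simply never
calls get_dag() for a paused dag_id.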



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
