[ 
https://issues.apache.org/jira/browse/AIRFLOW-4173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaodong DENG updated AIRFLOW-4173:
-----------------------------------
    Description: 
In the current implementation of *SchedulerJob.process_file()* 
(https://github.com/apache/airflow/blob/068ded96cd279dcd51f5b6d1e96f09205ecf40c8/airflow/jobs.py#L1722-L1734),
the action '*dag = dagbag.get_dag(dag_id)*' is performed for every dag_id, even 
when dag_id points to a paused DAG. However, the result is not used later if 
that DAG is paused.

This is inefficient.

We can check whether the DAG is paused first, before we invoke '*dag = 
dagbag.get_dag(dag_id)*'. This may bring a considerable improvement.

  was:
In current implementation of *SchedulerJob.process_file()* 
([https://github.com/apache/airflow/blob/068ded96cd279dcd51f5b6d1e96f09205ecf40c8/airflow/jobs.py#L1722-L1734),]
 action '*dag = dagbag.get_dag(dag_id)*' is to be done no matter if dag_id is 
pointing to a paused DAG. However, the result will not be used later if that 
DAG is paused.

This is causing inefficiency.


> Improve scheduler performance by avoiding unnecessary actions in 
> SchedulerJob.process_file()
> -----------------------------------------------------------------------------------------
>
>                 Key: AIRFLOW-4173
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-4173
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: scheduler
>    Affects Versions: 1.10.2
>            Reporter: Xiaodong DENG
>            Assignee: Xiaodong DENG
>            Priority: Critical
>
> In the current implementation of *SchedulerJob.process_file()* 
> (https://github.com/apache/airflow/blob/068ded96cd279dcd51f5b6d1e96f09205ecf40c8/airflow/jobs.py#L1722-L1734),
> the action '*dag = dagbag.get_dag(dag_id)*' is performed for every dag_id, even 
> when dag_id points to a paused DAG. However, the result is not used later if 
> that DAG is paused.
> This is inefficient.
> We can check whether the DAG is paused first, before we invoke '*dag = 
> dagbag.get_dag(dag_id)*'. This may bring a considerable improvement.
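
The reordering proposed in the description can be illustrated with a minimal, self-contained sketch. It does not use Airflow's code: `CountingDagBag`, `collect_fetch_first`, and `collect_check_first` are hypothetical names, and the set of paused DAG ids is assumed to be known up front. The sketch only shows how checking the paused state first reduces the number of `get_dag()` calls.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class CountingDagBag:
    """Hypothetical stand-in for a DagBag; counts how often get_dag() runs."""
    dags: Dict[str, str]
    get_dag_calls: int = 0

    def get_dag(self, dag_id: str) -> str:
        self.get_dag_calls += 1
        return self.dags[dag_id]


def collect_fetch_first(dagbag: CountingDagBag, paused: Set[str]) -> List[str]:
    """Order described as current: fetch every DAG, then drop paused ones."""
    result = []
    for dag_id in dagbag.dags:
        dag = dagbag.get_dag(dag_id)  # runs even for paused DAGs
        if dag_id not in paused:
            result.append(dag)
    return result


def collect_check_first(dagbag: CountingDagBag, paused: Set[str]) -> List[str]:
    """Proposed order: skip paused DAGs before calling get_dag()."""
    result = []
    for dag_id in dagbag.dags:
        if dag_id in paused:
            continue  # no get_dag() call for a paused DAG
        result.append(dagbag.get_dag(dag_id))
    return result


dags = {"a": "DAG a", "b": "DAG b", "c": "DAG c"}
paused_ids = {"b", "c"}

fetch_first_bag = CountingDagBag(dict(dags))
check_first_bag = CountingDagBag(dict(dags))
fetch_first_result = collect_fetch_first(fetch_first_bag, paused_ids)
check_first_result = collect_check_first(check_first_bag, paused_ids)

# Same DAGs are selected, but with fewer get_dag() calls in the second case.
print(fetch_first_bag.get_dag_calls, check_first_bag.get_dag_calls)  # 3 1
```

Both orderings return the same unpaused DAGs; the saving grows with the share of paused DAGs in the file. How large the gain is in the real scheduler depends on how expensive `dagbag.get_dag()` is there, which this sketch does not model.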



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
