XD-DENG opened a new pull request #4993: [AIRFLOW-4173] Improve scheduler performance by avoid unnecessary actions in SchedulerJob.process_file()
URL: https://github.com/apache/airflow/pull/4993

### Jira

- https://issues.apache.org/jira/browse/AIRFLOW-4173

### Description

In the current implementation of `SchedulerJob.process_file()` (https://github.com/apache/airflow/blob/068ded96cd279dcd51f5b6d1e96f09205ecf40c8/airflow/jobs.py#L1722-L1734), `dag = dagbag.get_dag(dag_id)` runs for every `dag_id`, whether or not it points to a paused DAG. If the DAG is paused, the result is not used afterwards, so the call is wasted work.

We can do the `if dag_id not in paused_dag_ids:` check first and invoke `dag = dagbag.get_dag(dag_id)` only for DAGs that are not paused.

This change may bring a considerable improvement: running `dag = dagbag.get_dag(dag_id)` for 1000 dag_ids takes ~8 seconds in my environment.
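
To illustrate the reordering described above, here is a minimal sketch. It is not the actual Airflow code: `FakeDagBag`, `collect_active_dags`, and the sample DAG ids are hypothetical stand-ins, and only the names `get_dag`, `dag_id`, and `paused_dag_ids` come from the description. The counter exists only to show that the lookup is skipped for paused DAGs.

```python
class FakeDagBag:
    """Hypothetical stand-in for a DAG container; NOT Airflow's DagBag."""

    def __init__(self, dags):
        self.dags = dags          # maps dag_id -> a dict describing the DAG
        self.get_dag_calls = 0    # counts how often the costly lookup runs

    def get_dag(self, dag_id):
        # Stands in for the expensive lookup mentioned in the description.
        self.get_dag_calls += 1
        return self.dags[dag_id]


def collect_active_dags(dagbag, paused_dag_ids):
    """Return the non-paused DAGs, checking paused_dag_ids before get_dag()."""
    active = []
    for dag_id in dagbag.dags:
        # Check first, so get_dag() is never called for a paused DAG.
        if dag_id not in paused_dag_ids:
            active.append(dagbag.get_dag(dag_id))
    return active


# Hypothetical sample data: three DAGs, one of them paused.
dagbag = FakeDagBag({
    "etl_daily": {"dag_id": "etl_daily"},
    "etl_hourly": {"dag_id": "etl_hourly"},
    "legacy_report": {"dag_id": "legacy_report"},
})
paused_dag_ids = {"legacy_report"}

active = collect_active_dags(dagbag, paused_dag_ids)
print([dag["dag_id"] for dag in active], dagbag.get_dag_calls)
```

With the check placed first, the number of `get_dag()` calls equals the number of non-paused DAGs rather than the total number of DAGs, so the saving grows with the share of paused DAGs.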
