[ https://issues.apache.org/jira/browse/AIRFLOW-1729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16418126#comment-16418126 ]

ASF subversion and git services commented on AIRFLOW-1729:
----------------------------------------------------------

Commit 721bc09271856b0a52e22fbcb7bb8232eae800d3 in incubator-airflow's branch refs/heads/master from [~abhishek0812]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=721bc09 ]

[AIRFLOW-1729] improve dagBag time

Closes #3171 from q2w/master

> Ignore whole directories in .airflowignore
> ------------------------------------------
>
>                 Key: AIRFLOW-1729
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-1729
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: Airflow 2.0
>            Reporter: Cedric Hourcade
>            Assignee: Kamil Sambor
>            Priority: Minor
>
> The .airflowignore file makes it possible to skip files when scanning for
> DAGs. But even if we blacklist a full directory, {{os.walk}} will still
> descend into it, no matter how deep it is, and skip its files one by one,
> which can be an issue when you keep big .git or virtualenv directories
> around.
> I suggest adding something like:
> {code}
> dirs[:] = [d for d in dirs if not any([re.findall(p, os.path.join(root, d))
>                                        for p in patterns])]
> {code}
> to prune the directories here:
> https://github.com/apache/incubator-airflow/blob/cfc2f73c445074e1e09d6ef6a056cd2b33a945da/airflow/utils/dag_processing.py#L208-L209
> and in {{list_py_file_paths}}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
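
For readers of the archive, the pruning suggested in the quoted issue can be sketched roughly as follows. This is only an illustration, not the code from the commit: the function name `find_py_files_pruned` and the example patterns are hypothetical, and the sketch assumes each ignore pattern is a regular expression searched against the joined path, as in the snippet from the issue. It relies on the documented behaviour of top-down `os.walk`, which only descends into the names still present in `dirs` after the caller has modified that list in place.

```python
import os
import re


def find_py_files_pruned(directory, patterns):
    """Hypothetical helper: list .py files under `directory`, never
    descending into directories that match one of the regex `patterns`."""
    compiled = [re.compile(p) for p in patterns]
    file_paths = []
    for root, dirs, files in os.walk(directory, followlinks=True):
        # Slice assignment mutates the list os.walk holds, so ignored
        # directories are skipped entirely instead of file by file.
        dirs[:] = [
            d for d in dirs
            if not any(p.search(os.path.join(root, d)) for p in compiled)
        ]
        for name in files:
            path = os.path.join(root, name)
            # Individual files can still be ignored by the same patterns.
            if name.endswith(".py") and not any(p.search(path) for p in compiled):
                file_paths.append(path)
    return sorted(file_paths)
```

Called as `find_py_files_pruned(dags_folder, [r"\.git$", r"big_venv$"])`, where `dags_folder` and both patterns are made-up examples, the walk never enters a `.git` or `big_venv` directory, however deep its contents are. Rebinding the name with `dirs = [...]` instead of `dirs[:] = [...]` would not prune anything, because `os.walk` would keep using its own list.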