[ https://issues.apache.org/jira/browse/AIRFLOW-1729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16723492#comment-16723492 ]
ASF GitHub Bot commented on AIRFLOW-1729:
-----------------------------------------

stale[bot] closed pull request #2754: [AIRFLOW-1729] Ignore whole directories from .airflowignore
URL: https://github.com/apache/incubator-airflow/pull/2754

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/utils/dag_processing.py b/airflow/utils/dag_processing.py
index 68cee7601e..67502b0d6e 100644
--- a/airflow/utils/dag_processing.py
+++ b/airflow/utils/dag_processing.py
@@ -174,11 +174,11 @@ def list_py_file_paths(directory, safe_mode=True):
     elif os.path.isdir(directory):
         patterns = []
         for root, dirs, files in os.walk(directory, followlinks=True):
-            ignore_file = [f for f in files if f == '.airflowignore']
-            if ignore_file:
-                f = open(os.path.join(root, ignore_file[0]), 'r')
-                patterns += [p for p in f.read().split('\n') if p]
-                f.close()
+            if '.airflowignore' in files:
+                with open(os.path.join(root, '.airflowignore'), 'r') as f:
+                    patterns += [p for p in f if p]
+            dirs[:] = [d for d in dirs if not any(
+                [re.findall(p, os.path.join(root, d)) for p in patterns])]
             for f in files:
                 try:
                     file_path = os.path.join(root, f)

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Ignore whole directories in .airflowignore
> ------------------------------------------
>
>                 Key: AIRFLOW-1729
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-1729
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 2.0.0
>            Reporter: Cedric Hourcade
>            Assignee: Ash Berlin-Taylor
>            Priority: Minor
>             Fix For: 1.10.0
>
>
> The .airflowignore file lets you exclude files from the DAG scan. But even if
> a whole directory is blacklisted, {{os.walk}} still descends into it, no
> matter how deep it is, and skips its files one by one. This can be an issue
> when you keep big .git or virtualenv directories around.
> I suggest adding something like:
> {code}
> dirs[:] = [d for d in dirs if not any([re.findall(p, os.path.join(root, d)) for p in patterns])]
> {code}
> to prune the directories here:
> https://github.com/apache/incubator-airflow/blob/cfc2f73c445074e1e09d6ef6a056cd2b33a945da/airflow/utils/dag_processing.py#L208-L209
> and in {{list_py_file_paths}}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
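
The directory pruning proposed in the quoted issue can be illustrated with a small standalone sketch. It is not the Airflow implementation: the function name `walk_pruned` and its `ignore_name` parameter are hypothetical, and, unlike the diff in this message, it strips each pattern line and uses `re.search` on compiled patterns. What it shares with the proposal is the key step, the in-place assignment to `dirs[:]`, which makes a top-down `os.walk` skip the removed directories entirely.

```python
import os
import re


def walk_pruned(directory, ignore_name=".airflowignore"):
    """Yield file paths under `directory`, skipping ignored dirs and files.

    Hypothetical helper for illustration only; not part of Airflow.
    """
    patterns = []
    for root, dirs, files in os.walk(directory, followlinks=True):
        if ignore_name in files:
            with open(os.path.join(root, ignore_name)) as f:
                # Strip trailing newlines and drop blank lines so that an
                # empty pattern (which would match everything) is never added.
                patterns += [re.compile(line.strip())
                             for line in f if line.strip()]
        # Slice assignment mutates the list that os.walk holds, so the
        # walk never descends into directories removed here.
        dirs[:] = [d for d in dirs
                   if not any(p.search(os.path.join(root, d))
                              for p in patterns)]
        for name in files:
            path = os.path.join(root, name)
            if not any(p.search(path) for p in patterns):
                yield path
```

As in the diff, patterns accumulate over the whole walk, so a pattern read in one subdirectory also applies to directories visited later. Rebinding the name with `dirs = [...]` instead of `dirs[:] = [...]` would not prune anything, because `os.walk` would keep using its original list.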