[ 
https://issues.apache.org/jira/browse/AIRFLOW-1729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16723492#comment-16723492
 ] 

ASF GitHub Bot commented on AIRFLOW-1729:
-----------------------------------------

stale[bot] closed pull request #2754: [AIRFLOW-1729] Ignore whole directories 
from .airflowignore
URL: https://github.com/apache/incubator-airflow/pull/2754
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:


diff --git a/airflow/utils/dag_processing.py b/airflow/utils/dag_processing.py
index 68cee7601e..67502b0d6e 100644
--- a/airflow/utils/dag_processing.py
+++ b/airflow/utils/dag_processing.py
@@ -174,11 +174,11 @@ def list_py_file_paths(directory, safe_mode=True):
     elif os.path.isdir(directory):
         patterns = []
         for root, dirs, files in os.walk(directory, followlinks=True):
-            ignore_file = [f for f in files if f == '.airflowignore']
-            if ignore_file:
-                f = open(os.path.join(root, ignore_file[0]), 'r')
-                patterns += [p for p in f.read().split('\n') if p]
-                f.close()
+            if '.airflowignore' in files:
+                with open(os.path.join(root, '.airflowignore'), 'r') as f:
+                    patterns += [p for p in f if p]
+            dirs[:] = [d for d in dirs if not any(
+                [re.findall(p, os.path.join(root, d)) for p in patterns])]
             for f in files:
                 try:
                     file_path = os.path.join(root, f)
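
One detail of the hunk above is worth checking. Iterating over a file object yields lines that still carry their trailing newline, so `[p for p in f if p]` keeps the '\n' on each pattern and also keeps blank lines (a lone '\n' is truthy), unlike the `f.read().split('\n')` it replaces. A minimal sketch of reading the patterns with the newline removed follows; `read_ignore_patterns` is a hypothetical helper name used only for illustration, not a function in Airflow.

```python
def read_ignore_patterns(ignore_path):
    """Return the non-empty patterns listed in an ignore file, one per line.

    Hypothetical helper for illustration; not part of Airflow.
    """
    with open(ignore_path, 'r') as f:
        # Lines read from a file end in '\n'; drop it before testing for
        # emptiness so blank lines are skipped and patterns stay clean.
        stripped = (line.rstrip('\n') for line in f)
        return [p for p in stripped if p]
```

With this, a file containing `\.git`, an empty line, and `venv` yields two patterns, neither ending in a newline.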


 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Ignore whole directories in .airflowignore
> ------------------------------------------
>
>                 Key: AIRFLOW-1729
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-1729
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 2.0.0
>            Reporter: Cedric Hourcade
>            Assignee: Ash Berlin-Taylor
>            Priority: Minor
>             Fix For: 1.10.0
>
>
> The .airflowignore file lets you prevent files from being scanned for DAGs. 
> But even if we blacklist a full directory, {{os.walk}} will still descend into 
> it, no matter how deep it is, and skip its files one by one, which can be an 
> issue when you keep big .git or virtualenv directories around.
> I suggest adding something like:
> {code}
> dirs[:] = [d for d in dirs if not any([re.findall(p, os.path.join(root, d)) 
> for p in patterns])]
> {code}
> to prune the directories here: 
> https://github.com/apache/incubator-airflow/blob/cfc2f73c445074e1e09d6ef6a056cd2b33a945da/airflow/utils/dag_processing.py#L208-L209
>  and in {{list_py_file_paths}}
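
The pruning suggested in the description above relies on how `os.walk` behaves in its default top-down mode: the `dirs` list it yields is the same list it later uses to decide where to descend, so assigning to the slice `dirs[:]` removes subdirectories from the walk, whereas rebinding `dirs` would have no effect. The sketch below is an assumption-laden illustration, not the Airflow code: `walk_pruned` is a hypothetical name, and it uses `re.search` with a generator in place of the `re.findall` list from the description, which gives the same truth value without building the match lists.

```python
import os
import re


def walk_pruned(directory, patterns):
    """Return the directories os.walk visits once ignored ones are pruned.

    Hypothetical helper for illustration; not part of Airflow.
    """
    visited = []
    for root, dirs, files in os.walk(directory, followlinks=True):
        visited.append(root)
        # Slice assignment mutates the list that os.walk holds, so any
        # directory filtered out here is never descended into.
        dirs[:] = [
            d for d in dirs
            if not any(re.search(p, os.path.join(root, d)) for p in patterns)
        ]
    return visited
```

For example, calling `walk_pruned(dag_folder, [r'\.git'])` (with `dag_folder` standing in for any directory) skips a top-level `.git` directory and everything beneath it instead of visiting each nested file.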



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
