[ 
https://issues.apache.org/jira/browse/AIRFLOW-1729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16542905#comment-16542905
 ] 

ASF subversion and git services commented on AIRFLOW-1729:
----------------------------------------------------------

Commit 23191605e463c85f0935cbf5c47f31c357d1596b in incubator-airflow's branch 
refs/heads/v1-10-test from [~ashb]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=2319160 ]

[AIRFLOW-1729][AIRFLOW-2797][AIRFLOW-2729] Ignore whole directories in 
.airflowignore

We can ignore whole directories by removing them from the `dirs` array
that `os.walk()` returns. Doing this means that we do fewer disk ops if
someone has a set of modules in their dag folder that they want to
ignore.

Also fixes [AIRFLOW-2797] - we weren't honoring .airflowignore from a
parent dir as of #3171 -- that (expected) behaviour is now back again.

De-duplicate the walking code as well - we had two versions that had
gotten out of sync as of #3171. So that doesn't happen again we now only
have one version.

Closes #3602 from ashb/ignore-whole-dirs-airflowignore

(cherry picked from commit 6b2fdbef0ab4bd1ed91e6338bcf6440e782b7035)
Signed-off-by: Bolke de Bruin <bo...@xs4all.nl>
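
For readers following along, the pruning and parent-directory behaviour described in the commit message above can be sketched roughly as below. This is a minimal illustration, not the code from the commit: the function name `find_py_files` and the `patterns_by_dir` bookkeeping are hypothetical, and it assumes each non-empty line of `.airflowignore` is a regular expression searched against the full path.

```python
import os
import re


def find_py_files(top, ignore_name=".airflowignore"):
    """Hypothetical helper: list .py files under `top`, honoring ignore files."""
    patterns_by_dir = {}  # directory path -> patterns inherited from its parents
    found = []
    for root, dirs, files in os.walk(top, topdown=True):
        # Start from whatever the parent directory handed down (copy, don't share).
        patterns = list(patterns_by_dir.get(root, []))

        ignore_file = os.path.join(root, ignore_name)
        if os.path.isfile(ignore_file):
            with open(ignore_file) as f:
                # Assumption: one regular expression per non-empty line.
                patterns += [re.compile(line.strip()) for line in f if line.strip()]

        # Prune in place: with topdown=True, os.walk only descends into
        # the names still present in `dirs`.
        dirs[:] = [
            d for d in dirs
            if not any(p.search(os.path.join(root, d)) for p in patterns)
        ]
        # Pass the accumulated patterns on to the surviving subdirectories.
        for d in dirs:
            patterns_by_dir[os.path.join(root, d)] = patterns

        for name in files:
            path = os.path.join(root, name)
            if name.endswith(".py") and not any(p.search(path) for p in patterns):
                found.append(path)
    return found
```

The slice assignment `dirs[:] = ...` is what matters here: rebinding `dirs` to a new list would leave the list that `os.walk()` holds untouched, so the ignored directories would still be traversed.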


> Ignore whole directories in .airflowignore
> ------------------------------------------
>
>                 Key: AIRFLOW-1729
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-1729
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: Airflow 2.0
>            Reporter: Cedric Hourcade
>            Assignee: Ash Berlin-Taylor
>            Priority: Minor
>             Fix For: 2.0.0
>
>
> The .airflowignore file lets you prevent files from being scanned for DAGs. But even if 
> we blacklist a full directory, {{os.walk}} will still descend into it, no 
> matter how deep it is, and skip its files one by one, which can be an issue 
> when you keep big .git or virtualenv directories around.
> I suggest adding something like:
> {code}
> # Prune in place so os.walk does not descend into ignored directories.
> dirs[:] = [d for d in dirs
>            if not any(re.search(p, os.path.join(root, d)) for p in patterns)]
> {code}
> to prune the directories here: 
> https://github.com/apache/incubator-airflow/blob/cfc2f73c445074e1e09d6ef6a056cd2b33a945da/airflow/utils/dag_processing.py#L208-L209
>  and in {{list_py_file_paths}}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
