[ 
https://issues.apache.org/jira/browse/AIRFLOW-6171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16992134#comment-16992134
 ] 

Andrey Klochkov edited comment on AIRFLOW-6171 at 12/10/19 4:00 AM:
--------------------------------------------------------------------

This is caused by a defect in 
{{airflow.utils.file.list_py_file_paths}}. In the loop that walks the 
subdirectories, the same {{patterns}} list object is stored in the 
{{patterns_by_dir}} dictionary under several keys. When the loop processes 
the top-level dags directory, it stores that one object under the key for 
each subdirectory. When the loop later reaches those subdirectories, it 
fetches the same list from the dictionary and adds to it in place, so an 
.airflowignore present in one subdirectory is effectively applied to every 
other subdirectory processed after it.

The fix is to add ".copy()" as shown here:
{code:python}
    # We want patterns defined in a parent folder's .airflowignore to
    # apply to subdirs too
    for d in dirs:
        patterns_by_dir[os.path.join(root, d)] = patterns.copy()
{code}
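
The aliasing described above can be reproduced without Airflow. The following 
is a minimal sketch, not the Airflow implementation: the function 
{{effective_patterns}}, its parameters and the dictionary-based model of the 
directory walk are made up for illustration, and it assumes that the patterns 
list is extended in place whenever a directory has an ignore file.

```python
def effective_patterns(walk_order, children, ignore_files, copy_for_children):
    """Return the ignore patterns that end up applying to each directory.

    Hypothetical, simplified model of a top-down directory walk:
    walk_order   -- directories in the order they are visited
    children     -- maps a directory to its immediate subdirectories
    ignore_files -- maps a directory to the patterns in its ignore file
    """
    patterns_by_dir = {}
    applied = {}
    for root in walk_order:
        patterns = patterns_by_dir.get(root, [])
        # In-place extension: every other key holding this same list object
        # sees the new patterns as well.
        patterns += ignore_files.get(root, [])
        applied[root] = list(patterns)  # snapshot of what applies to root
        for d in children.get(root, []):
            if copy_for_children:
                patterns_by_dir[d] = patterns.copy()  # independent list per subdir
            else:
                patterns_by_dir[d] = patterns  # one list shared by all subdirs
    return applied


# Layout from the report: only z/ has an ignore file, and z/ is visited first.
walk_order = ["dags", "dags/z", "dags/y", "dags/x"]
children = {"dags": ["dags/z", "dags/y", "dags/x"]}
ignore_files = {"dags/z": [".*"]}

buggy = effective_patterns(walk_order, children, ignore_files, copy_for_children=False)
fixed = effective_patterns(walk_order, children, ignore_files, copy_for_children=True)

print(buggy["dags/y"])  # ['.*']  (pattern from z/ leaked into y/)
print(fixed["dags/y"])  # []      (y/ only inherits from its parent)
```

In this model only the directories visited after z/ pick up the stray 
pattern. If the real walk visits sibling directories in no guaranteed order, 
that would be consistent with the report that only some dags are ignored.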


was (Author: aklochkov):
This is caused by a defect in 
{{dag_processing.list_py_file_paths}}. In the loop that walks the 
subdirectories, the same {{patterns}} list object is stored in the 
{{patterns_by_dir}} dictionary under several keys. When the loop processes 
the top-level dags directory, it stores that one object under the key for 
each subdirectory. When the loop later reaches those subdirectories, it 
fetches the same list from the dictionary and adds to it in place, so an 
.airflowignore present in one subdirectory is effectively applied to every 
other subdirectory processed after it.

The fix is to add ".copy()" as shown here:
{code:python}
    # We want patterns defined in a parent folder's .airflowignore to
    # apply to subdirs too
    for d in dirs:
        patterns_by_dir[os.path.join(root, d)] = patterns.copy()
{code}

> airflow ignore file with .* located in a subdirectory ignores dags in other 
> dirs
> --------------------------------------------------------------------------------
>
>                 Key: AIRFLOW-6171
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-6171
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: core, DAG
>    Affects Versions: 1.10.5, 1.10.6
>         Environment: Ubuntu 18.04
>            Reporter: Andrey Kateshov
>            Priority: Major
>
> I have an Airflow dags directory that looks like this: x/... y/... z/..., 
> i.e. all dags are placed in subdirectories.
> If I place an .airflowignore containing the single line .* in directory z/, 
> the dags in other directories (e.g. x/ and y/) are also ignored, which is 
> already a big issue. What makes it even stranger is that only some of them 
> are ignored, potentially masking the effects of this behaviour.
> What makes it even worse is that you won't see that these dags are now 
> disabled in the Airflow UI unless you completely restart it (possibly 
> together with the scheduler; we restarted both and didn't check whether 
> restarting only the UI is enough).
> This issue was not present in 1.10.3, but appears in 1.10.5. I didn't test 
> 1.10.4.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
