[ 
https://issues.apache.org/jira/browse/NIFI-4434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16566928#comment-16566928
 ] 

ASF GitHub Bot commented on NIFI-4434:
--------------------------------------

Github user jtstorck commented on the issue:

    https://github.com/apache/nifi/pull/2930
  
    @bbende @joewitt 
    This PR changes how the filter is applied by default during a listing, 
which might require manual migration efforts for some users.  
    
    We could add a property that toggles whether the filter is applied to both 
directory and file names or to file names only, with the default being both 
directory and file names.  This solution would preserve the current behavior 
and allow users to "opt in" to having recursive listings retrieve all matching 
files regardless of directory names.  Current users who depend on the current 
behavior would not need to migrate anything.
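    
    A minimal sketch of how such a toggle could behave, written in Python over
    an in-memory directory tree rather than against the NiFi or HDFS APIs.  The
    names FilterMode, list_files and TREE are hypothetical, and full-match
    regex semantics are an assumption:

```python
import re
from enum import Enum


class FilterMode(Enum):
    # Hypothetical names for the two settings of the proposed property.
    DIRECTORIES_AND_FILES = "directories-and-files"  # current behavior
    FILES_ONLY = "files-only"                        # opt-in behavior


def list_files(tree, pattern, mode, prefix=""):
    """Recursively list files in `tree`, a dict that maps each name to
    None (a file) or to a nested dict (a directory)."""
    regex = re.compile(pattern)
    matches = []
    for name, child in sorted(tree.items()):
        path = prefix + name
        if child is None:
            # Files are always checked against the filter.
            if regex.fullmatch(name):
                matches.append(path)
        elif mode is FilterMode.FILES_ONLY or regex.fullmatch(name):
            # Directories are only checked in DIRECTORIES_AND_FILES mode.
            matches.extend(list_files(child, pattern, mode, path + "/"))
    return matches


# Illustrative tree; "archive.csv" is a directory whose name matches the filter.
TREE = {
    "a.csv": None,
    "notes.txt": None,
    "sub": {"b.csv": None, "deep": {"c.csv": None}},
    "archive.csv": {"d.csv": None},
}
```

    With the filter ".*\.csv", the DIRECTORIES_AND_FILES mode never descends
    into "sub" because that directory name does not match, while FILES_ONLY
    also returns sub/b.csv and sub/deep/c.csv.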
    
    We could also go down the route of allowing the filter to be applied to the 
entire path.  That gives the user maximum flexibility in how the filter works, 
but it requires more regex knowledge and could make it harder for users to 
write the filter they want.  This would also require manual migration, but it 
might be the best long-term solution.  The tooltip on the filter property could 
be updated to include an example regex that provides the default 
functionality, which users could use as a starting point for custom filters.
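    
    To illustrate what full-path filters might look like, here is a small
    Python sketch.  It assumes paths relative to the base directory, "/" as the
    separator and full-match semantics; the helper filter_paths and the three
    example patterns are hypothetical and are not the regex proposed for the
    tooltip:

```python
import re


def filter_paths(relative_paths, pattern):
    """Keep only the paths that the regex matches in full."""
    regex = re.compile(pattern)
    return [p for p in relative_paths if regex.fullmatch(p)]


# Illustrative file paths, relative to the base directory of the listing.
PATHS = ["a.csv", "notes.txt", "sub/b.csv", "sub/deep/c.csv"]

# ".*" also matches "/", so this picks up csv files at any depth.
ANY_DEPTH = r".*\.csv"

# Excluding "/" restricts the match to the base directory.
BASE_DIR_ONLY = r"[^/]*\.csv"

# csv files anywhere below the "sub" directory.
ONE_SUBTREE = r"sub/(?:[^/]+/)*[^/]+\.csv"
```

    Here filter_paths(PATHS, ANY_DEPTH) keeps all three csv files, and
    BASE_DIR_ONLY keeps only a.csv, which is the result the issue reports for
    ".*\.csv" under the current behavior.  The need to exclude "/" by hand is an
    example of the extra regex knowledge this option asks of users.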
    
    Any thoughts on either of these solutions?


> ListHDFS applies File Filter also to subdirectory names in recursive search
> ---------------------------------------------------------------------------
>
>                 Key: NIFI-4434
>                 URL: https://issues.apache.org/jira/browse/NIFI-4434
>             Project: Apache NiFi
>          Issue Type: Bug
>    Affects Versions: 1.3.0
>            Reporter: Holger Frydrych
>            Assignee: Jeff Storck
>            Priority: Major
>
> The File Filter regex configured in the ListHDFS processor is applied not 
> just to files found, but also to subdirectories. 
> If you try to set up a recursive search to list e.g. all csv files in a 
> directory hierarchy via a regex like ".*\.csv", it will only pick up csv 
> files in the base directory, not in any subdirectory. This is because 
> subdirectories don't typically match that regex pattern.
> To fix this, either subdirectories should not be matched against the file 
> filter, or the file filter should be applied to the full path of all files 
> (relative to the base directory). The GetHDFS processor offers both options 
> via a switch.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
