[
https://issues.apache.org/jira/browse/NIFI-4434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16566928#comment-16566928
]
ASF GitHub Bot commented on NIFI-4434:
--------------------------------------
Github user jtstorck commented on the issue:
https://github.com/apache/nifi/pull/2930
@bbende @joewitt
This PR changes the default behavior of how the filter is applied during a
listing, which might require manual migration efforts for some users.
We could add a property that toggles whether the filter is applied to both
directory and file names or to file names only, with the default being directory
and file names. This would preserve the current behavior and let users
"opt in" to having recursive listings retrieve all matching files regardless of
directory names. Current users who depend on the existing behavior would not
need to migrate.
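To make the toggle concrete, here is a rough sketch in Python rather than the
processor's Java. Everything in it is hypothetical: the FilterMode names, the
list_files helper, and the nested-dict stand-in for an HDFS directory tree are
illustrative only and are not taken from the PR. It also assumes the filter
must match the whole name, which re.fullmatch models here.

```python
import re
from enum import Enum


class FilterMode(Enum):
    # Hypothetical values; the actual property and its options are not decided.
    DIRECTORIES_AND_FILES = "directories-and-files"  # current behavior (default)
    FILES_ONLY = "files-only"                        # opt-in behavior


def list_files(tree, pattern, mode, prefix=""):
    """Recursively list matching files in a nested dict.

    Dict values are subdirectories; None values are files.
    """
    regex = re.compile(pattern)
    found = []
    for name, child in sorted(tree.items()):
        path = f"{prefix}/{name}" if prefix else name
        if child is None:
            # Files are always checked against the filter.
            if regex.fullmatch(name):
                found.append(path)
        else:
            # Directories are only checked in the default mode.
            if mode is FilterMode.DIRECTORIES_AND_FILES and not regex.fullmatch(name):
                continue
            found.extend(list_files(child, pattern, mode, path))
    return found


tree = {"a.csv": None, "notes.txt": None, "2018": {"b.csv": None, "c.txt": None}}

default_listing = list_files(tree, r".*\.csv", FilterMode.DIRECTORIES_AND_FILES)
opt_in_listing = list_files(tree, r".*\.csv", FilterMode.FILES_ONLY)
print(default_listing)  # ['a.csv']
print(opt_in_listing)   # ['2018/b.csv', 'a.csv']
```

In the default mode the "2018" directory is skipped because its name does not
match the filter, which is the behavior reported in the issue. In the opt-in
mode the directory is always descended into and only file names are filtered.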
We could also go down the route of applying the filter to the entire path.
That gives the user maximum flexibility in how the filter works, but it
requires more regex knowledge and could make it harder for users to
write the filter they want. This would also require manual migration, but it
might be the best long-term solution. The tooltip on the filter property could
be updated with an example regex that provides the default
functionality, which users could use as a starting point for custom filters.
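As a rough illustration of the full-path approach, again in Python rather than
the processor's Java: the filter_by_relative_path helper and the sample paths
below are hypothetical, and the sketch assumes the filter must match the whole
path relative to the base directory. It shows why more regex knowledge is
needed, since "." also matches the "/" separator.

```python
import re


def filter_by_relative_path(paths, pattern):
    """Keep only the paths that the pattern matches in full."""
    regex = re.compile(pattern)
    return [p for p in paths if regex.fullmatch(p)]


# Paths relative to the (hypothetical) base directory of the listing.
paths = ["a.csv", "notes.txt", "2018/b.csv", "2018/raw/c.csv", "2018/c.txt"]

# ".*" crosses directory separators, so this matches csv files at any depth.
all_csv = filter_by_relative_path(paths, r".*\.csv")

# "[^/]*" cannot cross a separator, so this matches the base directory only.
top_level_csv = filter_by_relative_path(paths, r"[^/]*\.csv")

# Restrict the match to files directly inside one named subdirectory.
in_2018_csv = filter_by_relative_path(paths, r"2018/[^/]*\.csv")

print(all_csv)        # ['a.csv', '2018/b.csv', '2018/raw/c.csv']
print(top_level_csv)  # ['a.csv']
print(in_2018_csv)    # ['2018/b.csv']
```

The same filter can select files by directory, by depth, or by name, but
users have to be aware of how their pattern treats the path separator.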
Any thoughts on either of these solutions?
> ListHDFS applies File Filter also to subdirectory names in recursive search
> ---------------------------------------------------------------------------
>
> Key: NIFI-4434
> URL: https://issues.apache.org/jira/browse/NIFI-4434
> Project: Apache NiFi
> Issue Type: Bug
> Affects Versions: 1.3.0
> Reporter: Holger Frydrych
> Assignee: Jeff Storck
> Priority: Major
>
> The File Filter regex configured in the ListHDFS processor is applied not
> just to files found, but also to subdirectories.
> If you try to set up a recursive search to list e.g. all csv files in a
> directory hierarchy via a regex like ".*\.csv", it will only pick up csv
> files in the base directory, not in any subdirectory. This is because
> subdirectories don't typically match that regex pattern.
> To fix this, either subdirectories should not be matched against the file
> filter, or the file filter should be applied to the full path of all files
> (relative to the base directory). The GetHDFS processor offers both options
> via a switch.