[ https://issues.apache.org/jira/browse/NIFI-6275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16942573#comment-16942573 ]
ASF subversion and git services commented on NIFI-6275:
-------------------------------------------------------
Commit 8d748223ff8f80c7a85fc38013ecf0b221adc2da in nifi's branch
refs/heads/master from Jeff Storck
[ https://gitbox.apache.org/repos/asf?p=nifi.git;h=8d74822 ]
NIFI-6275 ListHDFS now ignores scheme and authority when using "Full Path"
filter mode
Updated description for "Full Path" filter mode to state that it will ignore
scheme and authority
Added tests to TestListHDFS for listing empty and nonexistent dirs
Updated TestListHDFS' mock file system to track state properly when FileStatus
instances are added, and updated listStatus to work properly with the
underlying Map that contains FileStatus instances
Updated ListHDFS' additional details to document "Full Path" filter mode
ignoring scheme and authority, with an example
Updated TestRunners, StandardProcessorTestRunner,
MockProcessorInitializationContext to support passing in a logger.
NIFI-6275 Updated the "Full Path" filter mode to check the full path of a file
with and without its scheme and authority against the filter regex
Added additional documentation for how ListHDFS handles scheme and authority
when "Full Path" filter mode is used
Added test case for "Full Path" filter mode with a regular expression that
includes scheme and authority
This closes #3483.
Signed-off-by: Koji Kawamura <[email protected]>
> ListHDFS with Full Path filter mode regex does not work as intended
> -------------------------------------------------------------------
>
> Key: NIFI-6275
> URL: https://issues.apache.org/jira/browse/NIFI-6275
> Project: Apache NiFi
> Issue Type: Improvement
> Components: Documentation & Website, Extensions
> Affects Versions: 1.8.0, 1.9.0, 1.9.1, 1.9.2
> Reporter: Jeff Storck
> Assignee: Jeff Storck
> Priority: Minor
> Time Spent: 50m
> Remaining Estimate: 0h
>
> When using the *{{Full Path}}* filter mode, the regex is applied to the URI
> returned for each file, which includes the scheme and authority (hostname, HA
> namespace, port). For the filter to work across multiple HDFS installations
> (such as a flow used on multiple environments that is retrieved from NiFi
> Registry), the regex filter would have to account for the scheme and
> authority by matching possible scheme and authority values.
> To make it easier for the user, the *{{Full Path}}* filter mode's filter
> regex should only be applied to the path components of the URI, without the
> scheme and authority. This can be done by updating the filter for *{{Full
> Path}}* mode to use:
> [Path.getPathWithoutSchemeAndAuthority(Path)|https://hadoop.apache.org/docs/r3.0.0/api/org/apache/hadoop/fs/Path.html#getPathWithoutSchemeAndAuthority-org.apache.hadoop.fs.Path-].
> This will bring the regex values in line with the other modes, since those
> are only applied to the value of *{{Path.getName()}}*.
> Migration guidance will be needed when this improvement is released.
> Existing regex values for *{{Full Path}}* filter mode that accepted any
> scheme and authority will still work.
> Those that specify a scheme and authority will *_not_* work, and will have
> to be updated to specify only path components.
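
The behavior described in the commit message above (checking the full path
both with and without its scheme and authority against the filter regex) can
be sketched roughly as follows. This is an illustration only, not the actual
ListHDFS code: it uses java.net.URI as a stand-in for Hadoop's
Path.getPathWithoutSchemeAndAuthority(Path), the class and method names are
hypothetical, and it assumes the regex must match the whole string.

```java
import java.net.URI;
import java.util.regex.Pattern;

// Hypothetical sketch; names are illustrative and not taken from NiFi.
final class FullPathFilterSketch {

    // Stand-in for Path.getPathWithoutSchemeAndAuthority(Path):
    // keeps only the path component of the file's URI.
    static String pathWithoutSchemeAndAuthority(String fileUri) {
        return URI.create(fileUri).getPath();
    }

    // Accepts a file if the regex matches either the full URI
    // or the path with scheme and authority removed.
    // Assumption: the filter uses whole-string matching (matches()).
    static boolean accepts(Pattern filter, String fileUri) {
        return filter.matcher(fileUri).matches()
                || filter.matcher(pathWithoutSchemeAndAuthority(fileUri)).matches();
    }

    public static void main(String[] args) {
        // Hypothetical file URIs from two environments.
        String hostAndPort = "hdfs://namenode:8020/data/in/a.txt";
        String haNamespace = "hdfs://nameservice1/data/in/a.txt";

        // A path-only regex is portable across both environments.
        Pattern pathOnly = Pattern.compile("/data/in/.*\\.txt");
        System.out.println(accepts(pathOnly, hostAndPort));
        System.out.println(accepts(pathOnly, haNamespace));

        // A regex that pins scheme and authority only matches one environment.
        Pattern pinned = Pattern.compile("hdfs://namenode:8020/data/in/.*");
        System.out.println(accepts(pinned, hostAndPort));
        System.out.println(accepts(pinned, haNamespace));
    }
}
```

Under these assumptions, a path-only regex such as /data/in/.*\.txt is
accepted regardless of which cluster the file came from, while a regex that
names a specific scheme and authority keeps working only for that cluster.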