[
https://issues.apache.org/jira/browse/HADOOP-3497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tom White updated HADOOP-3497:
------------------------------
Attachment: hadoop-3497-v3.patch
The test that is failing is
TestFileInputFormatPathFilter#testWithPathFilterWithoutGlob. This creates files
named a, b, aa, bb in a directory, then uses an input format with a filter that
only accepts files whose last component is 1 character long. Only files a and b
should match. The input path is the directory, not a glob path, so the test
relies on the following behaviour of FileSystem#globStatus.
If you call FileSystem#globStatus(Path pathPattern, PathFilter filter) with a
pathPattern that has a fixed (non-globbing) final component, then the status
for that path will always be returned, regardless of the filter.
So, for a path /a which exists
{code}
fs.globStatus(new Path("/a"), new PathFilter() {
  @Override
  public boolean accept(Path path) {
    return false;
  }
})
{code}
will return the status for /a, even though the filter rejects every path!
This seems wrong and should really be changed. However, changing it could
affect existing applications, since a filter would then be applied where
previously it was not. Does this seem the right thing to do?
I've attached a patch which fixes the test.
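To make the trade-off concrete, here is a minimal sketch in plain Java that
models the two behaviours without depending on Hadoop. The class
GlobFilterSketch, its in-memory set of paths, the directory name /dir and both
method names are hypothetical and exist only for illustration; this is not the
FileSystem implementation and not the attached patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Predicate;

public class GlobFilterSketch {
    // Hypothetical stand-in for a file system: a directory holding a, b, aa, bb.
    static final Set<String> PATHS = new TreeSet<>(
            Arrays.asList("/dir", "/dir/a", "/dir/b", "/dir/aa", "/dir/bb"));

    // Last component of an absolute path, e.g. "/dir/aa" -> "aa".
    static String name(String path) {
        return path.substring(path.lastIndexOf('/') + 1);
    }

    // Models the behaviour described above: a fixed (non-glob) path is
    // returned whenever it exists, and the filter is never consulted.
    static List<String> fixedPathIgnoringFilter(String path, Predicate<String> filter) {
        List<String> result = new ArrayList<>();
        if (PATHS.contains(path)) {
            result.add(path);
        }
        return result;
    }

    // Models the alternative: the filter is also applied to a fixed path.
    static List<String> fixedPathApplyingFilter(String path, Predicate<String> filter) {
        List<String> result = new ArrayList<>();
        if (PATHS.contains(path) && filter.test(path)) {
            result.add(path);
        }
        return result;
    }

    public static void main(String[] args) {
        Predicate<String> rejectAll = p -> false;
        // Same idea as the test's filter: accept only one-character names.
        Predicate<String> oneCharName = p -> name(p).length() == 1;

        System.out.println(fixedPathIgnoringFilter("/dir", rejectAll));   // [/dir]
        System.out.println(fixedPathApplyingFilter("/dir", rejectAll));   // []
        System.out.println(fixedPathIgnoringFilter("/dir", oneCharName)); // [/dir]
        System.out.println(fixedPathApplyingFilter("/dir", oneCharName)); // []
    }
}
```

In this model, a filter like the one in the test rejects the input directory
itself once the filter is applied to fixed paths, which is the kind of
application impact I am asking about.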
> File globbing with a PathFilter is too restrictive
> --------------------------------------------------
>
> Key: HADOOP-3497
> URL: https://issues.apache.org/jira/browse/HADOOP-3497
> Project: Hadoop Core
> Issue Type: Bug
> Components: fs
> Affects Versions: 0.17.0
> Reporter: Tom White
> Assignee: Tom White
> Attachments: hadoop-3497-test.patch, hadoop-3497-v2.patch,
> hadoop-3497-v3.patch, hadoop-3497.patch
>
>
> Consider the file hierarchy
> {noformat}
> /a
> /a/b
> {noformat}
> Calling the globStatus method on FileSystem with a path of
> {noformat}/*/*{noformat} and a PathFilter that only accepts {{/a/b}} returns
> no matches. It should return a single match: {{/a/b}}.