[ https://issues.apache.org/jira/browse/HADOOP-3497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tom White updated HADOOP-3497:
------------------------------

    Attachment: hadoop-3497-v3.patch

The test that is failing is
TestFileInputFormatPathFilter#testWithPathFilterWithoutGlob. It creates files
named a, b, aa and bb in a directory, then uses an input format with a filter that
only accepts files whose last component is one character long, so only files a
and b should match. The input path is the directory itself, not a glob path, and
the test relies on the following behaviour of FileSystem#globStatus to work.

If you call FileSystem#globStatus(Path pathPattern, PathFilter filter) with a 
pathPattern that has a fixed (non-globbing) final component, then the status 
for that path will always be returned, regardless of the filter.

So, for a path /a that exists,

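{code}
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathFilter;

// fs is an existing FileSystem instance
FileStatus[] statuses = fs.globStatus(new Path("/a"), new PathFilter() {
  @Override
  public boolean accept(Path path) {
    return false; // reject every path
  }
});
{code}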

will return the status for /a, even though the filter rejects every path!
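The behaviour above can be sketched with a toy, Hadoop-free model. It assumes a "file system" is just a set of existing path strings and uses java.util.function.Predicate<String> in place of PathFilter; FixedPathGlobDemo, globFixedPath, the applyFilter flag and the /dir directory are hypothetical names, not Hadoop APIs, and only patterns with a fixed final component are modelled. With applyFilter false it mimics the current behaviour; with it true it mimics the proposed change.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Toy model (not Hadoop code): a "file system" is a set of existing paths,
// and Predicate<String> stands in for PathFilter. Only patterns whose final
// component is fixed (contains no glob characters) are modelled.
class FixedPathGlobDemo {

  // Hypothetical contents: /a from the example above, and /dir standing in
  // for the test's input directory with its files a, b, aa and bb.
  static final Set<String> EXISTING = new HashSet<>(Arrays.asList(
      "/a", "/dir", "/dir/a", "/dir/b", "/dir/aa", "/dir/bb"));

  // Last path component, analogous to Path#getName().
  static String name(String path) {
    return path.substring(path.lastIndexOf('/') + 1);
  }

  // applyFilter == false: current behaviour, an existing fixed path is
  // returned without consulting the filter.
  // applyFilter == true: proposed behaviour, the filter is applied as well.
  static List<String> globFixedPath(String path, Predicate<String> filter,
                                    boolean applyFilter) {
    List<String> result = new ArrayList<>();
    if (EXISTING.contains(path) && (!applyFilter || filter.test(path))) {
      result.add(path);
    }
    return result;
  }

  public static void main(String[] args) {
    Predicate<String> rejectAll = p -> false;
    // The test's filter: accept only names that are one character long.
    Predicate<String> singleCharName = p -> name(p).length() == 1;

    System.out.println(globFixedPath("/a", rejectAll, false));        // [/a]
    System.out.println(globFixedPath("/a", rejectAll, true));         // []
    System.out.println(globFixedPath("/dir", singleCharName, false)); // [/dir]
    System.out.println(globFixedPath("/dir", singleCharName, true));  // []
  }
}
```

In this model the last call returns nothing: assuming the test directory's name, like the hypothetical /dir, is longer than one character, the test's own filter rejects the input directory before a, b, aa and bb are ever listed, which would explain why the test fails once the filter is applied to a fixed path.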

This seems wrong and should really be changed. It could affect applications,
however, since a filter would now be applied where previously none was. Does
this seem like the right thing to do?

I've attached a patch which fixes the test.

> File globbing with a PathFilter is too restrictive
> --------------------------------------------------
>
>                 Key: HADOOP-3497
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3497
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 0.17.0
>            Reporter: Tom White
>            Assignee: Tom White
>         Attachments: hadoop-3497-test.patch, hadoop-3497-v2.patch, hadoop-3497-v3.patch, hadoop-3497.patch
>
>
> Consider the file hierarchy
> {noformat}
> /a
> /a/b
> {noformat}
> Calling the globStatus method on FileSystem with a path of 
> {noformat}/*/*{noformat} and a PathFilter that only accepts {{/a/b}} returns 
> no matches. It should return a single match: {{/a/b}}.
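One way to account for the empty result is that the filter is consulted at every level of the glob expansion, so the intermediate directory /a is rejected before /a/b is ever reached. The toy sketch below (not Hadoop's globStatus implementation) contrasts that with applying the filter only to the final matches. It assumes a file system made of the two paths from the description and supports only "*" or literal names as pattern components; GlobFilterLevelsDemo, children, glob and filterEveryLevel are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Toy model (not Hadoop code) contrasting two places to apply a path filter
// while expanding a glob one path component at a time.
class GlobFilterLevelsDemo {

  // The hierarchy from the issue description.
  static final List<String> FILES = Arrays.asList("/a", "/a/b");

  // Immediate children of dir in the toy file system.
  static List<String> children(String dir) {
    String prefix = dir.equals("/") ? "/" : dir + "/";
    List<String> out = new ArrayList<>();
    for (String f : FILES) {
      boolean under = f.startsWith(prefix) && f.length() > prefix.length();
      if (under && f.indexOf('/', prefix.length()) < 0) {
        out.add(f);
      }
    }
    return out;
  }

  // Expands an absolute pattern such as "/*/*". Each component is either
  // "*" (any name) or a literal name; no other glob syntax is supported.
  // filterEveryLevel == true applies the filter to intermediate directories
  // too; false applies it only to the final matches.
  static List<String> glob(String pattern, Predicate<String> filter,
                           boolean filterEveryLevel) {
    String[] parts = pattern.substring(1).split("/");
    List<String> current = new ArrayList<>(Arrays.asList("/"));
    for (int i = 0; i < parts.length; i++) {
      boolean lastLevel = (i == parts.length - 1);
      boolean filtered = filterEveryLevel || lastLevel;
      List<String> next = new ArrayList<>();
      for (String dir : current) {
        for (String child : children(dir)) {
          String name = child.substring(child.lastIndexOf('/') + 1);
          boolean nameMatches = parts[i].equals("*") || parts[i].equals(name);
          if (nameMatches && (!filtered || filter.test(child))) {
            next.add(child);
          }
        }
      }
      current = next;
    }
    return current;
  }

  public static void main(String[] args) {
    Predicate<String> onlyAB = p -> p.equals("/a/b");
    System.out.println(glob("/*/*", onlyAB, true));  // [] because /a is rejected first
    System.out.println(glob("/*/*", onlyAB, false)); // [/a/b]
  }
}
```

Filtering only the final matches gives the single expected match, /a/b, while filtering every level reproduces the empty result described in the issue.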

-- 
This message is automatically generated by JIRA.