[ https://issues.apache.org/jira/browse/HADOOP-13343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15364597#comment-15364597 ]

Jason Lowe commented on HADOOP-13343:
-------------------------------------

For example, the following code in MapReduce input split calculation expects 
null to mean the file doesn't exist but an empty array to mean it exists but 
doesn't match any filters:
{code}
  private List<FileStatus> singleThreadedListStatus(JobContext job, Path[] dirs,
      PathFilter inputFilter, boolean recursive) throws IOException {
    List<FileStatus> result = new ArrayList<FileStatus>();
    List<IOException> errors = new ArrayList<IOException>();
    for (int i=0; i < dirs.length; ++i) {
      Path p = dirs[i];
      FileSystem fs = p.getFileSystem(job.getConfiguration()); 
      FileStatus[] matches = fs.globStatus(p, inputFilter);
      if (matches == null) {
        errors.add(new IOException("Input path does not exist: " + p));
      } else if (matches.length == 0) {
        errors.add(new IOException("Input Pattern " + p + " matches 0 files"));
      } else {
        // ... add the matched files to result (elided)
      }
    }
    // ... (remainder of method elided)
  }
{code}

There was a case where a user passed the path to a _SUCCESS file as one of the 
input paths to a job, and the hidden-file filter in FileInputFormat suppressed 
the _SUCCESS file. Because globStatus returned null instead of an empty array, 
the code above reported "Input path does not exist" even though the file 
existed, which misled the user.
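To make the failure concrete, here is a minimal, Hadoop-free sketch. It models FileInputFormat's hidden-file filter (which rejects names starting with "_" or ".") as a Predicate<String> over file names, and models the reported behavior of globStatus for a non-glob path as returning null whenever the filter rejects an existing file. The class and method names (HiddenFilterDemo, globNonGlobPathBuggy, classify) are hypothetical; nothing here calls the real Hadoop API.

{code}
import java.util.Set;
import java.util.function.Predicate;

public class HiddenFilterDemo {
  // Same rule as FileInputFormat's hidden-file filter:
  // names starting with "_" or "." are treated as hidden.
  static final Predicate<String> HIDDEN_FILE_FILTER =
      name -> !name.startsWith("_") && !name.startsWith(".");

  // Hypothetical model of the reported behavior for a path with no glob
  // characters: null both when the file is missing AND when it exists
  // but the filter rejects it.
  static String[] globNonGlobPathBuggy(Set<String> fs, String name,
      Predicate<String> filter) {
    if (!fs.contains(name) || !filter.test(name)) {
      return null;
    }
    return new String[] {name};
  }

  // Same branching as singleThreadedListStatus: null vs. empty vs. matches.
  static String classify(String path, String[] matches) {
    if (matches == null) {
      return "Input path does not exist: " + path;
    } else if (matches.length == 0) {
      return "Input Pattern " + path + " matches 0 files";
    }
    return "OK";
  }

  public static void main(String[] args) {
    Set<String> fs = Set.of("_SUCCESS", "part-00000");
    String path = "_SUCCESS";
    // The file exists, yet the caller is told it does not.
    System.out.println(
        classify(path, globNonGlobPathBuggy(fs, path, HIDDEN_FILE_FILTER)));
  }
}
{code}

Running main prints "Input path does not exist: _SUCCESS", the misleading message described above.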


> globStatus returns null for file path that exists but is filtered
> -----------------------------------------------------------------
>
>                 Key: HADOOP-13343
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13343
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 2.7.2
>            Reporter: Jason Lowe
>            Priority: Minor
>
> If a file path without globs is passed to globStatus and the file exists but 
> the specified input filter suppresses it, then globStatus returns null 
> instead of an empty array. This makes it impossible for the caller to 
> distinguish a file that does not exist at all from one that was suppressed 
> by the filter, and it is inconsistent with how globStatus handles a glob 
> over an existing directory that fails to match anything within it.
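The distinction the report asks for can be captured in a small model. This is a hedged sketch, not the actual change to Hadoop's glob code: a Set<String> stands in for the filesystem, a Predicate<String> for the PathFilter, and GlobSemanticsSketch and globNonGlobPath are hypothetical names. It returns null only when the path is absent, and an empty array when the path exists but is filtered out, mirroring the empty-array result the report describes for a glob over an existing directory that matches nothing.

{code}
import java.util.Set;
import java.util.function.Predicate;

public class GlobSemanticsSketch {
  // Hypothetical model of the requested semantics for a path without globs:
  //   null        -> the path does not exist
  //   empty array -> the path exists but the filter rejected it
  //   one element -> the path exists and passed the filter
  static String[] globNonGlobPath(Set<String> fs, String name,
      Predicate<String> filter) {
    if (!fs.contains(name)) {
      return null;
    }
    return filter.test(name) ? new String[] {name} : new String[0];
  }

  public static void main(String[] args) {
    Set<String> fs = Set.of("_SUCCESS", "part-00000");
    // Illustrative hidden-file rule: reject names starting with "_" or ".".
    Predicate<String> hidden = n -> !n.startsWith("_") && !n.startsWith(".");
    System.out.println(globNonGlobPath(fs, "missing", hidden) == null);   // true
    System.out.println(globNonGlobPath(fs, "_SUCCESS", hidden).length);   // 0
    System.out.println(globNonGlobPath(fs, "part-00000", hidden).length); // 1
  }
}
{code}

With these semantics, the _SUCCESS case falls into the "matches 0 files" branch of singleThreadedListStatus instead of the "does not exist" branch.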



