[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13900772#comment-13900772
 ] 

Jason Dere commented on MAPREDUCE-5756:
---------------------------------------

OK, looking at this a little more: FileInputFormat.listStatus() returns the 
same results on hadoop-1 and hadoop-2, and those results include the 
directories, so I guess listStatus() is not the issue. What seems to differ 
is what CombineFileInputFormat.getSplits() does with the file list after 
getting it: on hadoop-2 the directories end up in the list of InputSplits:

(Hadoop 20S means hadoop 1.x)
{noformat}
2014-02-13 13:35:32,492 ERROR shims.HadoopShimsSecure 
(HadoopShimsSecure.java:getSplits(345)) - ** Hadoop version: 0.20S
2014-02-13 13:35:32,492 ERROR shims.HadoopShimsSecure 
(HadoopShimsSecure.java:getSplits(349)) - ** called super.getSplits(): 
[Paths:/000000_0:0+50 Locations:127.0.0.1:; ]
{noformat}

(Hadoop 23 means hadoop 2.x)
{noformat}
2014-02-13 13:38:12,425 ERROR shims.HadoopShimsSecure 
(HadoopShimsSecure.java:getSplits(345)) - ** Hadoop version: 0.23
2014-02-13 13:38:12,425 ERROR shims.HadoopShimsSecure 
(HadoopShimsSecure.java:getSplits(349)) - ** called super.getSplits(): 
[Paths:/000000_0:0+50 Locations:127.0.0.1:; , 
Paths:/Users:0+0,/build:0+0,/tmp:0+0,/user:0+0 Locations:; ]
{noformat}
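
For reference, here is a minimal, Hadoop-free sketch of the two branch 
structures quoted in the description below, to show how they differ when 
recursive is false. Everything in it is an assumption made for illustration: 
Entry is a hypothetical stand-in for FileStatus, addRecursively is a 
simplified stand-in for addInputPathRecursively, and the sample paths are just 
the ones that show up in the logs above. It only models the listing branch; it 
says nothing about what CombineFileInputFormat.getSplits() does afterwards.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ListStatusSketch {

    /** Hypothetical stand-in for FileStatus: a path, a directory flag, children. */
    static final class Entry {
        final String path;
        final boolean directory;
        final List<Entry> children = new ArrayList<>();

        Entry(String path, boolean directory) {
            this.path = path;
            this.directory = directory;
        }
    }

    /** Simplified stand-in for addInputPathRecursively: collects files only. */
    static void addRecursively(List<String> result, Entry dir) {
        for (Entry child : dir.children) {
            if (child.directory) {
                addRecursively(result, child);
            } else {
                result.add(child.path);
            }
        }
    }

    /** Branch structure as quoted from listStatus() in the description. */
    static List<String> listCurrent(List<Entry> entries, boolean recursive) {
        List<String> result = new ArrayList<>();
        for (Entry stat : entries) {
            if (recursive && stat.directory) {
                addRecursively(result, stat);
            } else {
                // Reached for files, and also for directories when recursive is false.
                result.add(stat.path);
            }
        }
        return result;
    }

    /** Branch structure proposed in the description. */
    static List<String> listProposed(List<Entry> entries, boolean recursive) {
        List<String> result = new ArrayList<>();
        for (Entry stat : entries) {
            if (stat.directory) {
                if (recursive) {
                    addRecursively(result, stat);
                }
                // Non-recursive: the directory is skipped entirely.
            } else {
                result.add(stat.path);
            }
        }
        return result;
    }

    /** Illustrative input: one file plus four empty directories. */
    static List<Entry> sampleRoot() {
        return Arrays.asList(
                new Entry("/000000_0", false),
                new Entry("/Users", true),
                new Entry("/build", true),
                new Entry("/tmp", true),
                new Entry("/user", true));
    }

    public static void main(String[] args) {
        System.out.println("current:  " + listCurrent(sampleRoot(), false));
        System.out.println("proposed: " + listProposed(sampleRoot(), false));
    }
}
```

With recursive set to false, the quoted structure keeps the four directories 
in the result, while the proposed structure returns only the file. With 
recursive set to true, both behave the same.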


> FileInputFormat.listStatus() including directories in its results
> -----------------------------------------------------------------
>
>                 Key: MAPREDUCE-5756
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5756
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Jason Dere
>
> Trying to track down HIVE-6401, where we see some "is not a file" errors 
> because getSplits() is giving us directories.  I believe the culprit is 
> FileInputFormat.listStatus():
> {code}
>                 if (recursive && stat.isDirectory()) {
>                   addInputPathRecursively(result, fs, stat.getPath(),
>                       inputFilter);
>                 } else {
>                   result.add(stat);
>                 }
> {code}
> This seems to allow directories to be added to the results when 
> recursive is false.  Is listStatus() meant to return directories? If not, I 
> think it should look like this:
> {code}
>                 if (stat.isDirectory()) {
>                   if (recursive) {
>                     addInputPathRecursively(result, fs, stat.getPath(),
>                         inputFilter);
>                   }
>                 } else {
>                   result.add(stat);
>                 }
> {code}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
