[
https://issues.apache.org/jira/browse/MAPREDUCE-5756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13899724#comment-13899724
]
Jason Lowe commented on MAPREDUCE-5756:
---------------------------------------
To clarify the origins of the recursion feature: it was added to
mapred.FileInputFormat in MAPREDUCE-1501, and MAPREDUCE-3193 later added
feature parity to mapreduce.lib.input.FileInputFormat for those migrating
from mapred.FileInputFormat.
From the code, I'd expect branch-1 and branch-2 to act similarly here when
in non-recursive mode. They both list the contents of the top-level input
path, and if any child is a directory they list that second-level directory
and return the directory directly if it's not recursive. I don't see anywhere
in branch-1 code where it's completely ignoring directories, but maybe I'm
missing something.
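
For illustration, here is a minimal sketch of the two branching variants from
the issue description quoted below. It is not the Hadoop code: it assumes
plain java.nio.file paths instead of FileSystem/FileStatus, leaves out the
input filter, and the class, method, and file names (ListStatusSketch,
listCurrent, listProposed, makeFixture, part-0, sub) are made up for the
example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class ListStatusSketch {

    // Current branching: a directory is expanded only when recursive is true;
    // otherwise it falls through to the else branch and is returned as-is.
    static List<Path> listCurrent(Path dir, boolean recursive) {
        List<Path> result = new ArrayList<>();
        try (DirectoryStream<Path> children = Files.newDirectoryStream(dir)) {
            for (Path child : children) {
                if (recursive && Files.isDirectory(child)) {
                    result.addAll(listCurrent(child, true));
                } else {
                    result.add(child);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    // Proposed branching: directories are never added to the results;
    // they are expanded when recursive is true and skipped otherwise.
    static List<Path> listProposed(Path dir, boolean recursive) {
        List<Path> result = new ArrayList<>();
        try (DirectoryStream<Path> children = Files.newDirectoryStream(dir)) {
            for (Path child : children) {
                if (Files.isDirectory(child)) {
                    if (recursive) {
                        result.addAll(listProposed(child, true));
                    }
                } else {
                    result.add(child);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    // Hypothetical fixture: root/part-0 (file) and root/sub/part-1 (file).
    static Path makeFixture() {
        try {
            Path root = Files.createTempDirectory("listing-sketch");
            Files.createFile(root.resolve("part-0"));
            Path sub = Files.createDirectory(root.resolve("sub"));
            Files.createFile(sub.resolve("part-1"));
            return root;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path root = makeFixture();
        System.out.println("current,  non-recursive: " + listCurrent(root, false));
        System.out.println("proposed, non-recursive: " + listProposed(root, false));
        System.out.println("current,  recursive:     " + listCurrent(root, true));
        System.out.println("proposed, recursive:     " + listProposed(root, true));
    }
}
```

With this fixture, the non-recursive listing from listCurrent contains both
part-0 and the sub directory, while listProposed contains only part-0. In
recursive mode the two variants return the same two files.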
> FileInputFormat.listStatus() including directories in its results
> -----------------------------------------------------------------
>
> Key: MAPREDUCE-5756
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5756
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Reporter: Jason Dere
>
> Trying to track down HIVE-6401, where we see some "is not a file" errors
> because getSplits() is giving us directories. I believe the culprit is
> FileInputFormat.listStatus():
> {code}
> if (recursive && stat.isDirectory()) {
>   addInputPathRecursively(result, fs, stat.getPath(),
>       inputFilter);
> } else {
>   result.add(stat);
> }
> {code}
> Which seems to be allowing directories to be added to the results if
> recursive is false. Is this meant to return directories? If not, I think it
> should look like this:
> {code}
> if (stat.isDirectory()) {
>   if (recursive) {
>     addInputPathRecursively(result, fs, stat.getPath(),
>         inputFilter);
>   }
> } else {
>   result.add(stat);
> }
> {code}
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)