[
https://issues.apache.org/jira/browse/MAPREDUCE-5756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247653#comment-14247653
]
Jinghui Wang commented on MAPREDUCE-5756:
-----------------------------------------
The same problem also exists in MRv1. I added the test case from
MAPREDUCE-5756.2.patch to MRv1's TestCombineFileInputFormat, and the test failed
with java.io.FileNotFoundException: Path is not a file: /dir1/dir2.
bq. Note how it blindly just adds all the results of the second-level directory
listing to the results rather than recursing the directory handling logic.
Looking at that block of code, it appears to add the results of the
first-level directory listing rather than the second-level one. The _globStat_
in the code block corresponds to the _fs.globStatus(p, inputFilter)_ call on
_p_, which is one of the first-level input directories. The list of paths
returned by MRv1's mapreduce.lib.input FileInputFormat#listStatus call does
include the second-level directories, hence the problem also exists in MRv1.
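To make the difference between the two branching styles concrete, here is a minimal, self-contained sketch. It does not use the Hadoop API: _Node_, _listOriginal_, _listProposed_, _addRecursively_ and _sampleTree_ are hypothetical names made up for illustration, and the in-memory tree only mimics the /dir1/dir2 layout from the failing test. It models just the if/else from the issue description, not the full listStatus logic.

```java
import java.util.ArrayList;
import java.util.List;

final class ListStatusSketch {

    /** Hypothetical stand-in for a file or directory entry; not a Hadoop class. */
    static final class Node {
        final String path;
        final boolean directory;
        final List<Node> children = new ArrayList<>();

        Node(String path, boolean directory) {
            this.path = path;
            this.directory = directory;
        }

        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    /** Collects every file below dir, descending into subdirectories. */
    static void addRecursively(List<String> result, Node dir) {
        for (Node child : dir.children) {
            if (child.directory) {
                addRecursively(result, child);
            } else {
                result.add(child.path);
            }
        }
    }

    /** Branching as in the reported snippet: a directory falls into the else branch when recursive is false. */
    static List<String> listOriginal(Node inputDir, boolean recursive) {
        List<String> result = new ArrayList<>();
        for (Node stat : inputDir.children) {
            if (recursive && stat.directory) {
                addRecursively(result, stat);
            } else {
                result.add(stat.path);
            }
        }
        return result;
    }

    /** Branching as in the proposed change: directories are never added themselves. */
    static List<String> listProposed(Node inputDir, boolean recursive) {
        List<String> result = new ArrayList<>();
        for (Node stat : inputDir.children) {
            if (stat.directory) {
                if (recursive) {
                    addRecursively(result, stat);
                }
            } else {
                result.add(stat.path);
            }
        }
        return result;
    }

    /** Builds /dir1 containing file1 and dir2, with file2 inside dir2. */
    static Node sampleTree() {
        Node dir2 = new Node("/dir1/dir2", true)
                .add(new Node("/dir1/dir2/file2", false));
        return new Node("/dir1", true)
                .add(new Node("/dir1/file1", false))
                .add(dir2);
    }

    public static void main(String[] args) {
        Node dir1 = sampleTree();
        System.out.println(listOriginal(dir1, false)); // includes the directory /dir1/dir2
        System.out.println(listProposed(dir1, false)); // files only
        System.out.println(listProposed(dir1, true));  // files from both levels
    }
}
```

In this sketch, only the original branching returns /dir1/dir2 when recursive is false, which is the kind of entry that later triggers the "Path is not a file" error.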
> CombineFileInputFormat.getSplits() including directories in its results
> -----------------------------------------------------------------------
>
> Key: MAPREDUCE-5756
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5756
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Reporter: Jason Dere
> Assignee: Jason Dere
> Fix For: 2.6.0
>
> Attachments: MAPREDUCE-5756.1.patch, MAPREDUCE-5756.2.patch
>
>
> Trying to track down HIVE-6401, where we see some "is not a file" errors
> because getSplits() is giving us directories. I believe the culprit is
> FileInputFormat.listStatus():
> {code}
> if (recursive && stat.isDirectory()) {
>   addInputPathRecursively(result, fs, stat.getPath(),
>       inputFilter);
> } else {
>   result.add(stat);
> }
> {code}
> This seems to allow directories to be added to the results when
> recursive is false. Is this meant to return directories? If not, I think it
> should look like this:
> {code}
> if (stat.isDirectory()) {
>   if (recursive) {
>     addInputPathRecursively(result, fs, stat.getPath(),
>         inputFilter);
>   }
> } else {
>   result.add(stat);
> }
> {code}