[
https://issues.apache.org/jira/browse/MAPREDUCE-5756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13911671#comment-13911671
]
Jason Lowe commented on MAPREDUCE-5756:
---------------------------------------
Are you sure that's the relevant code change? Looking at the patch above, the
code recursively processes directories both before and after the change. Am I
missing something? Also, [~jdere] verified in [a
comment|https://issues.apache.org/jira/browse/MAPREDUCE-5756?focusedCommentId=13900772&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13900772]
that the FileInputFormat.listStatus behavior didn't change between 1.x and 2.x
with respect to directories.
Instead, it appears to be caused by MAPREDUCE-4470, which changed the way
CombineFileInputFormat treats files without any blocks. Before that change it
generated no splits for empty files; afterwards, it appears to generate a
degenerate split for them. Since directories also have no blocks, I'm wondering
whether that change also causes it to generate degenerate splits for
directories, not just for empty files.
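To make that hypothesis concrete, here is a minimal, self-contained Java model. It is not Hadoop code: `Entry`, `splitsBefore`, `splitsAfter`, and `splitsGuarded` are hypothetical names, and the real CombineFileInputFormat combines blocks across files and nodes rather than emitting one split per block. It only shows why a rule like "emit a zero-length split for an entry with no blocks" would also catch directories, plus one possible guard that skips directories while keeping the empty-file splits.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the split behavior discussed above; not Hadoop's API.
class DegenerateSplitModel {
    // Stand-in for a FileStatus: a path, a directory flag, and a block count.
    static final class Entry {
        final String path;
        final boolean isDirectory;
        final int blockCount;

        Entry(String path, boolean isDirectory, int blockCount) {
            this.path = path;
            this.isDirectory = isDirectory;
            this.blockCount = blockCount;
        }
    }

    // "Before": an entry with no blocks yields no split at all.
    static List<String> splitsBefore(List<Entry> entries) {
        List<String> splits = new ArrayList<>();
        for (Entry e : entries) {
            for (int b = 0; b < e.blockCount; b++) {
                splits.add(e.path + "#block" + b);
            }
        }
        return splits;
    }

    // "After": an entry with no blocks yields one degenerate (zero-length) split.
    // A directory also has zero blocks, so it gets a split too.
    static List<String> splitsAfter(List<Entry> entries) {
        List<String> splits = new ArrayList<>();
        for (Entry e : entries) {
            if (e.blockCount == 0) {
                splits.add(e.path + "#empty");
            }
            for (int b = 0; b < e.blockCount; b++) {
                splits.add(e.path + "#block" + b);
            }
        }
        return splits;
    }

    // Possible guard: drop directories first, then keep degenerate splits for empty files.
    static List<String> splitsGuarded(List<Entry> entries) {
        List<Entry> filesOnly = new ArrayList<>();
        for (Entry e : entries) {
            if (!e.isDirectory) {
                filesOnly.add(e);
            }
        }
        return splitsAfter(filesOnly);
    }

    public static void main(String[] args) {
        List<Entry> input = List.of(
                new Entry("/data/part-0", false, 2),
                new Entry("/data/empty", false, 0),
                new Entry("/data/subdir", true, 0));
        // [/data/part-0#block0, /data/part-0#block1]
        System.out.println(splitsBefore(input));
        // [/data/part-0#block0, /data/part-0#block1, /data/empty#empty, /data/subdir#empty]
        System.out.println(splitsAfter(input));
        // [/data/part-0#block0, /data/part-0#block1, /data/empty#empty]
        System.out.println(splitsGuarded(input));
    }
}
```

Whether the actual post-MAPREDUCE-4470 code behaves like `splitsAfter` for directories is exactly the open question; the model only shows the failure mode if it does.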
> CombineFileInputFormat.getSplits() including directories in its results
> -----------------------------------------------------------------------
>
> Key: MAPREDUCE-5756
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5756
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Reporter: Jason Dere
>
> I'm trying to track down HIVE-6401, where we see "is not a file" errors
> because getSplits() is returning directories. I believe the culprit is
> FileInputFormat.listStatus():
> {code}
> if (recursive && stat.isDirectory()) {
>   addInputPathRecursively(result, fs, stat.getPath(),
>       inputFilter);
> } else {
>   result.add(stat);
> }
> {code}
> This seems to allow directories to be added to the results when recursive
> is false. Is this meant to return directories? If not, I think it should
> look like this:
> {code}
> if (stat.isDirectory()) {
>   if (recursive) {
>     addInputPathRecursively(result, fs, stat.getPath(),
>         inputFilter);
>   }
> } else {
>   result.add(stat);
> }
> {code}
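A minimal in-memory model of the two quoted branches, assuming a tiny `Node` type as a stand-in for FileStatus (hypothetical, not Hadoop's API) and `addRecursively` as a simplified stand-in for addInputPathRecursively that descends into directories and collects only files. The input list represents the entries found directly under one input directory. It only illustrates what the quoted branches do; it says nothing about whether listStatus changed between 1.x and 2.x.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the two listStatus() branches quoted above; not Hadoop's API.
class ListStatusModel {
    // Stand-in for FileStatus: a path plus, for directories, its children.
    static final class Node {
        final String path;
        final List<Node> children; // null for a plain file

        Node(String path, List<Node> children) {
            this.path = path;
            this.children = children;
        }

        boolean isDirectory() {
            return children != null;
        }
    }

    static Node file(String path) {
        return new Node(path, null);
    }

    static Node dir(String path, Node... children) {
        return new Node(path, Arrays.asList(children));
    }

    // Simplified stand-in for addInputPathRecursively: descend into
    // directories and collect only plain files.
    static void addRecursively(List<String> result, Node directory) {
        for (Node child : directory.children) {
            if (child.isDirectory()) {
                addRecursively(result, child);
            } else {
                result.add(child.path);
            }
        }
    }

    // Original branch: with recursive == false, a directory falls through
    // to the else branch and is added to the result.
    static List<String> listOriginal(List<Node> stats, boolean recursive) {
        List<String> result = new ArrayList<>();
        for (Node stat : stats) {
            if (recursive && stat.isDirectory()) {
                addRecursively(result, stat);
            } else {
                result.add(stat.path);
            }
        }
        return result;
    }

    // Proposed branch: a directory is either expanded (recursive == true)
    // or skipped; it is never added directly.
    static List<String> listProposed(List<Node> stats, boolean recursive) {
        List<String> result = new ArrayList<>();
        for (Node stat : stats) {
            if (stat.isDirectory()) {
                if (recursive) {
                    addRecursively(result, stat);
                }
            } else {
                result.add(stat.path);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Node> stats = List.of(file("/in/a"), dir("/in/sub", file("/in/sub/b")));
        System.out.println(listOriginal(stats, false)); // [/in/a, /in/sub]
        System.out.println(listProposed(stats, false)); // [/in/a]
        System.out.println(listOriginal(stats, true));  // [/in/a, /in/sub/b]
    }
}
```

With `recursive` set to false, `listOriginal` returns the subdirectory itself while `listProposed` drops it; with `recursive` set to true, both return the same files.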
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)