[ https://issues.apache.org/jira/browse/MAPREDUCE-5038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13591038#comment-13591038 ]
Alejandro Abdelnur commented on MAPREDUCE-5038:
-----------------------------------------------
The backport looks OK. Still, there is something that worries me:
{code}
+  @Override
+  protected boolean isSplitable(FileSystem fs, Path file) {
+    final CompressionCodec codec =
+      new CompressionCodecFactory(fs.getConf()).getCodec(file);
+    return codec == null;
+  }
{code}
We should take splittable codecs into account; trunk does, and so does 0.22.
I wonder where/why this got dropped in Hadoop 1. Any idea?
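For illustration, a check that accounts for splittable codecs could look roughly like the sketch below. The two interfaces are local stand-ins for Hadoop's CompressionCodec and SplittableCompressionCodec so that it compiles without Hadoop on the classpath, and SplitCheckSketch is a hypothetical name. It assumes a splittable-codec marker interface is available on the Hadoop 1 branch, and it is not the code from trunk or from the patch.

```java
// Local stand-ins (assumption) for the Hadoop types
// org.apache.hadoop.io.compress.CompressionCodec and
// org.apache.hadoop.io.compress.SplittableCompressionCodec.
interface CompressionCodec {}
interface SplittableCompressionCodec extends CompressionCodec {}

final class SplitCheckSketch {
    // Hypothetical helper: takes the codec already resolved for the file.
    // A null codec means the file is not compressed.
    static boolean isSplitable(CompressionCodec codec) {
        if (codec == null) {
            return true; // uncompressed files can always be split
        }
        // Compressed files can be split only if the codec supports it.
        return codec instanceof SplittableCompressionCodec;
    }

    public static void main(String[] args) {
        CompressionCodec plainCodec = new CompressionCodec() {};
        CompressionCodec splittableCodec = new SplittableCompressionCodec() {};
        System.out.println(isSplitable(null));            // true
        System.out.println(isSplitable(plainCodec));      // false
        System.out.println(isSplitable(splittableCodec)); // true
    }
}
```

The snippet quoted above returns false for every compressed file; the only difference here is the instanceof test on the non-null branch.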
> old API CombineFileInputFormat missing fixes that are in new API
> -----------------------------------------------------------------
>
> Key: MAPREDUCE-5038
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5038
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Affects Versions: 1.1.1
> Reporter: Sandy Ryza
> Assignee: Sandy Ryza
> Attachments: MAPREDUCE-5038.patch
>
>
> The following changes patched the CombineFileInputFormat in mapreduce, but
> neglected the one in mapred:
> * MAPREDUCE-1597 enabled the CombineFileInputFormat to work on splittable files
> * MAPREDUCE-2021 solved returning duplicate hostnames in split locations
> * MAPREDUCE-1806 CombineFileInputFormat does not work with paths not on default FS
> In trunk this is not an issue, as the one in mapred extends the one in
> mapreduce.