[
https://issues.apache.org/jira/browse/HADOOP-4012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12716199#action_12716199
]
Hadoop QA commented on HADOOP-4012:
-----------------------------------
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12409554/Hadoop-4012-version9.patch
against trunk revision 781602.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 3 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac
compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 Eclipse classpath. The patch retains Eclipse classpath integrity.
-1 release audit. The applied patch generated 500 release audit warnings
(more than the trunk's current 492 warnings).
-1 core tests. The patch failed core unit tests.
-1 contrib tests. The patch failed contrib unit tests.
Test results:
http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/459/testReport/
Release audit warnings:
http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/459/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings:
http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/459/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results:
http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/459/artifact/trunk/build/test/checkstyle-errors.html
Console output:
http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/459/console
This message is automatically generated.
> Providing splitting support for bzip2 compressed files
> ------------------------------------------------------
>
> Key: HADOOP-4012
> URL: https://issues.apache.org/jira/browse/HADOOP-4012
> Project: Hadoop Core
> Issue Type: New Feature
> Components: io
> Affects Versions: 0.21.0
> Reporter: Abdul Qadeer
> Assignee: Abdul Qadeer
> Fix For: 0.21.0
>
> Attachments: Hadoop-4012-version1.patch, Hadoop-4012-version2.patch,
> Hadoop-4012-version3.patch, Hadoop-4012-version4.patch,
> Hadoop-4012-version5.patch, Hadoop-4012-version6.patch,
> Hadoop-4012-version7.patch, Hadoop-4012-version8.patch,
> Hadoop-4012-version9.patch
>
>
> Hadoop assumes that compressed input data cannot be split, mainly because
> many codecs need the whole input stream to decompress successfully. In that
> case, Hadoop prepares only one split per compressed file, with the lower
> split limit at 0 and the upper limit at the end of the file. As a
> consequence, each compressed file goes to a single mapper. This circumvents
> the codec limitation mentioned above, but it substantially reduces the
> parallelism that splitting would otherwise allow.
> BZip2 is a compression/decompression algorithm that compresses data in
> blocks, and these compressed blocks can later be decompressed independently
> of each other. This creates an opportunity: instead of sending one BZip2
> compressed file to one mapper, we can process chunks of the file in
> parallel. The correctness criterion for such processing is that, for a bzip2
> compressed file, each compressed block is processed by exactly one mapper
> and, ultimately, all blocks of the file are processed. (By processing we
> mean the actual use, in a mapper, of the uncompressed data coming out of the
> codec.)
> We are writing the code to implement this functionality. Although we have
> used bzip2 as an example, we have tried to extend Hadoop's compression
> interfaces so that any other codec with the same capability as bzip2 can
> easily use the splitting support. The details of these changes will be
> posted when we submit the code.
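
The correctness criterion in the description (each compressed block handled
by exactly one mapper, and every block handled) can be illustrated with a
small sketch. This is not the code in the attached patch. The class name
BlockSplitSketch, its methods, and the block offsets are hypothetical, and
the ownership rule used here (a block belongs to the split that contains its
first byte) is only one plausible way to meet the criterion. Block start
offsets are passed in as known byte positions for simplicity; in a real bzip2
stream they are not known in advance and would have to be located by scanning
the compressed data.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; not the implementation in the HADOOP-4012 patch. */
class BlockSplitSketch {

    /**
     * Returns the start offsets of the compressed blocks owned by the split
     * [splitStart, splitEnd). A block is owned by the split that contains its
     * first byte, even if the block's data runs past splitEnd.
     */
    static List<Long> blocksForSplit(long[] blockStarts, long splitStart, long splitEnd) {
        List<Long> owned = new ArrayList<>();
        for (long start : blockStarts) {
            if (start >= splitStart && start < splitEnd) {
                owned.add(start);
            }
        }
        return owned;
    }

    /** Cuts [0, fileLength) into consecutive splits of at most splitSize bytes. */
    static List<long[]> makeSplits(long fileLength, long splitSize) {
        List<long[]> splits = new ArrayList<>();
        for (long start = 0; start < fileLength; start += splitSize) {
            splits.add(new long[] {start, Math.min(start + splitSize, fileLength)});
        }
        return splits;
    }

    public static void main(String[] args) {
        // Assumed, illustrative block start offsets within a 4500-byte file.
        long[] blockStarts = {0, 900, 1850, 2700, 3600};
        long fileLength = 4500;
        for (long[] split : makeSplits(fileLength, 2000)) {
            System.out.println("split [" + split[0] + ", " + split[1] + ") -> blocks at "
                    + blocksForSplit(blockStarts, split[0], split[1]));
        }
    }
}
```

Because the splits are half-open and cover the file without overlap, every
block start falls in exactly one split, so no block is processed twice or
skipped. A split that contains no block start (the last one in this example)
simply produces no records.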