[ https://issues.apache.org/jira/browse/HADOOP-4652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12651925#action_12651925 ]
Daehyun Kim commented on HADOOP-4652:
-------------------------------------

I think the cause of this problem is that the Hudson build did not use the "-Dcompile.native=true" option.

* -1 core tests. The patch failed core unit tests.
** [http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3616/testReport/junit.framework/TestSuite$1/warning/]
** [http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3616/testReport/org.apache.hadoop.mapred/TestRAGZIPInputFormat/testFormat/]

> RAgzip: multiple map tasks for a large gzipped file
> ---------------------------------------------------
>
>                 Key: HADOOP-4652
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4652
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: io, mapred, native
>    Affects Versions: 0.20.0
>            Reporter: Daehyun Kim
>         Attachments: HADOOP-4652.path
>
>
> Currently, Hadoop processes a gzipped file with only one map task.
> We have made a patch, which we call RAgzip, that enables multiple map tasks
> for one large gzipped file.
> To run multiple map tasks over a gzipped file, use RAgzip by changing the
> InputFormat to RAGZIPInputFormat.
> The options used by RAGZIPInputFormat are described in its javadoc.
> RAgzip uses zlib's inflatePrime function, which makes random access into a
> gzipped file possible.
> Since inflatePrime is available from zlib 1.2.2.4 onward, RAgzip requires
> zlib 1.2.2.4 or higher. (We tested with zlib 1.2.3.)
> RAgzip requires a preprocessing step that creates an access point (.ap)
> file, which acts as an index of the gzipped file's chunks.
> The access point (.ap) file is stored in the same directory as the gzipped
> file.
> For example, for "/user/hadoop/test.gz", the .ap file is created as
> "/user/hadoop/test.gz.ap".
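
To make two rules from the issue description concrete (the .ap file name and the minimum zlib version), here is a minimal Java sketch. It is not code from the patch: the class RagzipConventions and all of its method names are hypothetical, and it assumes zlib version strings are purely numeric and dot-separated (for example "1.2.3").

```java
import java.util.Arrays;

// Hypothetical helper, not part of the HADOOP-4652 patch.
final class RagzipConventions {
    // Minimum zlib version stated in the issue description.
    static final String MIN_ZLIB_VERSION = "1.2.2.4";

    // The access point file sits next to the gzipped file, with ".ap" appended.
    static String accessPointPath(String gzipPath) {
        return gzipPath + ".ap";
    }

    // Assumption: every component is a plain integer, e.g. "1.2.2.4".
    static int[] parseVersion(String version) {
        return Arrays.stream(version.split("\\.")).mapToInt(Integer::parseInt).toArray();
    }

    // Compares component by component; missing trailing components count as 0.
    static int compareVersions(String a, String b) {
        int[] x = parseVersion(a);
        int[] y = parseVersion(b);
        for (int i = 0; i < Math.max(x.length, y.length); i++) {
            int xi = i < x.length ? x[i] : 0;
            int yi = i < y.length ? y[i] : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    }

    static boolean zlibVersionSupported(String zlibVersion) {
        return compareVersions(zlibVersion, MIN_ZLIB_VERSION) >= 0;
    }

    public static void main(String[] args) {
        System.out.println(accessPointPath("/user/hadoop/test.gz")); // /user/hadoop/test.gz.ap
        System.out.println(zlibVersionSupported("1.2.3"));           // true
        System.out.println(zlibVersionSupported("1.2.2.3"));         // false
    }
}
```

Note that 1.2.3 counts as newer than 1.2.2.4 because the third component decides the comparison (3 > 2), which is consistent with the patch having been tested on zlib 1.2.3.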