[jira] [Resolved] (HADOOP-6817) SequenceFile.Reader can't read gzip format compressed sequence file which produce by a mapreduce job without native compression library

Harsh J (JIRA) Tue, 10 Jul 2012 10:52:38 -0700

     [ https://issues.apache.org/jira/browse/HADOOP-6817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Harsh J resolved HADOOP-6817.
-----------------------------

    Resolution: Duplicate

This is being addressed via HADOOP-8582.
                
> SequenceFile.Reader can't read gzip format compressed sequence file which 
> produce by a mapreduce job without native compression library
> ---------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-6817
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6817
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: io
>    Affects Versions: 0.20.2
>         Environment: Cluster:CentOS 5,jdk1.6.0_20
> Client:Mac SnowLeopard,jdk1.6.0_20
>            Reporter: Wenjun Huang
>
> A Hadoop job outputs a gzip-compressed sequence file (either record-compressed 
> or block-compressed). The client program uses SequenceFile.Reader to read this 
> sequence file. While reading, the client program shows the following 
> exceptions:
> 2090 [main] WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 2091 [main] INFO org.apache.hadoop.io.compress.CodecPool - Got brand-new decompressor
> Exception in thread "main" java.io.EOFException
>       at java.util.zip.GZIPInputStream.readUByte(GZIPInputStream.java:207)
>       at java.util.zip.GZIPInputStream.readUShort(GZIPInputStream.java:197)
>       at java.util.zip.GZIPInputStream.readHeader(GZIPInputStream.java:136)
>       at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:58)
>       at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:68)
>       at org.apache.hadoop.io.compress.GzipCodec$GzipInputStream$ResetableGZIPInputStream.<init>(GzipCodec.java:92)
>       at org.apache.hadoop.io.compress.GzipCodec$GzipInputStream.<init>(GzipCodec.java:101)
>       at org.apache.hadoop.io.compress.GzipCodec.createInputStream(GzipCodec.java:170)
>       at org.apache.hadoop.io.compress.GzipCodec.createInputStream(GzipCodec.java:180)
>       at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1520)
>       at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1428)
>       at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
>       at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)
>       at com.shiningware.intelligenceonline.taobao.mapreduce.HtmlContentSeqOutputView.main(HtmlContentSeqOutputView.java:28)
> I studied the code in the org.apache.hadoop.io.SequenceFile.Reader.init 
> method and found:
>       // Initialize... *not* if this we are constructing a temporary Reader
>       if (!tempReader) {
>         valBuffer = new DataInputBuffer();
>         if (decompress) {
>           valDecompressor = CodecPool.getDecompressor(codec);
>           valInFilter = codec.createInputStream(valBuffer, valDecompressor);
>           valIn = new DataInputStream(valInFilter);
>         } else {
>           valIn = valBuffer;
>         }
> The problem seems to be caused by "valBuffer = new DataInputBuffer();", 
> because GzipCodec.createInputStream creates an instance of GzipInputStream, 
> whose constructor creates an instance of the ResetableGZIPInputStream class. 
> When ResetableGZIPInputStream's constructor calls the constructor of its base 
> class, java.util.zip.GZIPInputStream, that constructor tries to read the gzip 
> header from the still-empty valBuffer, gets no content, and throws an 
> EOFException.
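
The behaviour described above can be reproduced without Hadoop. The following is a minimal sketch using only the JDK: it shows that the java.util.zip.GZIPInputStream constructor reads the gzip header eagerly and therefore throws EOFException when the underlying stream is empty, which is the state of a freshly created DataInputBuffer. The class name EmptyGzipDemo and its helper methods are hypothetical and are not part of Hadoop or of the original report.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class: stands in for the empty valBuffer case, not Hadoop code.
public class EmptyGzipDemo {

    // Returns true if merely constructing a GZIPInputStream over the data
    // fails with EOFException (the constructor reads the gzip header).
    static boolean constructorThrowsEof(byte[] data) throws IOException {
        try {
            new GZIPInputStream(new ByteArrayInputStream(data)).close();
            return false;
        } catch (EOFException e) {
            return true;
        }
    }

    // Produces a small, valid gzip stream for comparison.
    static byte[] gzip(byte[] raw) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(raw);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Empty input: the header cannot be read, so the constructor fails.
        System.out.println(constructorThrowsEof(new byte[0]));
        // Valid gzip input: the constructor succeeds.
        System.out.println(constructorThrowsEof(gzip("hello".getBytes("UTF-8"))));
    }
}
```

Running main prints true for the empty input and false for the valid gzip data. This matches the stack trace in the report, where the failure occurs in GZIPInputStream.readHeader during construction rather than during a later read.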

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
