[jira] [Commented] (HADOOP-8582) Improve error reporting for GZIP-compressed SequenceFiles with missing native libraries.

Harsh J (JIRA) Thu, 12 Jul 2012 11:00:57 -0700

    [ 
https://issues.apache.org/jira/browse/HADOOP-8582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413007#comment-13413007
 ]


Harsh J commented on HADOOP-8582:
---------------------------------

Daryn,

It's sorta the latter. To be clearer, here is the reason, quoted from HADOOP-538:

{quote}
Arun:

Context: gzip is just the zlib algorithm plus extra headers. 
java.util.zip.GZIP{Input|Output}Stream, and hence the existing GzipCodec, won't 
work with SequenceFile because the java.util.zip.GZIP{Input|Output}Stream 
constructors try to read/write the gzip headers. That won't work in 
SequenceFiles, since we typically read data from disk into buffers; these 
buffers are empty on startup/after reset and cause the 
java.util.zip.GZIP{Input|Output}Streams to fail.
{quote}
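
The constructor behaviour described in the quote can be reproduced with the plain JDK, without Hadoop. The following is only a minimal sketch: the class and method names (GzipHeaderDemo, constructorHitsEof, headerBytesWrittenByConstructor) are made up for illustration and are not part of Hadoop or of the attached patch. It wraps an empty buffer in a GZIPInputStream, which should fail in the constructor with the same EOFException seen in the stack trace of this issue, and it checks that GZIPOutputStream already emits header bytes at construction time.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class; not part of Hadoop.
public class GzipHeaderDemo {

    // True if merely constructing a GZIPInputStream over `data` hits EOF,
    // i.e. the constructor tried to read a gzip header that is not there.
    static boolean constructorHitsEof(byte[] data) {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(data))) {
            return false; // header was read successfully
        } catch (EOFException e) {
            return true;  // ran out of bytes while reading the header
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Number of bytes the GZIPOutputStream constructor writes before any
    // payload is handed to it (the gzip header).
    static int headerBytesWrittenByConstructor() {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(sink)) {
            return sink.size(); // measured before any write() call
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Helper: gzip a byte array so the reader has a real header to find.
    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(sink)) {
            out.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        byte[] payload = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println("empty buffer hits EOF: " + constructorHitsEof(new byte[0]));
        System.out.println("gzipped data hits EOF: " + constructorHitsEof(gzip(payload)));
        System.out.println("header bytes written by constructor: "
                + headerBytesWrittenByConstructor());
    }
}
```

The empty-buffer case is what SequenceFile.Reader runs into when it sets up the codec stream before any compressed data has been read into the buffer.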
                
> Improve error reporting for GZIP-compressed SequenceFiles with missing native 
> libraries.
> ----------------------------------------------------------------------------------------
>
>                 Key: HADOOP-8582
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8582
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: io
>    Affects Versions: 2.0.0-alpha
>            Reporter: Paul Wilkinson
>            Priority: Minor
>         Attachments: HADOOP-8582-1.diff
>
>
> At present it is not possible to write or read block-compressed SequenceFiles 
> using the GZIP codec without the native libraries being available.
> The SequenceFile.Writer code checks for the availability of native libraries 
> and throws a useful exception, but the SequenceFile.Reader doesn't do the 
> same:
> {noformat}
> Exception in thread "main" java.io.EOFException
>       at java.util.zip.GZIPInputStream.readUByte(GZIPInputStream.java:249)
>       at java.util.zip.GZIPInputStream.readUShort(GZIPInputStream.java:239)
>       at java.util.zip.GZIPInputStream.readHeader(GZIPInputStream.java:142)
>       at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:58)
>       at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:67)
>       at org.apache.hadoop.io.compress.GzipCodec$GzipInputStream$ResetableGZIPInputStream.<init>(GzipCodec.java:95)
>       at org.apache.hadoop.io.compress.GzipCodec$GzipInputStream.<init>(GzipCodec.java:104)
>       at org.apache.hadoop.io.compress.GzipCodec.createInputStream(GzipCodec.java:173)
>       at org.apache.hadoop.io.compress.GzipCodec.createInputStream(GzipCodec.java:183)
>       at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1591)
>       at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1493)
>       at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1480)
>       at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1475)
>       at test.SequenceReader.read(SequenceReader.java:23)
> {noformat}

