[ https://issues.apache.org/jira/browse/HADOOP-8582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413007#comment-13413007 ]
Harsh J commented on HADOOP-8582:
---------------------------------

Daryn,

It's sorta the latter. To be clearer, the reason is this, from HADOOP-538:

{quote}
Arun: Context: gzip is just zlib algo + extra headers. java.util.zip.GZIP{Input|Output}Stream and hence existing GzipCodec won't work with SequenceFile due the fact that java.util.zip.GZIP{Input|Output}Streams will try to read/write gzip headers in the constructors which won't work in SequenceFiles since we typically read data from disk onto buffers, these buffers are empty on startup/after-reset and cause the java.util.zip.GZIP{Input|Output}Streams to fail.
{quote}

> Improve error reporting for GZIP-compressed SequenceFiles with missing native libraries.
> ----------------------------------------------------------------------------------------
>
>                 Key: HADOOP-8582
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8582
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: io
>    Affects Versions: 2.0.0-alpha
>            Reporter: Paul Wilkinson
>            Priority: Minor
>         Attachments: HADOOP-8582-1.diff
>
>
> At present it is not possible to write or read block-compressed SequenceFiles using the GZIP codec without the native libraries being available.
> The SequenceFile.Writer code checks for the availability of native libraries and throws a useful exception, but the SequenceFile.Reader doesn't do the same:
> {noformat}
> Exception in thread "main" java.io.EOFException
>       at java.util.zip.GZIPInputStream.readUByte(GZIPInputStream.java:249)
>       at java.util.zip.GZIPInputStream.readUShort(GZIPInputStream.java:239)
>       at java.util.zip.GZIPInputStream.readHeader(GZIPInputStream.java:142)
>       at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:58)
>       at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:67)
>       at org.apache.hadoop.io.compress.GzipCodec$GzipInputStream$ResetableGZIPInputStream.<init>(GzipCodec.java:95)
>       at org.apache.hadoop.io.compress.GzipCodec$GzipInputStream.<init>(GzipCodec.java:104)
>       at org.apache.hadoop.io.compress.GzipCodec.createInputStream(GzipCodec.java:173)
>       at org.apache.hadoop.io.compress.GzipCodec.createInputStream(GzipCodec.java:183)
>       at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1591)
>       at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1493)
>       at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1480)
>       at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1475)
>       at test.SequenceReader.read(SequenceReader.java:23)
> {noformat}
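
For anyone wanting to see the constructor behaviour from the HADOOP-538 quote in isolation, here is a minimal sketch that uses only the JDK, not Hadoop's GzipCodec or SequenceFile. The class and method names (GzipHeaderDemo, gzipConstructorHitsEof, inflaterConstructs) are made up for illustration. It wraps an empty buffer, which stands in for the "empty on startup/after-reset" buffers, and shows that java.util.zip.GZIPInputStream fails in its constructor while a plain InflaterInputStream does not read anything until asked.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical demo class; not part of Hadoop.
public class GzipHeaderDemo {

    /** Returns true if the GZIPInputStream constructor throws EOFException for these bytes. */
    public static boolean gzipConstructorHitsEof(byte[] initialBytes) {
        // The constructor reads the gzip header immediately, so an empty
        // source fails here, before any call to read().
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(initialBytes))) {
            return false;
        } catch (EOFException e) {
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns true if an InflaterInputStream can be constructed and closed over these bytes. */
    public static boolean inflaterConstructs(byte[] initialBytes) {
        // InflaterInputStream does not touch the underlying stream in its
        // constructor, so an empty buffer is fine at this point.
        try (InputStream in = new InflaterInputStream(new ByteArrayInputStream(initialBytes))) {
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] empty = new byte[0]; // stands in for a not-yet-filled buffer
        System.out.println("GZIPInputStream constructor hit EOF: " + gzipConstructorHitsEof(empty));
        System.out.println("InflaterInputStream constructed: " + inflaterConstructs(empty));
    }
}
```

The EOFException from the first method originates in GZIPInputStream.readHeader, the same frame that appears in the trace in the issue description.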