[ http://issues.apache.org/jira/browse/HADOOP-538?page=comments#action_12445272 ]

Arun C Murthy commented on HADOOP-538:
--------------------------------------

Thanks to Owen/Sameer for the detailed feedback, I'll put up an updated patch asap... though I admit I hadn't thought about a 32-bit JVM on a 64-bit OS! :)

Meanwhile, one of the nice side-effects of this patch will be to enable GzipCodec to work with SequenceFiles.

Context: gzip is just the zlib algorithm plus extra headers. java.util.zip.GZIP{Input|Output}Stream, and hence the existing GzipCodec, won't work with SequenceFile because the GZIP{Input|Output}Stream constructors try to read/write the gzip headers. That doesn't work in SequenceFile, where we typically read data from disk into buffers: these buffers are empty on startup and after a reset, which causes the java.util.zip.GZIP{Input|Output}Streams to fail.

The upshot of this patch is that newer zlib (zlib-1.2.*) can deal with the gzip headers directly (java.util.zip is zlib-1.1.*), which means we can use it in SequenceFile. The downside is that people will need the native hadoop code to get this benefit.

If people feel strongly that we need this functionality without native hadoop code (IMHO not critical, since gzip is zlib + headers, i.e. the compression itself is exactly the same), then I guess we can track it via a separate jira issue... Would people object to me enabling GzipCodec to work with SequenceFile, for now, only when the native code is present? If the native code isn't present I can print out a warning very early and exit...

Thoughts?

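To make the constructor problem concrete, here is a minimal sketch using only java.util.zip and in-memory streams. It is not code from the patch; the class name GzipHeaderDemo and its method names are made up for illustration. It shows that GZIPOutputStream has already written the gzip header by the time its constructor returns, and that GZIPInputStream fails in its constructor when the underlying stream has no data yet, which is the situation an empty SequenceFile buffer puts it in.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class, not part of Hadoop or of the HADOOP-538 patch.
public class GzipHeaderDemo {

    // Number of bytes that reach the sink before any payload is written,
    // i.e. the gzip header emitted by the GZIPOutputStream constructor.
    static int headerBytesWrittenByConstructor() throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        GZIPOutputStream gz = new GZIPOutputStream(sink);
        int written = sink.size(); // measured before any write() call
        gz.close();
        return written;
    }

    // True if constructing a GZIPInputStream over an empty source fails.
    // The constructor tries to read the gzip header immediately, so an
    // empty (not yet filled) buffer makes it throw an IOException.
    static boolean constructorFailsOnEmptyInput() {
        try {
            new GZIPInputStream(new ByteArrayInputStream(new byte[0]));
            return false;
        } catch (IOException e) {
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("header bytes written in constructor: "
                + headerBytesWrittenByConstructor());
        System.out.println("constructor fails on empty input: "
                + constructorFailsOnEmptyInput());
    }
}
```

A codec that has to be created before its buffer is filled, or that gets reset and reused, cannot rely on streams that behave this way.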
> Implement a nio's 'direct buffer' based wrapper over zlib to improve
> performance of java.util.zip.{De|In}flater as a 'custom codec'
> ---------------------------------------------------------------------
>
>                 Key: HADOOP-538
>                 URL: http://issues.apache.org/jira/browse/HADOOP-538
>             Project: Hadoop
>          Issue Type: Improvement
>   Affects Versions: 0.6.1
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.8.0
>
>         Attachments: HADOOP-538.patch, HADOOP-538_20061005.tgz,
>                      HADOOP-538_20061011.tgz, HADOOP-538_20061026.tgz,
>                      HADOOP-538_benchmarks.tgz
>
>
> There has been more than one instance where java.util.zip's {De|In}flater
> classes perform unreliably, a simple wrapper over zlib-1.2.3 (latest stable)
> using java.nio.ByteBuffer (i.e. direct buffers) should go a long way in
> alleviating these woes.