[ 
http://issues.apache.org/jira/browse/HADOOP-538?page=comments#action_12445272 ] 
            
Arun C Murthy commented on HADOOP-538:
--------------------------------------

Thanks for the detailed feedback, Owen/Sameer. I'll put up an updated patch 
asap... though I admit I hadn't thought about a 32-bit JVM on a 64-bit OS! :)

Meanwhile, one of the nice side-effects of this patch will be to enable the 
GzipCodec to work with SequenceFiles. 

Context: gzip is just the zlib algorithm plus extra headers. 
java.util.zip.GZIP{Input|Output}Stream, and hence the existing GzipCodec, won't 
work with SequenceFile because the GZIP{Input|Output}Stream constructors try to 
read/write the gzip headers. That doesn't work in SequenceFile, where we 
typically read data from disk into buffers: those buffers are empty on startup 
and after a reset, which causes the java.util.zip.GZIP{Input|Output}Streams to 
fail.
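
To make the constructor behaviour concrete, here is a minimal sketch using only 
java.util.zip (no Hadoop classes). The class and method names are made up for 
illustration and are not part of the patch; it only shows that the header is 
written/read at construction time, before any data is available.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class, not Hadoop code.
class GzipConstructorDemo {

    // Returns how many bytes reach the underlying stream from the
    // constructor alone, i.e. before any write() call.
    static int headerBytesWrittenOnConstruction() throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        GZIPOutputStream gz = new GZIPOutputStream(sink);
        int written = sink.size(); // the gzip header is already in the sink
        gz.close();
        return written;
    }

    // Returns true if constructing a GZIPInputStream over an empty
    // source fails because the header cannot be read.
    static boolean failsOnEmptyInput() throws IOException {
        try {
            new GZIPInputStream(new ByteArrayInputStream(new byte[0])).close();
            return false;
        } catch (EOFException expected) {
            return true; // header read hit end-of-stream in the constructor
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("header bytes written by constructor: "
                + headerBytesWrittenOnConstruction());
        System.out.println("constructor fails on empty input: "
                + failsOnEmptyInput());
    }
}
```

An empty in-memory stream stands in here for the empty SequenceFile buffer on 
startup/after-reset; the real buffer handling in SequenceFile is more involved.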

The upshot of this patch is that newer zlib versions (zlib-1.2.*) can deal with 
the gzip headers directly (java.util.zip is zlib-1.1.*), which means we can use 
them in SequenceFile. However, the downside is that people will need to have 
the native hadoop code to get this benefit. If people feel strongly that we 
need this functionality without native hadoop code (IMHO not critical, since 
gzip is zlib+headers, i.e. the exact same compression etc.), then I guess we 
can track it via a separate jira issue... Would people object to me enabling 
GzipCodec to work with SequenceFile for now only when the native code is 
present? If the native code isn't present I can print out a warning very early 
and exit...
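
As a rough illustration of the "gzip is zlib+headers" point, the sketch below 
gzips some bytes with java.util.zip, skips the fixed header by hand, and 
inflates the remainder as a raw deflate stream. It assumes the header written 
by GZIPOutputStream is the plain 10-byte form with no optional fields; the 
class and method names are hypothetical and this is not how the patch itself 
works.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.DataFormatException;
import java.util.zip.GZIPOutputStream;
import java.util.zip.Inflater;

// Hypothetical demo class, not Hadoop code.
class GzipIsDeflatePlusHeaders {

    // Assumption: fixed gzip header with no optional fields.
    static final int GZIP_FIXED_HEADER = 10;

    // Compress with the stock gzip stream.
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(sink)) {
            gz.write(data);
        }
        return sink.toByteArray();
    }

    // Skip the header and inflate the body as raw deflate data.
    // The 8-byte gzip trailer is handed over too, but the inflater
    // stops at the end of the deflate stream and leaves it unread.
    static byte[] inflateBody(byte[] gzipped) throws DataFormatException {
        Inflater inflater = new Inflater(true); // true = no zlib wrapper
        inflater.setInput(gzipped, GZIP_FIXED_HEADER,
                gzipped.length - GZIP_FIXED_HEADER);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!inflater.finished()) {
            int n = inflater.inflate(buf);
            if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                break; // truncated or unexpected input; give up
            }
            out.write(buf, 0, n);
        }
        inflater.end();
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        byte[] original = "hello sequence file".getBytes("UTF-8");
        byte[] roundTripped = inflateBody(gzip(original));
        System.out.println(new String(roundTripped, "UTF-8"));
    }
}
```

The compressed payload is the same either way; only the framing differs, which 
is why doing without the gzip headers costs nothing in compression.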

Thoughts?

> Implement a nio's 'direct buffer' based wrapper over zlib to improve 
> performance of java.util.zip.{De|In}flater as a 'custom codec'
> -----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-538
>                 URL: http://issues.apache.org/jira/browse/HADOOP-538
>             Project: Hadoop
>          Issue Type: Improvement
>    Affects Versions: 0.6.1
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.8.0
>
>         Attachments: HADOOP-538.patch, HADOOP-538_20061005.tgz, 
> HADOOP-538_20061011.tgz, HADOOP-538_20061026.tgz, HADOOP-538_benchmarks.tgz
>
>
> There has been more than one instance where java.util.zip's {De|In}flater 
> classes perform unreliably; a simple wrapper over zlib-1.2.3 (latest stable) 
> using java.nio.ByteBuffer (i.e. direct buffers) should go a long way in 
> alleviating these woes.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
