[ http://issues.apache.org/jira/browse/HADOOP-538?page=comments#action_12437447 ]

Arun C Murthy commented on HADOOP-538:
--------------------------------------

While working on this I've realised that the 'custom compressor' framework we built as part of HADOOP-441 is neither as flexible nor as complete as it could be. Specifically, the existing framework only lets us plug in custom compress/decompress 'streams' (e.g. a bzip2 input/output stream), while in many cases it is sufficient to use an existing 'stream' and just plug in a custom deflater/inflater (e.g. a native-zlib or lzo inflater/deflater pair). java.util.zip's {De|In}flater classes were not designed with this kind of functionality in mind, which makes them unsuitable.

Hence I would like to propose that we add an org.apache.hadoop.io.compress.{Com|Decom}pressor interface which custom {de}compressors can implement and plug into an existing {de}compression stream. Further, I would like to propose that the {Com|Decom}pressor interfaces mirror the public methods of java.util.zip.{De|In}flater, i.e.:

  public interface Compressor {
    public void setInput(byte[] b, int off, int len);
    public boolean needsInput();
    public void setDictionary(byte[] b, int off, int len);
    public void finish();
    public boolean finished();
    public int deflate(ByteBuffer directBuffer, int directBufferLength); // for native calls with nio's direct buffer
    public int deflate(byte[] b, int off, int len); // for native methods without nio
    public void reset();
    public void end();
  }

  public interface Decompressor {
    public void setInput(byte[] b, int off, int len);
    public boolean needsInput();
    public void setDictionary(byte[] b, int off, int len);
    public boolean needsDictionary();
    public void finish();
    public boolean finished();
    public int inflate(ByteBuffer directBuffer, int directBufferLength); // for native calls with nio's direct buffer
    public int inflate(byte[] b, int off, int len); // for native methods without nio
    public void reset();
    public void end();
  }

Along the same lines, we will need to supply a pair of input/output streams which can take objects implementing the above interfaces to perform the actual compression/decompression. Again, the java.util.zip.{De|In}flater{Out|In}putStream classes won't suffice, since they weren't designed with this in mind. I would have liked to propose org.apache.hadoop.io.compress.Compression{In|Out}putStreams, but those names are already taken; how about org.apache.hadoop.io.compress.DataCompression{In|Out}putStreams?

With the existing Compression{Out|In}putStreams and the above {Com|Decom}pressor/DataCompression{In|Out}putStreams we should have a sufficiently complete set of abstractions to support 'custom codecs'.

Thoughts?

> Implement a nio's 'direct buffer' based wrapper over zlib to improve
> performance of java.util.zip.{De|In}flater as a 'custom codec'
> --------------------------------------------------------------------
>
>                 Key: HADOOP-538
>                 URL: http://issues.apache.org/jira/browse/HADOOP-538
>             Project: Hadoop
>          Issue Type: Improvement
>    Affects Versions: 0.6.1
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.7.0
>
>
> There has been more than one instance where java.util.zip's {De|In}flater
> classes perform unreliably; a simple wrapper over zlib-1.2.3 (latest stable)
> using java.nio.ByteBuffer (i.e. direct buffers) should go a long way in
> alleviating these woes.
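
To make the Compressor proposal in the comment above concrete, here is a minimal sketch of how a stream could accept a pluggable compressor. It is an illustration under stated assumptions, not the eventual Hadoop code: the names CompressorSketch, DeflaterCompressor, PluggableCompressionOutputStream and roundTrip are hypothetical, only the byte[] variant of deflate() is shown (the ByteBuffer variant and setDictionary() are left out), and java.util.zip.Deflater stands in for a native-zlib or lzo implementation.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class CompressorSketch {

    /** Subset of the proposed Compressor interface (byte[] variant only). */
    interface Compressor {
        void setInput(byte[] b, int off, int len);
        boolean needsInput();
        void finish();
        boolean finished();
        int deflate(byte[] b, int off, int len);
        void reset();
        void end();
    }

    /** Stand-in for a native compressor: simply delegates to java.util.zip.Deflater. */
    static final class DeflaterCompressor implements Compressor {
        private final Deflater deflater = new Deflater();
        public void setInput(byte[] b, int off, int len) { deflater.setInput(b, off, len); }
        public boolean needsInput() { return deflater.needsInput(); }
        public void finish() { deflater.finish(); }
        public boolean finished() { return deflater.finished(); }
        public int deflate(byte[] b, int off, int len) { return deflater.deflate(b, off, len); }
        public void reset() { deflater.reset(); }
        public void end() { deflater.end(); }
    }

    /** Hypothetical stream that works with any Compressor, not just one codec. */
    static final class PluggableCompressionOutputStream extends OutputStream {
        private final OutputStream out;
        private final Compressor compressor;
        private final byte[] buffer = new byte[4096];

        PluggableCompressionOutputStream(OutputStream out, Compressor compressor) {
            this.out = out;
            this.compressor = compressor;
        }

        @Override public void write(int b) throws IOException {
            write(new byte[] {(byte) b}, 0, 1);
        }

        @Override public void write(byte[] b, int off, int len) throws IOException {
            compressor.setInput(b, off, len);
            // Keep draining until the compressor has consumed all of this input.
            while (!compressor.needsInput()) {
                drain();
            }
        }

        /** Signals end of input and writes out all remaining compressed bytes. */
        public void finish() throws IOException {
            compressor.finish();
            while (!compressor.finished()) {
                drain();
            }
        }

        private void drain() throws IOException {
            int n = compressor.deflate(buffer, 0, buffer.length);
            if (n > 0) {
                out.write(buffer, 0, n);
            }
        }

        @Override public void close() throws IOException {
            finish();
            compressor.end();
            out.close();
        }
    }

    /** Compresses via the pluggable stream, then inflates with java.util.zip as a check. */
    public static byte[] roundTrip(byte[] input) {
        try {
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            try (PluggableCompressionOutputStream cos =
                     new PluggableCompressionOutputStream(sink, new DeflaterCompressor())) {
                cos.write(input, 0, input.length);
            }
            Inflater inflater = new Inflater();
            inflater.setInput(sink.toByteArray());
            byte[] restored = new byte[input.length];
            int n = 0;
            while (!inflater.finished() && n < restored.length) {
                n += inflater.inflate(restored, n, restored.length - n);
            }
            inflater.end();
            return Arrays.copyOf(restored, n);
        } catch (IOException | DataFormatException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] input = "hadoop hadoop hadoop".getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(input, roundTrip(input)));
    }
}
```

The stream only ever talks to the Compressor interface, so swapping DeflaterCompressor for a direct-buffer zlib wrapper would not require touching the stream class; that separation is the point of the proposal.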