[jira] [Commented] (HADOOP-8148) Zero-copy ByteBuffer-based compressor / decompressor API

Tim Broberg (Commented) (JIRA) Tue, 10 Apr 2012 18:57:41 -0700

    [ https://issues.apache.org/jira/browse/HADOOP-8148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13251244#comment-13251244 ]


Tim Broberg commented on HADOOP-8148:
-------------------------------------

Here are my thoughts from the distance of a month:
 1 - Question: Do we define this as a new interface, or revise the existing one?

IMO, there is probably too much code to switch over all at once.

 2 - We also need to do something with the Compression(Input/Output)Stream.

     2a - This would seem to be just the addition of the ByteBufferReadable 
interface to the existing stream. Call it ZeroCopyCompressionInputStream? Too 
long?

     2b - For this, I'm looking for the most common consumer of this code as 
well as of the input stream interface. I'm thinking it's LineReader. The input 
stream passed to a compression stream would be read buffer to buffer, with no 
modification to LineReader. LineReader could take advantage of 
ZeroCopyCompressionInputStream by passing a direct buffer, or just use it as is 
and require a copy into its byte array.

     2c - For symmetry, any reason not to define a ByteBufferWriteable 
interface and make the corresponding output stream class?

 3 - The List<ByteBuffer> interface of HDFS-3051 doesn't seem to be taking off, 
and it complicates the native code.

Kill it?
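
To make point 2 concrete, here is a rough Java sketch of what 2a and 2c might 
look like. Everything named here is hypothetical: the two interfaces are local 
stand-ins declared in the snippet itself, ByteBufferDecompressor and its 
decompress(src, dst) signature are an assumption and not taken from the 
attached patch, and the compressed input is modeled as a ByteBuffer rather than 
a real underlying stream. The identity "codec" only copies bytes so that the 
example runs on its own.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Local stand-in for the ByteBufferReadable idea in 2a (hypothetical). */
interface ByteBufferReadable {
    int read(ByteBuffer dst) throws IOException;
}

/** Symmetric counterpart suggested in 2c (hypothetical). */
interface ByteBufferWriteable {
    int write(ByteBuffer src) throws IOException;
}

/** Assumed buffer-to-buffer decompressor shape; not taken from the patch. */
interface ByteBufferDecompressor {
    /** Decompresses from src into dst and returns the number of bytes produced. */
    int decompress(ByteBuffer src, ByteBuffer dst) throws IOException;
}

class ZeroCopyCompressionInputStream extends InputStream implements ByteBufferReadable {
    private final ByteBuffer compressed;          // stands in for the wrapped input
    private final ByteBufferDecompressor codec;

    ZeroCopyCompressionInputStream(ByteBuffer compressed, ByteBufferDecompressor codec) {
        this.compressed = compressed;
        this.codec = codec;
    }

    /** Buffer-to-buffer path: a caller passing a direct buffer avoids the byte[] copy. */
    @Override
    public int read(ByteBuffer dst) throws IOException {
        if (!dst.hasRemaining()) {
            return 0;
        }
        int n = codec.decompress(compressed, dst);
        return (n == 0 && !compressed.hasRemaining()) ? -1 : n;
    }

    /** Legacy path: an unmodified caller such as LineReader still gets its byte[]. */
    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        return read(ByteBuffer.wrap(b, off, len));
    }

    @Override
    public int read() throws IOException {
        ByteBuffer one = ByteBuffer.allocate(1);
        return read(one) <= 0 ? -1 : (one.get(0) & 0xff);
    }
}

class ZeroCopyDemo {
    /** Identity "codec": copies bytes unchanged, for illustration only. */
    static ByteBufferDecompressor identityCodec() {
        return (src, dst) -> {
            int n = Math.min(src.remaining(), dst.remaining());
            for (int i = 0; i < n; i++) {
                dst.put(src.get());
            }
            return n;
        };
    }

    /** Reads 5 bytes through the ByteBuffer path, then the rest through byte[]. */
    static String demo() {
        byte[] data = "hello, zero-copy".getBytes(StandardCharsets.US_ASCII);
        try (ZeroCopyCompressionInputStream in =
                 new ZeroCopyCompressionInputStream(ByteBuffer.wrap(data), identityCodec())) {
            ByteBuffer direct = ByteBuffer.allocateDirect(5);
            in.read(direct);
            direct.flip();
            byte[] head = new byte[direct.remaining()];
            direct.get(head);
            byte[] tail = new byte[64];
            int n = in.read(tail, 0, tail.length);
            return new String(head, StandardCharsets.US_ASCII) + "|"
                 + new String(tail, 0, n, StandardCharsets.US_ASCII);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The point of the shape is that the byte[] overload is just a wrapper around the 
ByteBuffer one, so existing callers keep working and new callers can opt in to 
the direct-buffer path.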

Todd is suggesting adapting a codec to the new interface; Snappy seems a likely 
candidate.

I'll wait a week or so for any dust to settle and then generate a patch to 
trunk's Snappy codec for consideration.
                
> Zero-copy ByteBuffer-based compressor / decompressor API
> --------------------------------------------------------
>
>                 Key: HADOOP-8148
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8148
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: io
>            Reporter: Tim Broberg
>         Attachments: hadoop8148.patch
>
>
> Per Todd Lipcon's comment in HDFS-2834, "
>   Whenever a native decompression codec is being used, ... we generally have 
> the following copies:
>   1) Socket -> DirectByteBuffer (in SocketChannel implementation)
>   2) DirectByteBuffer -> byte[] (in SocketInputStream)
>   3) byte[] -> Native buffer (set up for decompression)
>   4*) decompression to a different native buffer (not really a copy - 
> decompression necessarily rewrites)
>   5) native buffer -> byte[]
>   with the proposed improvement we can hopefully eliminate #2,#3 for all 
> applications, and #2,#3,and #5 for libhdfs.
> "
> The interfaces in the attached patch attempt to address:
>  A - Compression and decompression based on ByteBuffers (HDFS-2834)
>  B - Zero-copy compression and decompression (HDFS-3051)
>  C - Provide the caller a way to know the max space required to hold 
> compressed output.
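
For item C in the description, the caller-side use might look like the sketch 
below. The class and method names are hypothetical and are not taken from 
hadoop8148.patch. The bound shown is the one Snappy's MaxCompressedLength uses 
(32 + n + n/6); other codecs would need their own formula.

```java
import java.nio.ByteBuffer;

class OutputSizing {
    /** Worst-case compressed size for n input bytes, using Snappy's bound. */
    static int snappyMaxCompressedLength(int n) {
        if (n < 0) {
            throw new IllegalArgumentException("negative length: " + n);
        }
        long bound = 32L + n + n / 6;
        if (bound > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("input too large: " + n);
        }
        return (int) bound;
    }

    /** Allocates a direct output buffer large enough for any compression of src. */
    static ByteBuffer allocateOutputFor(ByteBuffer src) {
        return ByteBuffer.allocateDirect(snappyMaxCompressedLength(src.remaining()));
    }
}
```

With a bound like this the caller can size the destination buffer once and 
never has to handle a "buffer full, call again" case during compression.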

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
