[ https://issues.apache.org/jira/browse/HADOOP-8148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13251244#comment-13251244 ]
Tim Broberg commented on HADOOP-8148:
-------------------------------------

Here are my thoughts from the distance of a month:

1 - Question: Do we define this as a new interface, or revise the existing one? IMO, there is probably too much code to switch over all at once.

2 - We also need to do something with the Compression(Input/Output)Stream.

2a - This would seem to be just an addition of ByteBufferReadable interface to the existing stream. Call it ZeroCopyCompressionInputStream? Too long?

2b - For this, I'm looking for the most common consumer of this code as well as the input stream interface. I'm thinking it's LineReader. The input stream passed to a compression stream would be read buffer to buffer without modification to LineReader. LineReader could take advantage of ZeroCopyCompressionInputStream by passing a direct buffer, or just use it as is and require a copy into his byte array.

2c - For symmetry, any reason not to define a ByteBufferWriteable interface and make the corresponding output stream class?

3 - The List<ByteBuffer> interface of HDFS-3051 doesn't seem to be taking off, and it complicates the native code. Kill it?

Todd is suggesting adapting a codec, Snappy seems a likely candidate, to the new interface. I'll wait a week or so for any dust to settle and then generate a patch to trunk's Snappy codec for consideration.

> Zero-copy ByteBuffer-based compressor / decompressor API
> --------------------------------------------------------
>
>                 Key: HADOOP-8148
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8148
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: io
>            Reporter: Tim Broberg
>         Attachments: hadoop8148.patch
>
>
> Per Todd Lipcon's comment in HDFS-2834, "
>   Whenever a native decompression codec is being used, ...
>   we generally have the following copies:
>   1) Socket -> DirectByteBuffer (in SocketChannel implementation)
>   2) DirectByteBuffer -> byte[] (in SocketInputStream)
>   3) byte[] -> Native buffer (set up for decompression)
>   4*) decompression to a different native buffer (not really a copy - decompression necessarily rewrites)
>   5) native buffer -> byte[]
>   with the proposed improvement we can hopefully eliminate #2, #3 for all applications, and #2, #3, and #5 for libhdfs.
> "
> The interfaces in the attached patch attempt to address:
>  A - Compression and decompression based on ByteBuffers (HDFS-2834)
>  B - Zero-copy compression and decompression (HDFS-3051)
>  C - Provide the caller a way to know the max space required to hold compressed output.
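
For illustration only, here is a minimal Java sketch of what points A and C of the issue description might look like from the caller's side. Everything named here is an assumption: BufferCompressor, BufferDecompressor, maxCompressedLength and ZlibCodec are hypothetical and are not taken from hadoop8148.patch, java.util.zip stands in for Snappy, the size bound is a deliberately loose guess rather than zlib's exact figure, and the ByteBuffer overloads of Deflater/Inflater need Java 11 or later.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class ByteBufferCodecSketch {

    /** Hypothetical compressor interface; not the one in hadoop8148.patch. */
    public interface BufferCompressor {
        /** Point C: upper bound on the compressed size of srcLen input bytes. */
        int maxCompressedLength(int srcLen);

        /** Point A: compresses src's remaining bytes into dst, returns bytes written. */
        int compress(ByteBuffer src, ByteBuffer dst);
    }

    /** Hypothetical decompressor interface, the mirror image of the above. */
    public interface BufferDecompressor {
        /** Decompresses src's remaining bytes into dst, returns bytes written. */
        int decompress(ByteBuffer src, ByteBuffer dst);
    }

    /** Stand-in codec built on java.util.zip (Java 11+ ByteBuffer overloads). */
    public static class ZlibCodec implements BufferCompressor, BufferDecompressor {
        @Override
        public int maxCompressedLength(int srcLen) {
            // Assumption: a deliberately loose bound, not zlib's exact deflateBound().
            return srcLen + (srcLen >> 3) + 64;
        }

        @Override
        public int compress(ByteBuffer src, ByteBuffer dst) {
            Deflater deflater = new Deflater();
            try {
                int start = dst.position();
                deflater.setInput(src);   // buffer is read in place, no byte[] handed over
                deflater.finish();
                while (!deflater.finished()) {
                    int n = deflater.deflate(dst);
                    if (n == 0 && !deflater.finished()) {
                        throw new IllegalStateException("destination buffer too small");
                    }
                }
                return dst.position() - start;
            } finally {
                deflater.end();           // release native zlib state
            }
        }

        @Override
        public int decompress(ByteBuffer src, ByteBuffer dst) {
            Inflater inflater = new Inflater();
            try {
                int start = dst.position();
                inflater.setInput(src);
                while (!inflater.finished()) {
                    int n = inflater.inflate(dst);
                    if (n == 0 && !inflater.finished()) {
                        throw new IllegalStateException("truncated input or destination too small");
                    }
                }
                return dst.position() - start;
            } catch (DataFormatException e) {
                throw new IllegalStateException("corrupt compressed data", e);
            } finally {
                inflater.end();
            }
        }
    }

    /** Compresses and decompresses through direct buffers only. */
    public static byte[] roundTrip(byte[] original) {
        ZlibCodec codec = new ZlibCodec();
        ByteBuffer src = ByteBuffer.allocateDirect(original.length);
        src.put(original);
        src.flip();
        // The caller sizes the output buffer up front from the codec's bound.
        ByteBuffer compressed = ByteBuffer.allocateDirect(codec.maxCompressedLength(original.length));
        codec.compress(src, compressed);
        compressed.flip();
        ByteBuffer restored = ByteBuffer.allocateDirect(original.length);
        int n = codec.decompress(compressed, restored);
        restored.flip();
        byte[] out = new byte[n];
        restored.get(out);
        return out;
    }

    public static void main(String[] args) {
        byte[] in = "zero-copy zero-copy zero-copy zero-copy".getBytes(StandardCharsets.UTF_8);
        System.out.println("round trip ok: " + Arrays.equals(in, roundTrip(in)));
    }
}
```

The only byte[] copies in roundTrip are the ones that load the test data and read the result back; the compress and decompress calls themselves go buffer to buffer, which is the shape a caller such as LineReader would see if it passed a direct buffer.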