charlesconnell opened a new pull request, #6708:
URL: https://github.com/apache/hbase/pull/6708

   Each time a block is decoded in `HFileBlockDefaultDecodingContext`, a new 
`DecompressorStream` is allocated and used. That is a lot of allocation, and 
the streaming pattern copies every byte more times than necessary: each byte 
is copied from a `ByteBuff` into a `byte[]`, decompressed, then copied back 
into a `ByteBuff`. For decompressors like 
`org.apache.hadoop.hbase.io.compress.zstd.ZstdDecompressor` that only operate 
on direct memory, two additional copies are introduced: one from a `byte[]` 
to a direct NIO `ByteBuffer`, and one back to a `byte[]`.
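
   To make the copy chain concrete, here is a minimal sketch that uses the 
JDK's zlib classes as a stand-in. It is not HBase or zstd code: the class and 
method names are hypothetical, `java.nio.ByteBuffer` stands in for `ByteBuff`, 
and it assumes Java 11+ for `Inflater.setInput(ByteBuffer)` and 
`Inflater.inflate(ByteBuffer)`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;
import java.util.zip.InflaterInputStream;

public class CopyChainSketch {
  /** Helper for the demo only: zlib-compress a byte array. */
  static byte[] compress(byte[] raw) {
    Deflater deflater = new Deflater();
    deflater.setInput(raw);
    deflater.finish();
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    byte[] chunk = new byte[1024];
    while (!deflater.finished()) {
      bos.write(chunk, 0, deflater.deflate(chunk));
    }
    deflater.end();
    return bos.toByteArray();
  }

  /** Streaming pattern: a new stream per block plus staging copies through byte[]. */
  static ByteBuffer streamingDecompress(ByteBuffer compressed, int rawLen) throws IOException {
    byte[] in = new byte[compressed.remaining()];
    compressed.duplicate().get(in);                 // copy: buffer -> byte[]
    ByteBuffer out = ByteBuffer.allocateDirect(rawLen);
    try (InputStream stream = new InflaterInputStream(new ByteArrayInputStream(in))) {
      byte[] chunk = new byte[4096];
      int n;
      while ((n = stream.read(chunk)) > 0) {        // decompress into a byte[]
        out.put(chunk, 0, n);                       // copy: byte[] -> buffer
      }
    }
    out.flip();
    return out;
  }

  /** Buffer-to-buffer pattern: no stream object and no staging arrays. */
  static ByteBuffer directDecompress(ByteBuffer compressed, int rawLen)
      throws DataFormatException {
    ByteBuffer out = ByteBuffer.allocateDirect(rawLen);
    Inflater inflater = new Inflater();
    try {
      inflater.setInput(compressed.duplicate());
      while (!inflater.finished() && out.hasRemaining()) {
        if (inflater.inflate(out) == 0) {
          break; // truncated input or a preset dictionary is required
        }
      }
    } finally {
      inflater.end();
    }
    out.flip();
    return out;
  }

  public static void main(String[] args) throws Exception {
    byte[] raw = "hbase block payload ".repeat(50).getBytes(StandardCharsets.UTF_8);
    byte[] z = compress(raw);
    ByteBuffer in = ByteBuffer.allocateDirect(z.length);
    in.put(z);
    in.flip();
    ByteBuffer a = streamingDecompress(in, raw.length);
    ByteBuffer b = directDecompress(in, raw.length);
    System.out.println("outputs match: " + a.equals(b));
  }
}
```

   Both methods produce the same bytes; the difference is only in how many 
intermediate copies and objects each one needs per block.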
   
   Apart from the one copy inherent in decompression itself, from a compressed 
buffer to an uncompressed buffer, all of these other copies can be avoided 
without sacrificing functionality. Along the way, we'll also avoid the 
per-block object allocation.
   
   In this PR:
   - Introduce the interface `ByteBuffDecompressor`, which decompresses 
directly from one `ByteBuff` into another
   - Provide a `ZstdByteBuffDecompressor` that uses zstd-jni underneath
     - This works when the input and output arguments are both direct 
`SingleByteBuff`s or both heap `SingleByteBuff`s.
     - I have a plan to improve zstd-jni so we can handle other combinations in 
HBase in the future.
   - The CodecPool now pools `ByteBuffDecompressor`s the same way that it pools 
`Decompressor`s.
   - When decoding an HFile block, if the decompressor supports decompression 
directly on the `ByteBuff`s, then take the new fast path.
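
   The shape of the list above can be sketched as follows. This is an 
illustration under stated assumptions, not the code in this PR: 
`BufferDecompressor` is a hypothetical stand-in for `ByteBuffDecompressor` 
(the real signatures may differ), `java.nio.ByteBuffer` replaces `ByteBuff`, 
zlib replaces zstd-jni, the pool is a simplified, non-thread-safe stand-in for 
`CodecPool`, and the fallback is reduced to a single staging copy rather than 
the existing stream path. It assumes Java 11+.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class FastPathSketch {
  /** Hypothetical stand-in for the PR's ByteBuffDecompressor interface. */
  interface BufferDecompressor {
    boolean canDecompress(ByteBuffer output, ByteBuffer input);
    int decompress(ByteBuffer output, ByteBuffer input) throws DataFormatException;
  }

  /** zlib-backed stand-in that mimics the "both direct or both heap" restriction. */
  static final class InflaterBufferDecompressor implements BufferDecompressor {
    private final Inflater inflater = new Inflater(); // reused across blocks, never ended here

    public boolean canDecompress(ByteBuffer output, ByteBuffer input) {
      return output.isDirect() == input.isDirect();
    }

    public int decompress(ByteBuffer output, ByteBuffer input) throws DataFormatException {
      inflater.reset();
      inflater.setInput(input);
      int start = output.position();
      while (!inflater.finished() && output.hasRemaining()) {
        if (inflater.inflate(output) == 0) {
          break; // truncated input or a preset dictionary is required
        }
      }
      return output.position() - start;
    }
  }

  /** Simplified pool (not thread-safe) so decompressors are reused, not allocated per block. */
  static final ArrayDeque<BufferDecompressor> POOL = new ArrayDeque<>();

  static BufferDecompressor borrow() {
    BufferDecompressor d = POOL.poll();
    return d != null ? d : new InflaterBufferDecompressor();
  }

  static void giveBack(BufferDecompressor d) {
    POOL.push(d);
  }

  /** Decodes one block; returns true if the buffer-to-buffer fast path was taken. */
  static boolean decode(ByteBuffer output, ByteBuffer input) throws DataFormatException {
    BufferDecompressor d = borrow();
    try {
      if (d.canDecompress(output, input)) {
        d.decompress(output, input);
        return true;
      }
      // Simplified fallback: stage the input in a buffer of the same kind as the output.
      int n = input.remaining();
      ByteBuffer staged = output.isDirect() ? ByteBuffer.allocateDirect(n) : ByteBuffer.allocate(n);
      staged.put(input);
      staged.flip();
      d.decompress(output, staged);
      return false;
    } finally {
      giveBack(d);
    }
  }

  /** Helper for the demo only: zlib-compress a byte array. */
  static byte[] compress(byte[] raw) {
    Deflater deflater = new Deflater();
    deflater.setInput(raw);
    deflater.finish();
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    byte[] chunk = new byte[1024];
    while (!deflater.finished()) {
      bos.write(chunk, 0, deflater.deflate(chunk));
    }
    deflater.end();
    return bos.toByteArray();
  }

  public static void main(String[] args) throws Exception {
    byte[] raw = new byte[2000];
    Arrays.fill(raw, (byte) 7);
    byte[] z = compress(raw);
    ByteBuffer directIn = ByteBuffer.allocateDirect(z.length);
    directIn.put(z);
    directIn.flip();
    System.out.println("direct/direct fast path: "
        + decode(ByteBuffer.allocateDirect(raw.length), directIn));
    System.out.println("heap/direct fast path: "
        + decode(ByteBuffer.allocateDirect(raw.length), ByteBuffer.wrap(z)));
  }
}
```

   The capability check happens per call because whether the fast path applies 
depends on the concrete buffers passed in, not only on the codec.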
   
   In a subsequent PR I plan to add glue so that any codec offering a 
`org.apache.hadoop.io.compress.DirectDecompressor`, which several in 
hadoop-common already do, can be used as a `ByteBuffDecompressor`.
   
   I've already been using this code successfully in production at my company.

