[ https://issues.apache.org/jira/browse/HBASE-29135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Charles Connell updated HBASE-29135:
------------------------------------
Attachment: create-decompression-stream-zstd.html
> ZStandard decompression can operate directly on ByteBuffs
> ---------------------------------------------------------
>
> Key: HBASE-29135
> URL: https://issues.apache.org/jira/browse/HBASE-29135
> Project: HBase
> Issue Type: Improvement
> Reporter: Charles Connell
> Assignee: Charles Connell
> Priority: Minor
> Attachments: create-decompression-stream-zstd.html
>
>
> I've been thinking about ways to improve HBase's performance when reading
> HFiles, and I believe there is significant opportunity to do so. I look at
> many RegionServer profile flamegraphs of my company's servers. A pattern I've
> discovered is that object allocation in a very hot code path is a performance
> killer. The HFile decoding code makes some effort to avoid this, but it isn't
> totally successful.
> Each time a block is decoded in HFileBlockDefaultDecodingContext, a new
> DecompressorStream is allocated and used. This is a lot of allocation, and
> the use of the streaming pattern requires copying every byte to be
> decompressed more times than necessary. Each byte is copied from a ByteBuff
> into a byte[], then decompressed, then copied back to a ByteBuff. For
> decompressors like org.apache.hadoop.hbase.io.compress.zstd.ZstdDecompressor
> that only operate on direct memory, two additional copies are introduced to
> move from a byte[] to a direct NIO ByteBuffer, then back to a byte[].
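> To illustrate the copy chain described above, here is a simplified,
> hypothetical sketch. It is not the actual HBase or Hadoop code: ByteBuff is
> stood in for by java.nio.ByteBuffer, and nativeDecompress is a placeholder
> for the zstd-jni call, not a real API.
> {code:java}
> import java.nio.ByteBuffer;
>
> // Illustrative only: models the copies implied by the stream-based path
> // when the decompressor can only operate on direct memory.
> public class CopyChainSketch {
>   static void decompressViaStreams(ByteBuffer compressedInput, ByteBuffer decompressedOutput) {
>     // Copy 1: drain the compressed ByteBuff into a heap byte[] so the
>     // InputStream-based Decompressor API can consume it.
>     byte[] compressedHeap = new byte[compressedInput.remaining()];
>     compressedInput.get(compressedHeap);
>
>     // Copy 2: the direct-memory decompressor needs a direct ByteBuffer,
>     // so the heap bytes are copied into one.
>     ByteBuffer compressedDirect = ByteBuffer.allocateDirect(compressedHeap.length);
>     compressedDirect.put(compressedHeap);
>     compressedDirect.flip();
>
>     // Decompression writes into another direct buffer (placeholder call).
>     ByteBuffer decompressedDirect = nativeDecompress(compressedDirect);
>
>     // Copy 3: back out of direct memory into a heap byte[] so the stream
>     // can hand the bytes to its caller via read(byte[], int, int).
>     byte[] decompressedHeap = new byte[decompressedDirect.remaining()];
>     decompressedDirect.get(decompressedHeap);
>
>     // Copy 4: finally written into the destination ByteBuff for the block.
>     decompressedOutput.put(decompressedHeap);
>   }
>
>   // Hypothetical placeholder for the native zstd call; does no real work.
>   static ByteBuffer nativeDecompress(ByteBuffer compressedDirect) {
>     return compressedDirect.duplicate();
>   }
> }
> {code}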
> Aside from the one copy inherent in decompression itself, namely the
> necessity of copying from a compressed buffer to an uncompressed buffer, all
> of these other copies can be avoided without sacrificing functionality. Along
> the way, we'll also avoid allocating objects.
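> A minimal sketch of one possible shape for this, assuming a new
> ByteBuff-aware decompressor interface (the name ByteBuffDecompressor and the
> method signature below are illustrative, not a final API), could be:
> {code:java}
> import java.io.IOException;
> import org.apache.hadoop.hbase.nio.ByteBuff;
>
> /**
>  * Hypothetical interface: a decompressor that reads compressed bytes from
>  * one ByteBuff and writes decompressed bytes to another, with no
>  * intermediate byte[] copies and no per-block stream allocation. A zstd
>  * implementation could pass the underlying direct ByteBuffers straight to
>  * the native library.
>  */
> public interface ByteBuffDecompressor {
>   /**
>    * Decompresses inputLen bytes read from input and writes the result to
>    * output, advancing both buffers' positions.
>    * @return the number of decompressed bytes written to output
>    */
>   int decompress(ByteBuff output, ByteBuff input, int inputLen) throws IOException;
> }
> {code}
> A reusable instance of something like this could be cached per codec, so the
> hot block-decoding path would allocate no objects at all.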
--
This message was sent by Atlassian Jira
(v8.20.10#820010)