Wellington Chevreuil created HBASE-27370:
--------------------------------------------
Summary: Avoid decompressing blocks when reading from bucket cache
prefetch threads
Key: HBASE-27370
URL: https://issues.apache.org/jira/browse/HBASE-27370
Project: HBase
Issue Type: Improvement
Reporter: Wellington Chevreuil
Assignee: Wellington Chevreuil
When prefetching blocks into the bucket cache, we observed CPU usage
consistently around 70% with no other workloads running. For large bucket
caches (i.e. when using a file-based bucket cache), the prefetch can last for
some time, and such high CPU usage may impact client applications using the
database.
Further analysis of the prefetch threads' stack traces showed that these
threads are very often executing decompression logic:
{noformat}
"hfile-prefetch-1654895061122" #234 daemon prio=5 os_prio=0 tid=0x0000557bb2907000 nid=0x406d runnable [0x00007f294a504000]
   java.lang.Thread.State: RUNNABLE
    at org.apache.hadoop.io.compress.snappy.SnappyDecompressor.decompressBytesDirect(Native Method)
    at org.apache.hadoop.io.compress.snappy.SnappyDecompressor.decompress(SnappyDecompressor.java:235)
    at org.apache.hadoop.io.compress.BlockDecompressorStream.decompress(BlockDecompressorStream.java:88)
    at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:105)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:284)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
    - locked <0x00000002d24c0ae8> (a java.io.BufferedInputStream)
    at org.apache.hadoop.hbase.io.util.BlockIOUtils.readFullyWithHeapBuffer(BlockIOUtils.java:105)
    at org.apache.hadoop.hbase.io.compress.Compression.decompress(Compression.java:465)
    at org.apache.hadoop.hbase.io.encoding.HFileBlockDefaultDecodingContext.prepareDecoding(HFileBlockDefaultDecodingContext.java:90)
    at org.apache.hadoop.hbase.io.hfile.HFileBlock.unpack(HFileBlock.java:650)
    at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.readBlock(HFileReaderImpl.java:1342)
{noformat}
This is because *HFileReaderImpl.readBlock* always decompresses blocks, even
when *hbase.block.data.cachecompressed* is set to true.
This patch proposes an additional flag to differentiate prefetch from normal
reads, so that DATA blocks are not decompressed when prefetching with
*hbase.block.data.cachecompressed* set to true.
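The decision this flag enables could be sketched roughly as follows. This is a standalone illustration only, not the actual patch: the class PrefetchUnpackSketch, its BlockType enum, the shouldUnpack method and the prefetchOnly parameter are all hypothetical names, and the real change inside HFileReaderImpl.readBlock may be structured differently.

```java
// Hypothetical, self-contained sketch; none of these names are HBase APIs.
final class PrefetchUnpackSketch {

    // Simplified stand-in for HBase block types (assumption for illustration).
    enum BlockType { DATA, INDEX, BLOOM }

    /**
     * Returns true if the block should be decompressed ("unpacked") by the reader.
     *
     * @param type            kind of block being read
     * @param cacheCompressed value of hbase.block.data.cachecompressed
     * @param prefetchOnly    hypothetical flag: true when the read comes from a
     *                        prefetch thread that only populates the cache
     */
    static boolean shouldUnpack(BlockType type, boolean cacheCompressed, boolean prefetchOnly) {
        // A prefetch read has no caller waiting for uncompressed data, so a DATA
        // block that will be cached in compressed form can stay compressed.
        boolean keepPacked = prefetchOnly && cacheCompressed && type == BlockType.DATA;
        return !keepPacked;
    }

    public static void main(String[] args) {
        // Prefetch of a DATA block with compressed caching enabled: skip decompression.
        System.out.println(shouldUnpack(BlockType.DATA, true, true));
        // Normal client read of the same block: still decompress.
        System.out.println(shouldUnpack(BlockType.DATA, true, false));
        // Non-DATA block during prefetch: still decompress.
        System.out.println(shouldUnpack(BlockType.INDEX, true, true));
    }
}
```

In this sketch, normal reads and non-DATA blocks behave as before; only the prefetch path with compressed caching enabled avoids the decompression cost seen in the stack trace.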