[
https://issues.apache.org/jira/browse/HBASE-22422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16847199#comment-16847199
]
Zheng Hu commented on HBASE-22422:
----------------------------------
A data block read failure leads to an extra release of an index block. In other
words, an index block stays in the LruBlockCache with refCnt=0, and every
subsequent RPC that requests this zero-refCnt index block gets an
IllegalReferenceCountException, which makes the QPS drop from 25000/s to
hundreds per second.
Let me explain in detail. See the method
HFileBlockIndex#loadDataBlockWithScanInfo:
{code}
HFileBlock block = null;
boolean dataBlock = false;
KeyOnlyKeyValue tmpNextIndexKV = new KeyValue.KeyOnlyKeyValue();
while (true) {
  try {
    //.....
    block = cachingBlockReader.readBlock(currentOffset, currentOnDiskSize,
        shouldCache, pread, isCompaction, true, expectedBlockType,
        expectedDataBlockEncoding);
    //.... Loop until we got a DataBlock;
  } finally {
    if (!dataBlock && block != null) {
      // Release the block immediately if it is not the data block
      block.release();
    }
  }
}
{code}
On the first iteration of the while loop, the block is an index block and is
read successfully from the LruBlockCache.
On the second iteration, we need to read a data block from the
CombinedBlockCache, but the read fails because of the RAMCache concurrency
issue mentioned above, so cachingBlockReader#readBlock throws an exception.
The block variable still references the index block from the first iteration,
so the finally block releases that index block a second time.
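To make the sequence concrete, here is a minimal, self-contained sketch of the
double release. It does not use the HBase classes: Block, readBlock and lookup
are hypothetical stand-ins, and resetting block to null at the top of each
iteration is only one possible way to avoid the extra release, not necessarily
what the final patch does.

```java
import java.io.IOException;

public class ExtraReleaseSketch {
  /** Hypothetical stand-in for a ref-counted cached block (not HFileBlock). */
  static final class Block {
    int refCnt = 1; // the cache itself holds one reference

    Block retain() {
      refCnt++;
      return this;
    }

    void release() {
      refCnt--;
    }
  }

  /** Hypothetical reader: level 0 hits the cached index block, level 1 fails. */
  static Block readBlock(Block cachedIndex, int level) throws IOException {
    if (level == 0) {
      return cachedIndex.retain();
    }
    throw new IOException("simulated data block read failure");
  }

  /** Runs the lookup loop and returns the index block's refCnt afterwards. */
  public static int lookup(boolean resetEachIteration) {
    Block cachedIndex = new Block();
    Block block = null;
    boolean dataBlock = false; // never set: the data block read always fails here
    int level = 0;
    try {
      while (true) {
        try {
          if (resetEachIteration) {
            block = null; // assumption: forget the block released last iteration
          }
          block = readBlock(cachedIndex, level++);
        } finally {
          if (!dataBlock && block != null) {
            block.release();
          }
        }
      }
    } catch (IOException e) {
      // In this sketch the read failure simply ends the lookup.
    }
    return cachedIndex.refCnt;
  }

  public static void main(String[] args) {
    System.out.println(lookup(false)); // prints 0: index block released twice
    System.out.println(lookup(true));  // prints 1: only the cache's reference left
  }
}
```

Without the reset, the second iteration's finally block sees the stale index
block and drops the reference that belongs to the cache, which is the refCnt=0
state described above.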
> Retain an ByteBuff with refCnt=0 when getBlock from LRUCache
> ------------------------------------------------------------
>
> Key: HBASE-22422
> URL: https://issues.apache.org/jira/browse/HBASE-22422
> Project: HBase
> Issue Type: Sub-task
> Components: BlockCache
> Reporter: Zheng Hu
> Assignee: Zheng Hu
> Priority: Major
> Attachments: 0001-debug2.patch, 0001-debug2.patch, 0001-debug2.patch,
> 0001-debug3.patch, 0001-debug4.patch, HBASE-22422.HBASE-21879.v01.patch,
> LRUBlockCache-getBlock.png, debug.patch,
> failed-to-check-positive-on-web-ui.png, image-2019-05-15-12-00-03-641.png
>
>
> After running the YCSB scan/get benchmark in our XiaoMi cluster, we found
> that the get QPS dropped from 25000/s to hundreds per second in a cluster
> with five nodes.
> After enabling the debug log on the YCSB client side, I found the following
> stack trace, see
> https://issues.apache.org/jira/secure/attachment/12968745/image-2019-05-15-12-00-03-641.png.
>
> After looking into the stack trace, I can confirm that the zero-refCnt block
> is an intermediate index block, see [2] http://hbase.apache.org/images/hfilev2.png
> We need a patch to fix this.