[
https://issues.apache.org/jira/browse/HBASE-22422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16847185#comment-16847185
]
Zheng Hu commented on HBASE-22422:
----------------------------------
Understood now: it's a concurrency bug in RAMCache. Say thread1 tries to
getBlock as follows:
Step.1 : get the block1 from RAMCache#delegate;
Step.2 : call the block1#retain to increase its refCnt;
But another thread, thread2, has flushed the block into the IOEngine and
starts clearing the block from RAMCache:
Step.a : get the block1 by RAMCache#delegate.remove;
Step.b : call the block1#release to decrease its refCnt.
If the steps above are interleaved as follows:
Step.1 : get the block1 from RAMCache#delegate;
Step.a : get the block1 by RAMCache#delegate.remove;
Step.b : call the block1#release to decrease its refCnt; here the refCnt
decreases from 1 to 0;
Step.2 : call the block1#retain to increase its refCnt;
Then the concurrency bug occurs: Step.2 retains a block whose refCnt has
already dropped to 0. One way to fix this is to make getAndRetain and
removeAndRelease atomic.
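To illustrate what "atomic" could mean here, below is a minimal sketch, not the actual HBase patch. It assumes the delegate is a ConcurrentHashMap and relies on computeIfPresent, which runs its function atomically per key. RamCacheSketch, Block, and the String key are hypothetical stand-ins for RAMCache, the cached block, and the cache key.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for RAMCache; names and types are illustrative only.
final class RamCacheSketch {

  // Hypothetical ref-counted block; a new block starts with refCnt = 1.
  static final class Block {
    private final AtomicInteger refCnt = new AtomicInteger(1);

    // Increase refCnt; refuse to revive a block that was already released to 0.
    public void retain() {
      for (;;) {
        int c = refCnt.get();
        if (c <= 0) {
          throw new IllegalStateException("retain on refCnt=" + c);
        }
        if (refCnt.compareAndSet(c, c + 1)) {
          return;
        }
      }
    }

    // Decrease refCnt; returns true when it reaches 0.
    public boolean release() {
      return refCnt.decrementAndGet() == 0;
    }

    public int refCnt() {
      return refCnt.get();
    }
  }

  private final ConcurrentHashMap<String, Block> delegate = new ConcurrentHashMap<>();

  public void put(String key, Block block) {
    delegate.put(key, block);
  }

  // Step.1 + Step.2 as one atomic unit: the lookup and the retain happen
  // inside computeIfPresent, so no remove can slip in between them.
  public Block getAndRetain(String key) {
    return delegate.computeIfPresent(key, (k, block) -> {
      block.retain();
      return block; // keep the mapping unchanged
    });
  }

  // Step.a + Step.b as one atomic unit: returning null removes the mapping.
  public boolean removeAndRelease(String key) {
    boolean[] removed = {false};
    delegate.computeIfPresent(key, (k, block) -> {
      block.release();
      removed[0] = true;
      return null; // drop the entry from the map
    });
    return removed[0];
  }
}
```

With this shape, the two operations on the same key are serialized: either getAndRetain runs first (refCnt goes 1 to 2, then the release brings it back to 1), or removeAndRelease runs first and getAndRetain simply returns null. The ordering Step.1, Step.a, Step.b, Step.2 can no longer happen.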
> Retain an ByteBuff with refCnt=0 when getBlock from LRUCache
> ------------------------------------------------------------
>
> Key: HBASE-22422
> URL: https://issues.apache.org/jira/browse/HBASE-22422
> Project: HBase
> Issue Type: Sub-task
> Components: BlockCache
> Reporter: Zheng Hu
> Assignee: Zheng Hu
> Priority: Major
> Attachments: 0001-debug2.patch, 0001-debug2.patch, 0001-debug2.patch,
> 0001-debug3.patch, 0001-debug4.patch, HBASE-22422.HBASE-21879.v01.patch,
> LRUBlockCache-getBlock.png, debug.patch,
> failed-to-check-positive-on-web-ui.png, image-2019-05-15-12-00-03-641.png
>
>
> After running a YCSB scan/get benchmark in our XiaoMi cluster, we found the get
> QPS dropped from 25000/s to hundreds per second in a cluster with five
> nodes.
> After enabling the debug log on the YCSB client side, I found the following
> stack trace, see
> https://issues.apache.org/jira/secure/attachment/12968745/image-2019-05-15-12-00-03-641.png.
>
> After looking into the stack trace, I can confirm that the zero refCnt block is
> an intermediate index block, see [2] http://hbase.apache.org/images/hfilev2.png
> Need a patch to fix this.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)