[ 
https://issues.apache.org/jira/browse/HBASE-22422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16847185#comment-16847185
 ] 

Zheng Hu commented on HBASE-22422:
----------------------------------

I understand now: it's a concurrency bug in RAMCache. Say thread1 tries to 
getBlock as follows: 
Step.1 :  get block1 from RAMCache#delegate; 
Step.2 :  call block1#retain to increase its refCnt; 
Meanwhile another thread2 has flushed the block into the IOEngine and starts to 
clear the block from RAMCache: 
Step.a :  get block1 via RAMCache#delegate.remove;
Step.b :  call block1#release to decrease its refCnt. 

If the steps above are interleaved as follows: 
Step.1 :  get block1 from RAMCache#delegate; 
Step.a :  get block1 via RAMCache#delegate.remove;
Step.b :  call block1#release to decrease its refCnt; here the refCnt 
drops from 1 to 0;
Step.2 :  call block1#retain to increase its refCnt; 

then the concurrency bug occurs: thread1 retains a block whose refCnt is 
already 0. One way to fix this is to make getAndRetain and removeAndRelease 
atomic.
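
A minimal sketch of what an atomic getAndRetain/removeAndRelease could look like, assuming the delegate is a java.util.concurrent.ConcurrentHashMap. The class RamCacheSketch, the Block type and the String key are hypothetical stand-ins for illustration and are not the actual HBase classes. The retain runs inside computeIfPresent, which ConcurrentHashMap performs atomically with respect to remove on the same key, so a reader either retains the block while the map still holds its reference or sees no block at all.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch, not the HBase implementation.
final class RamCacheSketch {

  /** Hypothetical stand-in for a reference-counted cached block. */
  static final class Block {
    // The cache itself owns the initial reference.
    private final AtomicInteger refCnt = new AtomicInteger(1);

    Block retain() {
      // Only increment while the count is still positive.
      int prev = refCnt.getAndUpdate(c -> c > 0 ? c + 1 : c);
      if (prev <= 0) {
        throw new IllegalStateException("retain on a block with refCnt=" + prev);
      }
      return this;
    }

    /** Returns true when this call dropped the last reference. */
    boolean release() {
      int now = refCnt.decrementAndGet();
      if (now < 0) {
        throw new IllegalStateException("release on an already freed block");
      }
      return now == 0;
    }

    int refCnt() {
      return refCnt.get();
    }
  }

  private final ConcurrentHashMap<String, Block> delegate = new ConcurrentHashMap<>();

  void put(String key, Block block) {
    delegate.put(key, block);
  }

  /** Step.1 and Step.2 as one atomic operation on the map entry. */
  Block getAndRetain(String key) {
    // Returns null when the key is absent; otherwise the retained block.
    return delegate.computeIfPresent(key, (k, block) -> block.retain());
  }

  /** Step.a and Step.b: drop the cache's own reference after removal. */
  boolean removeAndRelease(String key) {
    Block previous = delegate.remove(key);
    if (previous == null) {
      return false;
    }
    // Any reader that saw the block already retained it inside
    // computeIfPresent, so this release cannot pull the count under them.
    previous.release();
    return true;
  }
}
```

With this ordering guarantee, the interleaving Step.1, Step.a, Step.b, Step.2 can no longer happen: either the reader's retain completes before the remove (refCnt goes 1 -> 2 -> 1), or the remove wins and getAndRetain returns null.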

 

> Retain an ByteBuff with refCnt=0 when getBlock from LRUCache
> ------------------------------------------------------------
>
>                 Key: HBASE-22422
>                 URL: https://issues.apache.org/jira/browse/HBASE-22422
>             Project: HBase
>          Issue Type: Sub-task
>          Components: BlockCache
>            Reporter: Zheng Hu
>            Assignee: Zheng Hu
>            Priority: Major
>         Attachments: 0001-debug2.patch, 0001-debug2.patch, 0001-debug2.patch, 
> 0001-debug3.patch, 0001-debug4.patch, HBASE-22422.HBASE-21879.v01.patch, 
> LRUBlockCache-getBlock.png, debug.patch, 
> failed-to-check-positive-on-web-ui.png, image-2019-05-15-12-00-03-641.png
>
>
> After running the YCSB scan/get benchmark in our XiaoMi cluster, we found that 
> the get QPS dropped from 25000/s to hundreds per second in a cluster with five 
> nodes.  
> After enabling the debug log on the YCSB client side, I found the following 
> stack trace, see 
> https://issues.apache.org/jira/secure/attachment/12968745/image-2019-05-15-12-00-03-641.png.
>  
> After looking into the stack trace, I can confirm that the zero-refCnt block is 
> an intermediate index block, see [2] http://hbase.apache.org/images/hfilev2.png
> A patch is needed to fix this. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
