[ 
https://issues.apache.org/jira/browse/HBASE-22422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16847199#comment-16847199
 ] 

Zheng Hu commented on HBASE-22422:
----------------------------------

A data block read failure leads to an extra release of the index block. That 
is, an index block is left in LruBlockCache with refCnt=0, and every subsequent 
RPC that requests this zero-refCnt index block gets an 
IllegalReferenceCountException, which makes the QPS drop from 25000/s to 
hundreds per second. 


Let me explain in detail. See the method 
HFileBlockIndex#loadDataBlockWithScanInfo: 

{code}
HFileBlock block = null;
boolean dataBlock = false;
KeyOnlyKeyValue tmpNextIndexKV = new KeyValue.KeyOnlyKeyValue();
while (true) {
  try {
    // ...
    block = cachingBlockReader.readBlock(currentOffset, currentOnDiskSize, shouldCache, pread,
        isCompaction, true, expectedBlockType, expectedDataBlockEncoding);
    // ... loop until we get a data block
  } finally {
    // block is not reset between iterations, so it may still point to the
    // block read in the previous iteration when readBlock throws.
    if (!dataBlock && block != null) {
      // Release the block immediately if it is not the data block
      block.release();
    }
  }
}
{code}

On the first iteration of the while loop, the block is an index block and is 
read successfully from the LruBlockCache. 
On the second iteration, we need to read a data block from the 
CombinedBlockCache, but the read fails because of the RAMCache concurrency 
issue described above, so cachingBlockReader#readBlock throws an exception. The 
block variable still references the index block at that point, so the finally 
block releases it a second time.
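
To make the sequence above concrete, here is a minimal, self-contained sketch of 
the same control flow. RefCountedBlock, ExtraReleaseSketch, readBlock and the 
resetPerIteration flag are hypothetical names for illustration only, not HBase 
classes, and clearing the variable at the start of each iteration is just one 
possible guard, not necessarily what the patch does.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a ref-counted cached block (not the real HFileBlock). */
final class RefCountedBlock {
  // Starts at 1: the reference held by the cache itself.
  private final AtomicInteger refCnt = new AtomicInteger(1);

  RefCountedBlock retain() {
    if (refCnt.get() <= 0) {
      // Stand-in for the IllegalReferenceCountException seen by later RPCs.
      throw new IllegalStateException("refCnt: " + refCnt.get());
    }
    refCnt.incrementAndGet();
    return this;
  }

  void release() {
    refCnt.decrementAndGet();
  }

  int refCnt() {
    return refCnt.get();
  }
}

final class ExtraReleaseSketch {
  /** Simulated reader: call 0 returns the cached index block, any later call fails. */
  static RefCountedBlock readBlock(RefCountedBlock cachedIndexBlock, int call)
      throws IOException {
    if (call == 0) {
      return cachedIndexBlock.retain();
    }
    throw new IOException("simulated data block read failure");
  }

  /** Runs the lookup loop and returns the index block's refCnt afterwards. */
  static int lookup(RefCountedBlock cachedIndexBlock, boolean resetPerIteration) {
    RefCountedBlock block = null;
    boolean dataBlock = false; // never becomes true here: the data block read always fails
    try {
      for (int call = 0; ; call++) {
        if (resetPerIteration) {
          block = null; // assumed guard: forget the block from the previous iteration
        }
        try {
          block = readBlock(cachedIndexBlock, call);
        } finally {
          if (!dataBlock && block != null) {
            block.release();
          }
        }
      }
    } catch (IOException expected) {
      // The simulated read failure ends the lookup.
    }
    return cachedIndexBlock.refCnt();
  }

  public static void main(String[] args) {
    System.out.println("without reset: refCnt=" + lookup(new RefCountedBlock(), false));
    System.out.println("with reset:    refCnt=" + lookup(new RefCountedBlock(), true));
  }
}
```

Without the reset, the cached index block ends up with refCnt=0 even though the 
cache still holds it, so the next retain() on it fails. With the reset, the 
count returns to 1 after the failed read.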

> Retain an ByteBuff with refCnt=0 when getBlock from LRUCache
> ------------------------------------------------------------
>
>                 Key: HBASE-22422
>                 URL: https://issues.apache.org/jira/browse/HBASE-22422
>             Project: HBase
>          Issue Type: Sub-task
>          Components: BlockCache
>            Reporter: Zheng Hu
>            Assignee: Zheng Hu
>            Priority: Major
>         Attachments: 0001-debug2.patch, 0001-debug2.patch, 0001-debug2.patch, 
> 0001-debug3.patch, 0001-debug4.patch, HBASE-22422.HBASE-21879.v01.patch, 
> LRUBlockCache-getBlock.png, debug.patch, 
> failed-to-check-positive-on-web-ui.png, image-2019-05-15-12-00-03-641.png
>
>
> After running the YCSB scan/get benchmark in our XiaoMi cluster, we found that 
> the get QPS dropped from 25000/s to hundreds per second in a cluster with 
> five nodes.
> After enabling the debug log on the YCSB client side, I found the following 
> stacktrace, see 
> https://issues.apache.org/jira/secure/attachment/12968745/image-2019-05-15-12-00-03-641.png.
>  
> After looking into the stacktrace, I can confirm that the zero refCnt block is 
> an intermediate index block, see [2] http://hbase.apache.org/images/hfilev2.png
> Need a patch to fix this. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
