[
https://issues.apache.org/jira/browse/HBASE-22422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16847172#comment-16847172
]
Zheng Hu commented on HBASE-22422:
----------------------------------
After running for some hours, the bug reproduced on my pressure-test cluster,
with the following log:
{code}
2019-05-24,03:43:10,796 INFO org.apache.hadoop.hbase.nio.RefCnt: ===> Start to dump callerSet for #641783987
2019-05-24,03:43:10,796 INFO org.apache.hadoop.hbase.nio.RefCnt: --> #641783987 -> caller: HFileScannerImpl#returnBlocks: return curBlock, refCnt before release is: 2
2019-05-24,03:43:10,796 INFO org.apache.hadoop.hbase.nio.RefCnt: --> #641783987 -> caller: RAMCache#remove, refCnt before release is: 1
2019-05-24,03:43:10,796 INFO org.apache.hadoop.hbase.nio.RefCnt: ===> End to dump callerSet #641783987
2019-05-24,03:43:10,801 INFO org.apache.hadoop.hbase.regionserver.HRegion: Encountered an unknown exception in RegionScannerImpl:
org.apache.hbase.thirdparty.io.netty.util.IllegalReferenceCountException: refCnt: 0, increment: 1
at org.apache.hbase.thirdparty.io.netty.util.AbstractReferenceCounted.retain0(AbstractReferenceCounted.java:87)
at org.apache.hbase.thirdparty.io.netty.util.AbstractReferenceCounted.retain(AbstractReferenceCounted.java:74)
at org.apache.hadoop.hbase.nio.RefCnt.retain(RefCnt.java:73)
at org.apache.hadoop.hbase.nio.SingleByteBuff.retain(SingleByteBuff.java:398)
at org.apache.hadoop.hbase.nio.SingleByteBuff.retain(SingleByteBuff.java:39)
at org.apache.hadoop.hbase.io.hfile.HFileBlock.retain(HFileBlock.java:457)
at org.apache.hadoop.hbase.io.hfile.HFileBlock.retain(HFileBlock.java:115)
at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache$RAMCache.get(BucketCache.java:1539)
at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.getBlock(BucketCache.java:483)
at org.apache.hadoop.hbase.io.hfile.CombinedBlockCache.getBlock(CombinedBlockCache.java:85)
at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.getCachedBlock(HFileReaderImpl.java:1306)
at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.readBlock(HFileReaderImpl.java:1472)
at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$CellBasedKeyBlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:339)
at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.seekTo(HFileReaderImpl.java:843)
at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.seekTo(HFileReaderImpl.java:794)
at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:315)
at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:216)
at org.apache.hadoop.hbase.regionserver.StoreScanner.seekScanners(StoreScanner.java:394)
at org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:249)
at org.apache.hadoop.hbase.regionserver.HStore.createScanner(HStore.java:2063)
at org.apache.hadoop.hbase.regionserver.HStore.getScanner(HStore.java:2054)
at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.initializeScanners(HRegion.java:6493)
at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.<init>(HRegion.java:6473)
at org.apache.hadoop.hbase.regionserver.HRegion.instantiateRegionScanner(HRegion.java:2999)
at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2979)
at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2961)
at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2955)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2621)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2548)
at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:41998)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:374)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:132)
at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
2019-05-24,03:43:10,813 INFO org.apache.hadoop.hbase.nio.RefCnt: ===> Start to dump callerSet for #312566113
2019-05-24,03:43:10,813 INFO org.apache.hadoop.hbase.nio.RefCnt: --> #312566113 -> caller: CellBasedKeyBlockIndexReader#loadDataBlockWithScanInfo, refCnt before release is: 1
2019-05-24,03:43:10,813 INFO org.apache.hadoop.hbase.nio.RefCnt: --> #312566113 -> caller: CellBasedKeyBlockIndexReader#loadDataBlockWithScanInfo, refCnt before release is: 2
2019-05-24,03:43:10,813 INFO org.apache.hadoop.hbase.nio.RefCnt: --> #312566113 -> caller: CellBasedKeyBlockIndexReader#loadDataBlockWithScanInfo, refCnt before release is: 3
2019-05-24,03:43:10,813 INFO org.apache.hadoop.hbase.nio.RefCnt: ===> End to dump callerSet #312566113
2019-05-24,03:43:10,813 INFO org.apache.hadoop.hbase.regionserver.HRegion: Encountered an unknown exception in RegionScannerImpl:
org.apache.hbase.thirdparty.io.netty.util.IllegalReferenceCountException: refCnt: 0, increment: 1
at org.apache.hbase.thirdparty.io.netty.util.AbstractReferenceCounted.retain0(AbstractReferenceCounted.java:87)
at org.apache.hbase.thirdparty.io.netty.util.AbstractReferenceCounted.retain(AbstractReferenceCounted.java:74)
at org.apache.hadoop.hbase.nio.RefCnt.retain(RefCnt.java:73)
at org.apache.hadoop.hbase.nio.SingleByteBuff.retain(SingleByteBuff.java:398)
at org.apache.hadoop.hbase.nio.SingleByteBuff.retain(SingleByteBuff.java:39)
at org.apache.hadoop.hbase.io.hfile.HFileBlock.retain(HFileBlock.java:457)
at org.apache.hadoop.hbase.io.hfile.HFileBlock.retain(HFileBlock.java:115)
at org.apache.hadoop.hbase.io.hfile.LruBlockCache.lambda$getBlock$0(LruBlockCache.java:512)
at java.util.concurrent.ConcurrentHashMap.computeIfPresent(ConcurrentHashMap.java:1769)
at org.apache.hadoop.hbase.io.hfile.LruBlockCache.getBlock(LruBlockCache.java:507)
at org.apache.hadoop.hbase.io.hfile.CombinedBlockCache.getBlock(CombinedBlockCache.java:84)
at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.getCachedBlock(HFileReaderImpl.java:1306)
at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.readBlock(HFileReaderImpl.java:1472)
at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$CellBasedKeyBlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:339)
at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.seekTo(HFileReaderImpl.java:843)
at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.seekTo(HFileReaderImpl.java:794)
at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:315)
at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:216)
at org.apache.hadoop.hbase.regionserver.StoreScanner.seekScanners(StoreScanner.java:394)
at org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:249)
at org.apache.hadoop.hbase.regionserver.HStore.createScanner(HStore.java:2063)
at org.apache.hadoop.hbase.regionserver.HStore.getScanner(HStore.java:2054)
at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.initializeScanners(HRegion.java:6493)
at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.<init>(HRegion.java:6473)
at org.apache.hadoop.hbase.regionserver.HRegion.instantiateRegionScanner(HRegion.java:2999)
at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2979)
at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2961)
at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2955)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2621)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2548)
at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:41998)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:374)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:132)
at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
{code}
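One way to read the first callerSet dump (#641783987): the block was released twice, once with refCnt 2 and once with refCnt 1, so the count reached 0 before RAMCache.get tried to retain it again. The sketch below only replays that order of events on a hypothetical counter. RefCntSketch and replay() are illustrative names, not the HBase RefCnt class, and the starting count of 1 held by the cache is an assumption. It does not identify which release is the incorrect one.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a reference-counted block; NOT the HBase RefCnt class. */
public class RefCntSketch {
    // Assumption: a freshly cached block starts with one reference, held by the cache.
    private final AtomicInteger refCnt = new AtomicInteger(1);

    public int refCnt() {
        return refCnt.get();
    }

    /** Adds a reference; refuses to resurrect an object whose count already reached 0. */
    public RefCntSketch retain() {
        while (true) {
            int cur = refCnt.get();
            if (cur <= 0) {
                throw new IllegalStateException("refCnt: " + cur + ", increment: 1");
            }
            if (refCnt.compareAndSet(cur, cur + 1)) {
                return this;
            }
        }
    }

    /** Drops a reference; returns true when this call brought the count to 0. */
    public boolean release() {
        while (true) {
            int cur = refCnt.get();
            if (cur <= 0) {
                throw new IllegalStateException("refCnt: " + cur + ", decrement: 1");
            }
            if (refCnt.compareAndSet(cur, cur - 1)) {
                return cur == 1;
            }
        }
    }

    /** Replays the order of events from the dump and returns the failure message. */
    public static String replay() {
        RefCntSketch block = new RefCntSketch(); // refCnt = 1 (cache)
        block.retain();                          // a scanner reads the block, refCnt = 2
        block.release();                         // like returnBlocks: before release 2, now 1
        block.release();                         // like RAMCache#remove: before release 1, now 0
        try {
            block.retain();                      // a later get() on the same block object
            return "no failure";
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(replay());
    }
}
```

In this sketch the last retain() fails with the same "refCnt: 0, increment: 1" message shape as the exception in the log, because the second get() still sees a block object whose count has already dropped to 0.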
> Retain an ByteBuff with refCnt=0 when getBlock from LRUCache
> ------------------------------------------------------------
>
> Key: HBASE-22422
> URL: https://issues.apache.org/jira/browse/HBASE-22422
> Project: HBase
> Issue Type: Sub-task
> Components: BlockCache
> Reporter: Zheng Hu
> Assignee: Zheng Hu
> Priority: Major
> Attachments: 0001-debug2.patch, 0001-debug2.patch, 0001-debug2.patch,
> 0001-debug3.patch, 0001-debug4.patch, HBASE-22422.HBASE-21879.v01.patch,
> LRUBlockCache-getBlock.png, debug.patch,
> failed-to-check-positive-on-web-ui.png, image-2019-05-15-12-00-03-641.png
>
>
> After running the YCSB scan/get benchmark in our XiaoMi cluster, we found
> that the get QPS dropped from 25000/s to hundreds per second in a cluster
> with five nodes.
> After enabling the debug log on the YCSB client side, I found the following
> stack trace, see
> https://issues.apache.org/jira/secure/attachment/12968745/image-2019-05-15-12-00-03-641.png.
>
> After looking into the stack trace, I can confirm that the zero-refCnt block
> is an intermediate index block, see [2] http://hbase.apache.org/images/hfilev2.png
> Need a patch to fix this.