[ 
https://issues.apache.org/jira/browse/HBASE-23375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16989470#comment-16989470
 ] 

Baiqiang Zhao commented on HBASE-23375:
---------------------------------------

The trigger may be:

(1) When the top reference file is opened, getFirstKey() is called. On a 
cache miss, the block is read from the HFile and cached into BucketCache (BC).

(2) In cacheBlockWithWait(), it finds that BC already contains the cacheKey, 
and the existingBlock is in ramCache. This is possible because both daughter 
regions can load the same block from their parent HFile.

(3) So it goes into shouldReplaceExistingCacheBlock(). At the same time, the 
existingBlock is added to writerQueue and removed from ramCache. As a result, 
shouldReplaceExistingCacheBlock() gets null when it fetches the existingBlock 
from BC.

(4) Finally, an NPE is thrown and the RS goes down.

Anything can happen in a multi-threaded environment.
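
The interleaving in steps (2) and (3) is a check-then-act race. Below is a 
minimal sketch of it, not the real BucketCache code: the class 
RamCacheRaceSketch, the String/byte[] types and the helper names are all 
hypothetical, and the "writer thread" is replayed inline so the ordering is 
deterministic. The null check shown is only one possible guard and is not 
necessarily the fix adopted for this issue.

```java
import java.util.Arrays;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class RamCacheRaceSketch {
    // Hypothetical stand-in for BucketCache's ramCache (cacheKey -> block bytes).
    static final ConcurrentMap<String, byte[]> ramCache = new ConcurrentHashMap<>();

    // Stand-in for the writer thread taking a block out of ramCache.
    static void writerThreadDrains(String cacheKey) {
        ramCache.remove(cacheKey);
    }

    // Null-safe "should we replace?" check: if the existing block vanished
    // between the containsKey check and this lookup, treat it as absent
    // instead of dereferencing null.
    static boolean shouldReplaceNullSafe(String cacheKey, byte[] newBlock) {
        byte[] existing = ramCache.get(cacheKey);
        if (existing == null) {
            return true;
        }
        return !Arrays.equals(existing, newBlock);
    }

    // Replays the interleaving from steps (2) and (3) in a fixed order.
    static String simulate() {
        String cacheKey = "parentHFile_0";   // made-up key
        byte[] block = {1, 2, 3};

        ramCache.put(cacheKey, block);       // first daughter cached the block

        // Step (2): second daughter sees that the key is already cached.
        boolean alreadyCached = ramCache.containsKey(cacheKey);

        // Step (3): meanwhile the writer thread removes it from ramCache.
        writerThreadDrains(cacheKey);

        // The later lookup returns null; comparing against it unguarded
        // is where an NPE would be thrown.
        byte[] existing = ramCache.get(cacheKey);

        return "alreadyCached=" + alreadyCached
            + ", existing=" + (existing == null ? "null" : "present")
            + ", replace=" + shouldReplaceNullSafe(cacheKey, block);
    }

    public static void main(String[] args) {
        System.out.println(simulate());
    }
}
```

In this sketch the key is present at check time but the later lookup returns 
null, which matches the window described in step (3).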

> NPE during opening a daughter region in cacheBlock 
> ---------------------------------------------------
>
>                 Key: HBASE-23375
>                 URL: https://issues.apache.org/jira/browse/HBASE-23375
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 1.6.0, 1.4.11
>            Reporter: Baiqiang Zhao
>            Priority: Major
>
> The RegionServer log is :
> {code:java}
> 2019-12-04 11:32:37,238 INFO  [regionserver/localhost/0.0.0.0:16020-splits-0] 
> regionserver.SplitRequest: Running rollback/cleanup of failed split of 
> ONLINE:testTable,\x00999999999\x0014aa9,1575406565984.48f462e65b7961420737797c2ccf76c9.;
>  Failed 
> localhost,16020,1574999150042-daughterOpener=aad203e7b1aa26a26b50c84f70397456
> java.io.IOException: Failed 
> localhost,16020,1574999150042-daughterOpener=aad203e7b1aa26a26b50c84f70397456
>         at 
> org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.openDaughters(SplitTransactionImpl.java:504)
>         at 
> org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.stepsAfterPONR(SplitTransactionImpl.java:598)
>         at 
> org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.execute(SplitTransactionImpl.java:581)
>         at 
> org.apache.hadoop.hbase.regionserver.SplitRequest.doSplitting(SplitRequest.java:82)
>         at 
> org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:153)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: java.io.IOException: 
> java.lang.NullPointerException
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.initializeStores(HRegion.java:1041)
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:916)
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:884)
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7098)
>         at 
> org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.openDaughterRegion(SplitTransactionImpl.java:732)
>         at 
> org.apache.hadoop.hbase.regionserver.SplitTransactionImpl$DaughterOpener.run(SplitTransactionImpl.java:712)
>        ... 1 more
> Caused by: java.io.IOException: java.lang.NullPointerException
>         at 
> org.apache.hadoop.hbase.regionserver.HStore.openStoreFiles(HStore.java:577)
>         at 
> org.apache.hadoop.hbase.regionserver.HStore.loadStoreFiles(HStore.java:532)
>         at org.apache.hadoop.hbase.regionserver.HStore.<init>(HStore.java:281)
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.instantiateHStore(HRegion.java:5469)
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion$1.call(HRegion.java:1015)
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion$1.call(HRegion.java:1012)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         ... 1 more
> Caused by: java.lang.NullPointerException
>         at 
> org.apache.hadoop.hbase.io.hfile.BlockCacheUtil.compareCacheBlock(BlockCacheUtil.java:185)
>         at 
> org.apache.hadoop.hbase.io.hfile.BlockCacheUtil.validateBlockAddition(BlockCacheUtil.java:204)
>         at 
> org.apache.hadoop.hbase.io.hfile.BlockCacheUtil.shouldReplaceExistingCacheBlock(BlockCacheUtil.java:233)
>         at 
> org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.cacheBlockWithWait(BucketCache.java:433)
>         at 
> org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.cacheBlock(BucketCache.java:419)
>         at 
> org.apache.hadoop.hbase.io.hfile.CombinedBlockCache.cacheBlock(CombinedBlockCache.java:68)
>         at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:462)
>         at 
> org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:269)
>         at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:651)
>         at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:601)
>         at 
> org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekTo(HalfStoreFileReader.java:190)
>         at 
> org.apache.hadoop.hbase.io.HalfStoreFileReader.getFirstKey(HalfStoreFileReader.java:365)
>         at 
> org.apache.hadoop.hbase.regionserver.StoreFile.open(StoreFile.java:546)
>         at 
> org.apache.hadoop.hbase.regionserver.StoreFile.createReader(StoreFile.java:563)
>         at 
> org.apache.hadoop.hbase.regionserver.StoreFile.createReader(StoreFile.java:553)
>         at 
> org.apache.hadoop.hbase.regionserver.HStore.createStoreFileAndReader(HStore.java:707)
>         at 
> org.apache.hadoop.hbase.regionserver.HStore.access$000(HStore.java:122)
>         at org.apache.hadoop.hbase.regionserver.HStore$1.call(HStore.java:552)
>         at org.apache.hadoop.hbase.regionserver.HStore$1.call(HStore.java:549)
>         ... 6 more
> 2019-12-04 11:32:37,288 WARN  [regionserver/localhost/0.0.0.0:16020-splits-0] 
> regionserver.SplitTransaction: Should use rollback(Server, 
> RegionServerServices, User)
> 2019-12-04 11:32:37,294 FATAL [regionserver/localhost/0.0.0.0:16020-splits-0] 
> regionserver.HRegionServer: ABORTING region server 
> localhost,16020,1574999150042: Abort; we got an error after 
> point-of-no-return{code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
