[ https://issues.apache.org/jira/browse/HBASE-27053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17566488#comment-17566488 ]
Bryan Beaudreault commented on HBASE-27053:
-------------------------------------------
My PR in [https://github.com/apache/hbase/pull/4610] is ready for review.
The solution ended up being a little more complicated than I indicated above:
* It wasn't enough to simply guard getBufferWithoutHeader() with
!isUnpacked(). isUnpacked() itself analyzes the buffer, so it can't be used in a
method called by unpack().
* I tried abstracting things a bit so that an HFileBlock is stamped at
construction time with whether it includes a checksum or not. That was
complicated and broke various assumptions throughout the tests.
* In the end, the simplest solution was to discard the checksum right after
verifying it, in HFileBlock#readBlockDataInternal. This way we can be sure that
no HFileBlock (packed or unpacked) carries a checksum. Overall this makes the
code easier to reason about, but it required changing a bit more code and
fixing some tests.
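To illustrate the idea in the last bullet, here is a minimal, self-contained sketch of "verify the checksum once, then drop it". It is not the code from the PR and does not use any HBase classes: the class name ChecksumStripSketch, the helper names, and the single trailing CRC32 are all assumptions made for illustration, and the real HFileBlock layout (header, checksum chunks) is more involved.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.CRC32;

// Hypothetical illustration only; none of these names exist in HBase.
public class ChecksumStripSketch {
  // Assumption: one 4-byte CRC32 trails the payload.
  static final int CHECKSUM_SIZE = 4;

  // Build a simplified "on-disk" block: payload followed by its CRC32.
  static byte[] withChecksum(byte[] payload) {
    CRC32 crc = new CRC32();
    crc.update(payload, 0, payload.length);
    return ByteBuffer.allocate(payload.length + CHECKSUM_SIZE)
        .put(payload)
        .putInt((int) crc.getValue())
        .array();
  }

  // Verify the trailing checksum, then return only the payload bytes.
  // Callers downstream (e.g. a cache) never see checksum bytes at all.
  static byte[] verifyAndStrip(byte[] onDisk) {
    int payloadLen = onDisk.length - CHECKSUM_SIZE;
    if (payloadLen < 0) {
      throw new IllegalArgumentException("block shorter than checksum");
    }
    CRC32 crc = new CRC32();
    crc.update(onDisk, 0, payloadLen);
    int stored = ByteBuffer.wrap(onDisk, payloadLen, CHECKSUM_SIZE).getInt();
    if ((int) crc.getValue() != stored) {
      throw new IllegalStateException("checksum mismatch");
    }
    return Arrays.copyOf(onDisk, payloadLen);
  }

  public static void main(String[] args) {
    byte[] payload = "row1/cf:q/value".getBytes(StandardCharsets.UTF_8);
    byte[] onDisk = withChecksum(payload);

    // Two independent reads of the same block yield identical bytes,
    // because neither result carries a checksum tail.
    byte[] first = verifyAndStrip(onDisk);
    byte[] second = verifyAndStrip(onDisk);
    System.out.println("reads identical: " + Arrays.equals(first, second));
  }
}
```

The point of the sketch is only that once the checksum is stripped at read time, every later consumer sees the same byte layout, so no code path has to ask whether a given block still has a checksum attached.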
Another simple solution would be to modify
HFileBlockDefaultDecodingContext.prepareDecoding to copy over the checksum
after decompressing. This would be a more targeted change, but in my opinion
would be wasteful – we never use the checksum after reading from disk, so why
do the work to copy it around everywhere?
Open to other opinions though, happy to go with what seems most reasonable to
everyone.
> IOException during caching of uncompressed block to the block cache.
> --------------------------------------------------------------------
>
> Key: HBASE-27053
> URL: https://issues.apache.org/jira/browse/HBASE-27053
> Project: HBase
> Issue Type: Bug
> Components: BlockCache
> Affects Versions: 2.4.12
> Reporter: Sergey Soldatov
> Assignee: Bryan Beaudreault
> Priority: Major
>
> When prefetch to block cache is enabled and blocks are compressed, caching
> sometimes fails with the exception:
> {noformat}
> 2022-05-18 21:37:29,597 ERROR [RS_OPEN_REGION-regionserver/x1:16020-2] regionserver.HRegion: Could not initialize all stores for the region=cluster_test,66666666,1652935047946.a57ca5f9e7bebb4855a44523063f79c7.
> 2022-05-18 21:37:29,598 WARN [RS_OPEN_REGION-regionserver/x1:16020-2] regionserver.HRegion: Failed initialize of region= cluster_test,66666666,1652935047946.a57ca5f9e7bebb4855a44523063f79c7., starting to roll back memstore
> java.io.IOException: java.io.IOException: java.lang.RuntimeException: Cached block contents differ, which should not have happened.cacheKey:19307adf1c2248ebb5675116ea640712.c3a21f2005abf308e4a8c9759d4e05fe_0
> at org.apache.hadoop.hbase.regionserver.HRegion.initializeStores(HRegion.java:1149)
> at org.apache.hadoop.hbase.regionserver.HRegion.initializeStores(HRegion.java:1092)
> at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:996)
> at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:946)
> at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7240)
> at org.apache.hadoop.hbase.regionserver.HRegion.openHRegionFromTableDir(HRegion.java:7199)
> at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7175)
> at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7134)
> at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7090)
> at org.apache.hadoop.hbase.regionserver.handler.AssignRegionHandler.process(AssignRegionHandler.java:147)
> at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:100)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.IOException: java.lang.RuntimeException: Cached block contents differ, which should not have happened.cacheKey:19307adf1c2248ebb5675116ea640712.c3a21f2005abf308e4a8c9759d4e05fe_0
> at org.apache.hadoop.hbase.regionserver.StoreEngine.openStoreFiles(StoreEngine.java:294)
> at org.apache.hadoop.hbase.regionserver.StoreEngine.initialize(StoreEngine.java:344)
> at org.apache.hadoop.hbase.regionserver.HStore.<init>(HStore.java:294)
> at org.apache.hadoop.hbase.regionserver.HRegion.instantiateHStore(HRegion.java:6375)
> at org.apache.hadoop.hbase.regionserver.HRegion$1.call(HRegion.java:1115)
> at org.apache.hadoop.hbase.regionserver.HRegion$1.call(HRegion.java:1112)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> ... 3 more
> Caused by: java.lang.RuntimeException: Cached block contents differ, which should not have happened.cacheKey:19307adf1c2248ebb5675116ea640712.c3a21f2005abf308e4a8c9759d4e05fe_0
> at org.apache.hadoop.hbase.io.hfile.BlockCacheUtil.validateBlockAddition(BlockCacheUtil.java:199)
> at org.apache.hadoop.hbase.io.hfile.BlockCacheUtil.shouldReplaceExistingCacheBlock(BlockCacheUtil.java:231)
> at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.shouldReplaceExistingCacheBlock(BucketCache.java:447)
> at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.cacheBlockWithWait(BucketCache.java:432)
> at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.cacheBlock(BucketCache.java:418)
> at org.apache.hadoop.hbase.io.hfile.CombinedBlockCache.cacheBlock(CombinedBlockCache.java:60)
> at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.lambda$readBlock$2(HFileReaderImpl.java:1319)
> at java.util.Optional.ifPresent(Optional.java:159)
> at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.readBlock(HFileReaderImpl.java:1317)
> at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.readAndUpdateNewBlock(HFileReaderImpl.java:942)
> at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.seekTo(HFileReaderImpl.java:931)
> at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekTo(HalfStoreFileReader.java:171)
> at org.apache.hadoop.hbase.io.HalfStoreFileReader.getFirstKey(HalfStoreFileReader.java:321)
> at org.apache.hadoop.hbase.regionserver.HStoreFile.open(HStoreFile.java:477)
> at org.apache.hadoop.hbase.regionserver.HStoreFile.initReader(HStoreFile.java:490)
> at org.apache.hadoop.hbase.regionserver.StoreEngine.createStoreFileAndReader(StoreEngine.java:231)
> at org.apache.hadoop.hbase.regionserver.StoreEngine.lambda$openStoreFiles$0(StoreEngine.java:272)
> ... 6 more
> {noformat}