[ https://issues.apache.org/jira/browse/HBASE-29158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17931793#comment-17931793 ]

Guanglei Xia edited comment on HBASE-29158 at 3/2/25 8:52 AM:
--------------------------------------------------------------

The stack trace shows that the exception occurred while looking up the checksum 
type during checksum validation. The checksum type we use is fixed at 3 (CRC32C), 
but here the checksum type read from the header is 52. We suspect the HFile block 
header is corrupted: the reader caches the next block's header along with the 
current block, but it does not run checksum validation over that cached header, 
so the cached header itself may be bad.
{code:java}
id=1, table=xxxx, attempt=6/16, failureCount=22ops, last exception=java.io.IOException: java.io.IOException: Unknown checksum type code 52
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:438)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:133)
        at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:338)
        at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:318)
Caused by: java.lang.RuntimeException: Unknown checksum type code 52
        at org.apache.hadoop.hbase.util.ChecksumType.codeToType(ChecksumType.java:98)
        at org.apache.hadoop.hbase.io.hfile.ChecksumUtil.validateChecksum(ChecksumUtil.java:172)
        at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.validateChecksum(HFileBlock.java:1906)
        at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockDataInternal(HFileBlock.java:1816)
        at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockData(HFileBlock.java:1571)
        at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.readBlock(HFileReaderImpl.java:1519)
        at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$CellBasedKeyBlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:342)
        at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.seekTo(HFileReaderImpl.java:845)
        at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.reseekTo(HFileReaderImpl.java:826)
        at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseekAtOrAfter(StoreFileScanner.java:335)
        at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:244)
        at org.apache.hadoop.hbase.regionserver.StoreFileScanner.enforceSeek(StoreFileScanner.java:457)
        at org.apache.hadoop.hbase.regionserver.KeyValueHeap.pollRealKV(KeyValueHeap.java:373)
        at org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:315)
        at org.apache.hadoop.hbase.regionserver.KeyValueHeap.requestSeek(KeyValueHeap.java:279)
        at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:1097)
        at org.apache.hadoop.hbase.regionserver.StoreScanner.seekAsDirection(StoreScanner.java:1088)
        at org.apache.hadoop.hbase.regionserver.StoreScanner.seekOrSkipToNextColumn(StoreScanner.java:823)
        at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:730)
        at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:157)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:6682)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:6846)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:6616)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:6593)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:6580)
        at org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2645)
        at org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:861)
        at org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2785)
        at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:42290)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:379)
        ... 3 more
 on nodexxxx.hadoop,60020,1739332076563, tracking started null, retrying after=2013ms, operationsToReplay=22 {code}
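The failure in the trace above comes from translating the checksum-type byte into an enum value. Below is a minimal sketch of that behavior plus a defensive pre-check; the enum, codes, and method names here are assumptions based on this comment, not the actual HBase source (CRC32C is taken as code 3 only because this comment says so):

{code:java}
// Illustrative only: mimics a codeToType-style lookup that throws on an
// unknown code (the failure seen above for code 52), plus a guard that a
// caller could run first to treat the header as corrupt instead of failing.
enum SketchChecksumType {
    // Codes are assumptions; CRC32C is taken as 3 per the comment above.
    NULL((byte) 0), CRC32((byte) 1), CRC32C((byte) 3);

    final byte code;

    SketchChecksumType(byte code) {
        this.code = code;
    }

    static SketchChecksumType codeToType(byte b) {
        for (SketchChecksumType t : values()) {
            if (t.code == b) {
                return t;
            }
        }
        // This is the failure mode in the stack trace for code 52.
        throw new RuntimeException("Unknown checksum type code " + b);
    }

    // Pre-check in the spirit of the proposed patch: validate before use.
    static boolean isKnownCode(byte b) {
        for (SketchChecksumType t : values()) {
            if (t.code == b) {
                return true;
            }
        }
        return false;
    }
}
{code}

With a guard like {{isKnownCode}}, a reader that sees an implausible checksum-type byte in a cached header can discard the cached header and re-read the block, rather than surfacing a RuntimeException all the way up to the RPC handler.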
 

 



> Unknown checksum type code exception occurred while reading HFileBlock
> ----------------------------------------------------------------------
>
>                 Key: HBASE-29158
>                 URL: https://issues.apache.org/jira/browse/HBASE-29158
>             Project: HBase
>          Issue Type: Bug
>          Components: HFile
>    Affects Versions: 2.2.6
>            Reporter: Guanglei Xia
>            Priority: Major
>              Labels: pull-request-available
>
> In our HBase cluster, we encountered frequent checksum type error messages. 
> After reviewing the relevant Jira issues, we found that HBASE-28605 had 
> previously discussed HBase checksum problems. Currently, HBase checksum 
> validation does not cover the cached HFile block header, which can cause 
> problems when an HFile is corrupted. That patch (HBASE-28605) also fixes 
> several cases of corrupt HFiles. However, HBASE-28605 cannot solve the 
> checksum type error that occurs when the HFile block header is corrupted. 
> We propose a new patch to fix this checksum type error: we check the checksum 
> type value in the HFile block header before running the checksum. If the value 
> is invalid, the header is corrupted and cannot be used anymore. We have applied 
> this patch in our HBase cluster, and the bug has been resolved there.
> We are contributing this patch back to the community and have posted the error 
> stack in the comments, hoping to receive some guidance.
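The pre-check the description proposes can be sketched as follows. The header offset and the set of valid codes below are hypothetical, chosen only to illustrate the "validate the checksum type before running the checksum" idea, not taken from the patch or the HBase source:

{code:java}
import java.nio.ByteBuffer;

// Hypothetical sketch: before checksum validation, read the checksum-type
// byte out of the cached block header and reject the header if the byte is
// not a known code, signalling the caller to re-read the block from the file.
public class HeaderPreCheck {
    // Assumed offset of the checksum-type byte within the header.
    static final int CHECKSUM_TYPE_OFFSET = 32;
    // Assumed set of valid codes; CRC32C taken as 3 per this issue's comments.
    static final byte[] KNOWN_CODES = { 0, 1, 3 };

    // Returns true if the cached header's checksum-type byte is plausible;
    // false means the header is corrupt and must not be trusted.
    static boolean headerLooksValid(ByteBuffer header) {
        byte code = header.get(CHECKSUM_TYPE_OFFSET);
        for (byte known : KNOWN_CODES) {
            if (code == known) {
                return true;
            }
        }
        return false;
    }
}
{code}

A corrupt header with byte 52 at the checksum-type position would be rejected by this guard up front, instead of reaching checksum validation and throwing there.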



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
