[ 
https://issues.apache.org/jira/browse/HBASE-29907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang resolved HBASE-29907.
-------------------------------
    Hadoop Flags: Reviewed
      Resolution: Fixed

Pushed to all active branches.

Thanks [~jizening]!

> ROWCOL bloom filter + StoreScanner.trySkipToNextColumn can surface 
> out-of-order cells, causing read failure “isDelete failed”
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-29907
>                 URL: https://issues.apache.org/jira/browse/HBASE-29907
>             Project: HBase
>          Issue Type: Bug
>          Components: Filters, Scanners
>    Affects Versions: 2.6.0, 2.4.18, 2.5.6, 3.0.0-beta-1, 2.5.7, 2.5.8, 2.6.1, 
> 2.5.9, 2.5.10, 2.5.11, 2.6.2, 2.6.3, 2.5.12, 2.6.4, 2.5.13
>            Reporter: Jize Ning
>            Assignee: Jize Ning
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 2.7.0, 3.0.0-beta-2, 2.6.5, 2.5.14
>
>
> h3. Summary
> We see intermittent read failures (Multi-column GET) when a column family 
> uses ROWCOL bloom filters. Clients fail with an exception chain that includes:
> {code:java}
> 2026-02-17T07:49:24.041Z, 
> RpcRetryingCaller{globalStartTime=2026-02-17T07:49:23.547Z, pause=250, 
> maxAttempts=3}, java.io.IOException: java.io.IOException: isDelete failed: 
> deleteBuffer=q15, qualifier=q09, timestamp=2010, comparison result: 1
>  at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:479)
>  at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
>  at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:102)
>  at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:82)
> Caused by: java.lang.IllegalStateException: isDelete failed: 
> deleteBuffer=q15, qualifier=q09, timestamp=2010, comparison result: 1
>  at 
> org.apache.hadoop.hbase.regionserver.querymatcher.ScanDeleteTracker.isDeleted(ScanDeleteTracker.java:132)
>  at 
> org.apache.hadoop.hbase.regionserver.querymatcher.ScanQueryMatcher.checkDeleted(ScanQueryMatcher.java:204)
>  at 
> org.apache.hadoop.hbase.regionserver.querymatcher.NormalUserScanQueryMatcher.match(NormalUserScanQueryMatcher.java:76)
>  at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:624)
>  at 
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:145)
>  at 
> org.apache.hadoop.hbase.regionserver.RegionScannerImpl.populateResult(RegionScannerImpl.java:342)
>  at 
> org.apache.hadoop.hbase.regionserver.RegionScannerImpl.nextInternal(RegionScannerImpl.java:513)
>  at 
> org.apache.hadoop.hbase.regionserver.RegionScannerImpl.nextRaw(RegionScannerImpl.java:278)
>  at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:3402)
>  at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:3668)
>  at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:45006)
>  at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:415) {code}
>  
> h3. Suspected root cause
> This appears to be the same class of issue described in HBASE-19863, which 
> seems to have regressed with the narrower guard introduced by HBASE-28055. 
> When the ROWCOL bloom filter indicates that a row+qualifier is absent from a StoreFile, the 
> scanner may use a bloom-optimized “fake key”. If such a fake key is consumed 
> during the trySkipToNextColumn skip loop, a subsequent next() can advance 
> from a stale physical HFile position and return a cell that sorts before the 
> column being skipped. When that reaches delete-tracking (isDelete), the read 
> can fail and surface as isDelete failed.
>  
> The HBASE-28055 change *narrowed the safety check* in a way that is 
> {*}semantically wrong{*}:
>  * {*}HBASE-19863's check ({{{}compareKeyForNextColumn < 0{}}}){*}: Catches 
> _any_ backward ordering of the next cell relative to the current cell's 
> expected next column. This is a *general* guard — it handles any case where 
> consuming cells in the loop caused the heap to surface a cell that violates 
> ordering, regardless of whether it was a bloom fake key or something else.
>  * {*}HBASE-28055's check ({{{}timestamp == OLDEST_TIMESTAMP{}}}){*}: Only 
> catches the case where {{cell}} itself is a bloom filter fake key (since fake 
> keys are created with {{{}OLDEST_TIMESTAMP{}}}). But the problem scenario 
> described in HBASE-19863 is that the bloom fake key gets *consumed inside the 
> loop* by {{{}heap.next(){}}}, and the _next_ real cell that surfaces is now 
> out of order. In that case, {{cell}} (the trigger cell passed into 
> {{{}trySkipToNextColumn{}}}) is a *real cell* with a real timestamp — _not_ 
> {{{}OLDEST_TIMESTAMP{}}}. The check misses it entirely.
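
The difference between the two guards described above can be sketched with a toy model. This is only an illustration under stated assumptions: `GuardSketch`, its `Cell` class, and both helper methods are hypothetical stand-ins and not HBase classes, cell ordering is reduced to a qualifier comparison within one row, and the trigger cell's timestamp is made up. The qualifiers q15 and q09 are borrowed from the stack trace purely as sample values.

```java
// Toy model only: these types and helpers are hypothetical, not HBase classes.
final class GuardSketch {
  // Stand-in for the timestamp that, per the issue text, bloom fake keys carry.
  static final long OLDEST_TIMESTAMP = Long.MIN_VALUE;

  static final class Cell {
    final String qualifier;
    final long timestamp;

    Cell(String qualifier, long timestamp) {
      this.qualifier = qualifier;
      this.timestamp = timestamp;
    }
  }

  // General guard (HBASE-19863 style, simplified): fires when the cell now at
  // the top of the heap sorts before the column that was being skipped.
  static boolean orderingGuard(Cell trigger, Cell next) {
    return next != null && next.qualifier.compareTo(trigger.qualifier) < 0;
  }

  // Narrow guard (HBASE-28055 style, simplified): fires only when the trigger
  // cell itself carries the fake-key timestamp.
  static boolean fakeKeyGuard(Cell trigger) {
    return trigger.timestamp == OLDEST_TIMESTAMP;
  }

  public static void main(String[] args) {
    Cell trigger = new Cell("q15", 2020L); // real cell; timestamp is an assumption
    Cell next = new Cell("q09", 2010L);    // cell surfaced after the skip loop
    System.out.println("ordering guard fires: " + orderingGuard(trigger, next));
    System.out.println("fake-key guard fires: " + fakeKeyGuard(trigger));
  }
}
```

In this toy scenario the ordering guard fires and the fake-key guard does not, which matches the description: the trigger cell is real, so a timestamp test on it cannot notice that a fake key was consumed inside the loop.
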
> h3. Impact / correctness concerns
> Beyond the immediate isDelete failed read failures, this indicates a deeper 
> correctness issue: the scan pipeline relies on the invariant that 
> KeyValueHeap delivers Cells in non-decreasing key order across all 
> participating scanners. When ROWCOL bloom + trySkipToNextColumn can result in 
> a “smaller key” being surfaced after a larger key (i.e., a backward jump), 
> the heap’s ordering guarantee is effectively violated from the perspective of 
> consumers (e.g., delete tracking and matchers). 
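
The ordering invariant described above can be expressed as a small check. This is a minimal sketch, assuming cells within a single row are represented only by their qualifier strings; `OrderCheckSketch` and `firstBackwardJump` are hypothetical names and not part of HBase.

```java
import java.util.Arrays;
import java.util.List;

// Toy check only: hypothetical helper, not an HBase class.
final class OrderCheckSketch {
  // Returns the index of the first qualifier that sorts before its
  // predecessor, or -1 if the sequence is in non-decreasing order.
  static int firstBackwardJump(List<String> qualifiers) {
    for (int i = 1; i < qualifiers.size(); i++) {
      if (qualifiers.get(i).compareTo(qualifiers.get(i - 1)) < 0) {
        return i;
      }
    }
    return -1;
  }

  public static void main(String[] args) {
    // q09 arriving after q15 is the kind of backward jump the issue describes.
    System.out.println(firstBackwardJump(Arrays.asList("q01", "q15", "q09")));
    System.out.println(firstBackwardJump(Arrays.asList("q01", "q09", "q15")));
  }
}
```

A consumer such as delete tracking assumes this check would always return -1 for the stream it receives; the failure in the stack trace corresponds to the case where it would not.
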
>  
> We were able to reproduce the bug with both a mini-cluster stress test and a 
> small deterministic unit test. The reproduction hits the exception with the 
> HBASE-28055 fix but not with the HBASE-19863 fix. 
> *Proposed fix* 
> We should revert the change in HBASE-28055. The claimed "performance 
> improvement" comes from skipping the reseek that should be used to fix the 
> heap ordering.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
