[ 
https://issues.apache.org/jira/browse/HBASE-29907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18059258#comment-18059258
 ] 

Daniel Roudnitsky commented on HBASE-29907:
-------------------------------------------

This test creates multiple StoreFiles with ROWCOL bloom filters and small block 
sizes, then runs concurrent scans while a background thread continuously 
writes, flushes, and triggers minor compactions. The combination of the ROWCOL
bloom filter fake-cell optimization, multiple StoreFiles, and concurrent
flush/scan interleaving triggers the backward jump that the buggy check fails
to catch.

Hoping to get some opinions on whether it's appropriate to commit this style of
longer-running stress test, which only validates that no exception is thrown
during the run.
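
The backward jump the test provokes breaks a simple invariant: a scan should
never deliver a cell that sorts before one it has already delivered. Below is a
minimal sketch of that invariant, assuming a simplified cell of (row,
qualifier, timestamp) instead of HBase's real Cell and CellComparator. The
class and method names are hypothetical and not part of HBase, and the ordering
only approximates HBase's (family, type and sequence id are ignored).

```java
import java.util.List;

public class HeapOrderCheck {
  // Hypothetical simplified cell: real HBase cells also carry family, type and sequence id.
  record Cell(String row, String qualifier, long timestamp) {}

  // Approximates HBase key order: row ascending, qualifier ascending,
  // and newer (larger) timestamps first within the same row and column.
  static int compareKeys(Cell a, Cell b) {
    int c = a.row().compareTo(b.row());
    if (c != 0) {
      return c;
    }
    c = a.qualifier().compareTo(b.qualifier());
    if (c != 0) {
      return c;
    }
    return Long.compare(b.timestamp(), a.timestamp());
  }

  // Returns the index of the first cell that sorts before its predecessor,
  // or -1 if the delivered stream is in non-decreasing key order.
  static int firstBackwardJump(List<Cell> delivered) {
    for (int i = 1; i < delivered.size(); i++) {
      if (compareKeys(delivered.get(i - 1), delivered.get(i)) > 0) {
        return i;
      }
    }
    return -1;
  }

  public static void main(String[] args) {
    List<Cell> ordered = List.of(
        new Cell("r1", "q09", 2010),
        new Cell("r1", "q15", 2010),
        new Cell("r1", "q15", 2000));
    // q09 after q15 in the same row is the kind of backward jump described above.
    List<Cell> backward = List.of(
        new Cell("r1", "q15", 2010),
        new Cell("r1", "q09", 2010));
    System.out.println(firstBackwardJump(ordered));  // prints -1
    System.out.println(firstBackwardJump(backward)); // prints 1
  }
}
```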

> ROWCOL bloom filter + StoreScanner.trySkipToNextColumn can surface 
> out-of-order cells, causing read failure “isDelete failed”
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-29907
>                 URL: https://issues.apache.org/jira/browse/HBASE-29907
>             Project: HBase
>          Issue Type: Bug
>          Components: Filters, Scanners
>    Affects Versions: 2.6.0, 2.4.18, 2.5.6, 3.0.0-beta-1, 2.5.7, 2.5.8, 2.6.1, 2.5.9, 2.5.10, 2.5.11, 2.6.2, 2.6.3, 2.5.12, 2.6.4, 2.5.13
>            Reporter: Jize Ning
>            Priority: Major
>              Labels: pull-request-available
>
> h3. Summary
> We see intermittent read failures (multi-column Gets) when a column family
> uses ROWCOL bloom filters. Clients fail with an exception chain that includes:
> {code:java}
> 2026-02-17T07:49:24.041Z, RpcRetryingCaller{globalStartTime=2026-02-17T07:49:23.547Z, pause=250, maxAttempts=3}, java.io.IOException: java.io.IOException: isDelete failed: deleteBuffer=q15, qualifier=q09, timestamp=2010, comparison result: 1
>  at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:479)
>  at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
>  at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:102)
>  at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:82)
> Caused by: java.lang.IllegalStateException: isDelete failed: deleteBuffer=q15, qualifier=q09, timestamp=2010, comparison result: 1
>  at org.apache.hadoop.hbase.regionserver.querymatcher.ScanDeleteTracker.isDeleted(ScanDeleteTracker.java:132)
>  at org.apache.hadoop.hbase.regionserver.querymatcher.ScanQueryMatcher.checkDeleted(ScanQueryMatcher.java:204)
>  at org.apache.hadoop.hbase.regionserver.querymatcher.NormalUserScanQueryMatcher.match(NormalUserScanQueryMatcher.java:76)
>  at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:624)
>  at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:145)
>  at org.apache.hadoop.hbase.regionserver.RegionScannerImpl.populateResult(RegionScannerImpl.java:342)
>  at org.apache.hadoop.hbase.regionserver.RegionScannerImpl.nextInternal(RegionScannerImpl.java:513)
>  at org.apache.hadoop.hbase.regionserver.RegionScannerImpl.nextRaw(RegionScannerImpl.java:278)
>  at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:3402)
>  at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:3668)
>  at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:45006)
>  at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:415)
> {code}
>  
> h3. Suspected root cause
> This appears to be the same class of issue described in HBASE-19863, and it
> seems to have regressed with the narrower guard introduced by HBASE-28055.
> When the ROWCOL bloom filter indicates that a row+qualifier is absent from a
> StoreFile, the scanner may use a bloom-optimized “fake key”. If such a fake
> key is consumed during the trySkipToNextColumn skip loop, a subsequent next()
> can advance from a stale physical HFile position and return a cell that sorts
> before the column being skipped. When that cell reaches delete tracking
> (isDelete), the read can fail with “isDelete failed”.
>  
> The HBASE-28055 change *narrowed the safety check* in a way that is
> {*}semantically wrong{*}:
>  * {*}HBASE-19863's check ({{{}compareKeyForNextColumn < 0{}}}){*}: catches
> _any_ backward ordering of the next cell relative to the current cell's
> expected next column. This is a *general* guard: it handles any case where
> consuming cells in the loop caused the heap to surface a cell that violates
> ordering, regardless of whether it was a bloom fake key or something else.
>  * {*}HBASE-28055's check ({{{}timestamp == OLDEST_TIMESTAMP{}}}){*}: only
> catches the case where {{cell}} itself is a bloom filter fake key (since fake
> keys are created with {{{}OLDEST_TIMESTAMP{}}}). But the problem scenario
> described in HBASE-19863 is that the bloom fake key gets *consumed inside the
> loop* by {{{}heap.next(){}}}, and the _next_ real cell that surfaces is now
> out of order. In that case, {{cell}} (the trigger cell passed into
> {{{}trySkipToNextColumn{}}}) is a *real cell* with a real timestamp, _not_
> {{{}OLDEST_TIMESTAMP{}}}. The check misses it entirely.
> h3. Impact / correctness concerns
> Beyond the immediate isDelete failed read failures, this indicates a deeper
> correctness issue: the scan pipeline relies on the invariant that
> KeyValueHeap delivers Cells in non-decreasing key order across all
> participating scanners. When the combination of ROWCOL bloom filters and
> trySkipToNextColumn lets a “smaller key” surface after a larger key (i.e., a
> backward jump), the heap’s ordering guarantee is effectively violated from
> the perspective of its consumers (e.g., delete tracking and the matchers).
>  
> We were able to reproduce the bug with both a mini-cluster stress test and a
> small deterministic unit test. The reproduction hits the exception with the
> HBASE-28055 change in place, but not with the HBASE-19863 check.
> h3. Proposed fix
> We should revert the change in HBASE-28055. The claimed "performance
> improvement" comes from skipping the reseek that is needed to restore the
> heap ordering.
>  
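
To make the difference between the two guards concrete, here is a minimal
model under simplifying assumptions: cells are reduced to (row, qualifier,
timestamp), compareKeyForNextColumn is approximated by a row/qualifier
comparison, and OLDEST_TIMESTAMP is taken as Long.MIN_VALUE, the value of
HConstants.OLDEST_TIMESTAMP. The classes and methods are hypothetical stand-ins,
not HBase code, and the real StoreScanner check may have additional conditions
not modeled here. The qualifiers are chosen to mirror the q15/q09 pair in the
stack trace above.

```java
public class SkipGuardModel {
  // Hypothetical simplified cell, not org.apache.hadoop.hbase.Cell.
  record Cell(String row, String qualifier, long timestamp) {}

  // Same value as HConstants.OLDEST_TIMESTAMP; bloom fake keys carry this timestamp.
  static final long OLDEST_TIMESTAMP = Long.MIN_VALUE;

  // Rough stand-in for compareKeyForNextColumn(next, cell): negative when
  // `next` sorts before the first key of the column after cell's column,
  // i.e. when next's row/qualifier is not greater than cell's.
  static int compareKeyForNextColumn(Cell next, Cell cell) {
    int c = next.row().compareTo(cell.row());
    if (c != 0) {
      return c;
    }
    return next.qualifier().compareTo(cell.qualifier()) <= 0 ? -1 : 1;
  }

  // HBASE-19863-style guard: inspects the cell the heap surfaced after the skip loop.
  static boolean generalGuardDetects(Cell trigger, Cell surfaced) {
    return surfaced != null && compareKeyForNextColumn(surfaced, trigger) < 0;
  }

  // HBASE-28055-style guard: inspects only the trigger cell, ignoring `surfaced`.
  static boolean narrowGuardDetects(Cell trigger, Cell surfaced) {
    return trigger.timestamp() == OLDEST_TIMESTAMP;
  }

  public static void main(String[] args) {
    // The trigger cell is real; a fake key was consumed inside the loop and the
    // heap then surfaced q09, which sorts before q15 (a backward jump).
    Cell trigger = new Cell("r1", "q15", 2010);
    Cell surfaced = new Cell("r1", "q09", 2010);
    System.out.println(generalGuardDetects(trigger, surfaced)); // prints true
    System.out.println(narrowGuardDetects(trigger, surfaced));  // prints false
  }
}
```

The general guard looks at the cell that actually surfaced, so it flags the
backward jump however the skip loop got there; the narrow guard only looks at
the trigger cell, which in this scenario is a real cell.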



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
