[ https://issues.apache.org/jira/browse/HBASE-29907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Duo Zhang resolved HBASE-29907.
-------------------------------
Hadoop Flags: Reviewed
Resolution: Fixed
Pushed to all active branches.
Thanks [~jizening]!
> ROWCOL bloom filter + StoreScanner.trySkipToNextColumn can surface
> out-of-order cells, causing read failure “isDelete failed”
> -----------------------------------------------------------------------------------------------------------------------------
>
> Key: HBASE-29907
> URL: https://issues.apache.org/jira/browse/HBASE-29907
> Project: HBase
> Issue Type: Bug
> Components: Filters, Scanners
> Affects Versions: 2.6.0, 2.4.18, 2.5.6, 3.0.0-beta-1, 2.5.7, 2.5.8, 2.6.1,
> 2.5.9, 2.5.10, 2.5.11, 2.6.2, 2.6.3, 2.5.12, 2.6.4, 2.5.13
> Reporter: Jize Ning
> Assignee: Jize Ning
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 2.7.0, 3.0.0-beta-2, 2.6.5, 2.5.14
>
>
> h3. Summary
> We see intermittent read failures on multi-column GETs when a column family
> uses ROWCOL bloom filters. Clients fail with an exception chain that includes:
> {code:java}
> 2026-02-17T07:49:24.041Z, RpcRetryingCaller{globalStartTime=2026-02-17T07:49:23.547Z, pause=250, maxAttempts=3}, java.io.IOException: java.io.IOException: isDelete failed: deleteBuffer=q15, qualifier=q09, timestamp=2010, comparison result: 1
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:479)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
> at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:102)
> at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:82)
> Caused by: java.lang.IllegalStateException: isDelete failed: deleteBuffer=q15, qualifier=q09, timestamp=2010, comparison result: 1
> at org.apache.hadoop.hbase.regionserver.querymatcher.ScanDeleteTracker.isDeleted(ScanDeleteTracker.java:132)
> at org.apache.hadoop.hbase.regionserver.querymatcher.ScanQueryMatcher.checkDeleted(ScanQueryMatcher.java:204)
> at org.apache.hadoop.hbase.regionserver.querymatcher.NormalUserScanQueryMatcher.match(NormalUserScanQueryMatcher.java:76)
> at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:624)
> at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:145)
> at org.apache.hadoop.hbase.regionserver.RegionScannerImpl.populateResult(RegionScannerImpl.java:342)
> at org.apache.hadoop.hbase.regionserver.RegionScannerImpl.nextInternal(RegionScannerImpl.java:513)
> at org.apache.hadoop.hbase.regionserver.RegionScannerImpl.nextRaw(RegionScannerImpl.java:278)
> at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:3402)
> at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:3668)
> at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:45006)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:415)
> {code}
>
> h3. Suspected root cause
> This appears to be the same class of issue described in HBASE-19863, and it
> seems to have regressed with the narrower guard introduced by HBASE-28055.
> When the ROWCOL bloom filter indicates that a row+qualifier is absent from a
> StoreFile, the scanner may use a bloom-optimized “fake key”. If such a fake
> key is consumed during the trySkipToNextColumn skip loop, a subsequent next()
> can advance from a stale physical HFile position and return a cell that sorts
> before the column being skipped. When that cell reaches delete tracking
> (isDelete), the read can fail and surface as “isDelete failed”.
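
To make the mechanism above concrete, here is a toy model of a lazily seeked scanner. It is only a sketch: `ToyLazyScanner` and its methods are hypothetical names, it models a single row with qualifiers as plain strings, and it is not HBase's StoreFileScanner or KeyValueHeap code. It illustrates only the claim that consuming a fake key and then advancing from a stale position can expose an earlier column.

```java
import java.util.Arrays;
import java.util.List;

// Toy model only: names and behavior are illustrative assumptions, not HBase code.
final class ToyLazyScanner {
    private final List<String> qualifiers; // sorted qualifiers stored in one file (single row)
    private int physicalPos = 0;           // where the underlying reader really is
    private String current;                // key reported by peek(); may be a fake key

    ToyLazyScanner(List<String> sortedQualifiers) {
        this.qualifiers = sortedQualifiers;
        this.current = sortedQualifiers.isEmpty() ? null : sortedQualifiers.get(0);
    }

    String peek() {
        return current;
    }

    // Bloom filter says the qualifier is absent: report a fake key and skip the
    // real seek, so physicalPos keeps its old (now stale) value.
    void lazySeekWithBloomMiss(String qualifier) {
        current = qualifier;
    }

    // Consume the current key and advance from wherever the reader physically is.
    String next() {
        String consumed = current;
        physicalPos++;
        current = physicalPos < qualifiers.size() ? qualifiers.get(physicalPos) : null;
        return consumed;
    }

    // Real seek: position on the first stored qualifier >= the target.
    void reseek(String qualifier) {
        physicalPos = 0;
        while (physicalPos < qualifiers.size()
                && qualifiers.get(physicalPos).compareTo(qualifier) < 0) {
            physicalPos++;
        }
        current = physicalPos < qualifiers.size() ? qualifiers.get(physicalPos) : null;
    }

    public static void main(String[] args) {
        ToyLazyScanner stale = new ToyLazyScanner(Arrays.asList("q05", "q09", "q20"));
        stale.lazySeekWithBloomMiss("q15"); // fake key for q15, reader still at q05
        stale.next();                       // fake key consumed inside the skip loop
        System.out.println("without reseek: " + stale.peek());

        ToyLazyScanner fixed = new ToyLazyScanner(Arrays.asList("q05", "q09", "q20"));
        fixed.lazySeekWithBloomMiss("q15");
        fixed.reseek("q15");                // real seek repairs the position
        System.out.println("with reseek:    " + fixed.peek());
    }
}
```

In this toy run the scanner without a reseek surfaces q09 after the fake q15 key, which is a backward jump, while the reseeked scanner lands on q20. The qualifiers q15 and q09 are borrowed from the exception message; q05 and q20 are made up.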
>
> The HBASE-28055 change *narrowed the safety check* in a way that is
> *semantically wrong*:
> * *HBASE-19863's check ({{compareKeyForNextColumn < 0}})*: Catches
> _any_ backward ordering of the next cell relative to the current cell's
> expected next column. This is a *general* guard: it handles any case where
> consuming cells in the loop caused the heap to surface a cell that violates
> ordering, regardless of whether it was a bloom fake key or something else.
> * *HBASE-28055's check ({{timestamp == OLDEST_TIMESTAMP}})*: Only
> catches the case where {{cell}} itself is a bloom filter fake key (since fake
> keys are created with {{OLDEST_TIMESTAMP}}). But the problem scenario
> described in HBASE-19863 is that the bloom fake key gets *consumed inside the
> loop* by {{heap.next()}}, and the _next_ real cell that surfaces is now
> out of order. In that case, {{cell}} (the trigger cell passed into
> {{trySkipToNextColumn}}) is a *real cell* with a real timestamp, _not_
> {{OLDEST_TIMESTAMP}}, so the check misses it entirely.
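
The difference between the two guards can be sketched with a simplified model. Everything here is an assumption made for illustration: `GuardSketch`, `SimpleCell`, and both method names are hypothetical, ordering is reduced to a qualifier comparison within one row, and the sentinel value `Long.MIN_VALUE` stands in for the fake-key timestamp. It is not the actual StoreScanner code.

```java
// Simplified model: class, field, and method names are hypothetical, not HBase APIs.
final class GuardSketch {
    // Assumption: fake bloom keys carry this sentinel timestamp.
    static final long OLDEST_TIMESTAMP = Long.MIN_VALUE;

    static final class SimpleCell {
        final String qualifier;
        final long timestamp;

        SimpleCell(String qualifier, long timestamp) {
            this.qualifier = qualifier;
            this.timestamp = timestamp;
        }
    }

    // General guard, in the spirit of the HBASE-19863 check: ask for a reseek
    // whenever the cell now at the top of the heap sorts before the column
    // that was being skipped.
    static boolean generalGuardNeedsReseek(SimpleCell triggerCell, SimpleCell nextCell) {
        return nextCell != null
                && nextCell.qualifier.compareTo(triggerCell.qualifier) < 0;
    }

    // Narrow guard, in the spirit of the HBASE-28055 check: ask for a reseek
    // only when the trigger cell itself is a fake key.
    static boolean narrowGuardNeedsReseek(SimpleCell triggerCell) {
        return triggerCell.timestamp == OLDEST_TIMESTAMP;
    }

    public static void main(String[] args) {
        // Trigger cell is a real cell in column q15; its timestamp is made up.
        SimpleCell trigger = new SimpleCell("q15", 2000L);
        // Cell surfaced after the fake key was consumed inside the loop.
        SimpleCell surfaced = new SimpleCell("q09", 2010L);

        System.out.println("general guard reseeks: "
                + generalGuardNeedsReseek(trigger, surfaced));
        System.out.println("narrow guard reseeks:  "
                + narrowGuardNeedsReseek(trigger));
    }
}
```

With these inputs the general guard asks for a reseek and the narrow guard does not, which matches the failure described in the second bullet: the trigger cell is real, so a check on its timestamp never fires.
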
> h3. Impact / correctness concerns
> Beyond the immediate isDelete failed read failures, this indicates a deeper
> correctness issue: the scan pipeline relies on the invariant that
> KeyValueHeap delivers Cells in non-decreasing key order across all
> participating scanners. When ROWCOL bloom + trySkipToNextColumn can result in
> a “smaller key” being surfaced after a larger key (i.e., a backward jump),
> the heap’s ordering guarantee is effectively violated from the perspective of
> consumers (e.g., delete tracking and matchers).
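
The ordering invariant that consumers rely on can be written as a small check over the sequence of emitted keys. This is a sketch under stated assumptions: `OrderInvariantSketch` and `Key` are hypothetical, the comparison covers only row, qualifier, and timestamp (newer first), and the row name and the q15 timestamp are made up.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: a reduced key model, not HBase's Cell or CellComparator.
final class OrderInvariantSketch {
    static final class Key {
        final String row;
        final String qualifier;
        final long timestamp;

        Key(String row, String qualifier, long timestamp) {
            this.row = row;
            this.qualifier = qualifier;
            this.timestamp = timestamp;
        }
    }

    // Simplified key order: row ascending, qualifier ascending, newer timestamp first.
    static int compareKeys(Key a, Key b) {
        int c = a.row.compareTo(b.row);
        if (c != 0) {
            return c;
        }
        c = a.qualifier.compareTo(b.qualifier);
        if (c != 0) {
            return c;
        }
        return Long.compare(b.timestamp, a.timestamp);
    }

    // Returns the index of the first key that sorts before its predecessor,
    // or -1 when the whole sequence is in non-decreasing order.
    static int firstBackwardJump(List<Key> emitted) {
        for (int i = 1; i < emitted.size(); i++) {
            if (compareKeys(emitted.get(i), emitted.get(i - 1)) < 0) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<Key> emitted = Arrays.asList(
                new Key("r1", "q15", 2000L),
                new Key("r1", "q09", 2010L)); // smaller key after a larger one
        System.out.println("first backward jump at index: " + firstBackwardJump(emitted));
    }
}
```

A consumer such as a delete tracker assumes this check would never find a violation; the q15 then q09 sequence from the exception message is exactly the kind of input that breaks that assumption.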
>
> We were able to reproduce the bug with both a mini-cluster stress test and a
> small deterministic unit test. The reproduction hits the exception with the
> HBASE-28055 change in place, but not with the HBASE-19863 check.
> h3. Proposed fix
> We should revert the change in HBASE-28055. The claimed "performance
> improvement" comes from skipping the reseek that should be used to fix the
> heap ordering.
>