[
https://issues.apache.org/jira/browse/HBASE-29254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Duo Zhang resolved HBASE-29254.
-------------------------------
Hadoop Flags: Reviewed
Resolution: Fixed
> StoreScanner returns incorrect row after flush due to topChanged behavior
> -------------------------------------------------------------------------
>
> Key: HBASE-29254
> URL: https://issues.apache.org/jira/browse/HBASE-29254
> Project: HBase
> Issue Type: Bug
> Components: Scanners
> Reporter: Minwoo Kang
> Assignee: Minwoo Kang
> Priority: Major
> Labels: pull-request-available
> Fix For: 2.7.0, 3.0.0-beta-2, 2.6.3, 2.5.12
>
>
> Let’s assume the data stored in HBase is as follows:
> (1) row0/family2:qf1/DeleteColumn
> (2) row0/family2:qf1/Put/value2
> (3) row1/family1:qf1/Put/value2
> (4) row1/family2:qf1/Put/value2
> Now, suppose a user starts scanning from row0.
> In
> [RegionScannerImpl#nextInternal|https://github.com/apache/hbase/blob/0b3c17302843d1f4d6f3c6b458f837cb9c274510/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RegionScannerImpl.java#L415],
> when the [current
> cell|https://github.com/apache/hbase/blob/0b3c17302843d1f4d6f3c6b458f837cb9c274510/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RegionScannerImpl.java#L446]’s
> row is row0, if a flush happens after StoreScanner has read entry (2), a
> topChanged occurs (Storescanner.peek() is changed where before ...), and
> StoreScanner’s heap.peek() becomes (4) row1/family2:qf1/Put/value2.
> Since that cell belongs to the next row, StoreScanner should return at that
> point, but it fails to recognize that it has moved to the next row because
> [outResult|https://github.com/apache/hbase/blob/0b3c17302843d1f4d6f3c6b458f837cb9c274510/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java#L835]
> is empty, and ends up including the new row in the result.
> Then RegionScannerImpl sees that nextKv’s row differs from the current cell’s
> row and returns, since the scan has moved to a different row.
> As a result, even though (3) and (4) belong to the same row (row1), they are
> returned to the client as if they were from different rows.
> (3) and (4) should be combined into a single
> [Result|https://github.com/apache/hbase/blob/0b3c17302843d1f4d6f3c6b458f837cb9c274510/hbase-client/src/main/java/org/apache/hadoop/hbase/client/Result.java],
> but they end up being returned as separate Result instances.
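
The row-boundary miss described above can be modeled in a few lines of plain Java. This is only an illustrative sketch, not HBase code and not necessarily the fix that was committed: the class TopChangedSketch, its Cell type, and all method names are hypothetical. It assumes that the DeleteColumn in (1) masks the Put in (2), which is why outResult is still empty when the flush moves the heap top to (4).

```java
import java.util.ArrayList;
import java.util.List;

class TopChangedSketch {
    /** Hypothetical stand-in for a cell: a row key plus a label for printing. */
    static final class Cell {
        final String row;
        final String label;
        Cell(String row, String label) { this.row = row; this.label = label; }
    }

    /** Boundary check that can only compare against cells already collected. */
    static boolean rowChangedByOutResult(List<Cell> outResult, Cell top) {
        return !outResult.isEmpty() && !outResult.get(0).row.equals(top.row);
    }

    /** Boundary check that compares against the row the scan is positioned on. */
    static boolean rowChangedByCurrentRow(String currentRow, Cell top) {
        return !currentRow.equals(top.row);
    }

    /** Returns the cells handed back for currentRow after a simulated topChanged. */
    static List<Cell> collectAfterTopChanged(String currentRow, Cell newTop, boolean guarded) {
        // Assumption: entries (1) and (2) contributed nothing to outResult.
        List<Cell> outResult = new ArrayList<>();
        boolean boundary = guarded
            ? rowChangedByCurrentRow(currentRow, newTop)
            : rowChangedByOutResult(outResult, newTop);
        if (!boundary) {
            outResult.add(newTop); // a cell from the next row leaks into this call
        }
        return outResult;
    }

    public static void main(String[] args) {
        Cell entry4 = new Cell("row1", "row1/family2:qf1/Put/value2");
        // Empty outResult hides the row change, so entry (4) is included.
        System.out.println("unguarded: " + collectAfterTopChanged("row0", entry4, false).size());
        // Tracking the current row separately detects the boundary.
        System.out.println("guarded:   " + collectAfterTopChanged("row0", entry4, true).size());
    }
}
```

In this model the unguarded call returns one cell for the row0 pass and the guarded call returns none, which mirrors how (4) ends up split from (3) in the report.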