mwkang commented on code in PR #6900: URL: https://github.com/apache/hbase/pull/6900#discussion_r2051466419
########## hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java: ########## @@ -832,7 +845,7 @@ private void updateMetricsStore(boolean memstoreRead) { * @return null is the top cell doesn't change. Otherwise, the NextState to return */ private NextState needToReturn(List<? super ExtendedCell> outResult) { - if (!outResult.isEmpty() && topChanged) { + if (topChanged) { Review Comment: Thank you for checking and following up. You're right that the behavior you observed differs from what I described — the explanation I gave was based on the RegionScanner level. However, the unit test I provided was written at the StoreScanner level, and I should have clarified that distinction more clearly. Apologies for the confusion. To elaborate on what happens at the StoreScanner level: - **When a flush occurs and a topChanged event is triggered**, if the `!outResult.isEmpty()` condition is met, [the first `next()` call on the `StoreScanner`](https://github.com/apache/hbase/blob/d3869d757fbfc16c8775b9e13155a1a919cecaaa/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHStore.java#L1486) returns a result that already includes cells from **row2**. (For [row2, there are 3 Put operations on qf1, qf2, and qf3](https://github.com/apache/hbase/blob/d3869d757fbfc16c8775b9e13155a1a919cecaaa/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHStore.java#L1462-L1464), so 3 cells are returned for row2.) - Without that condition, the first `next()` call on the `StoreScanner` only returns **row1**. Since there are no remaining cells for **row1** at that point, the result is **empty** — which is still correct because `StoreScanner` correctly recognizes the row boundary and doesn't include cells from the next row (**row2**). Then, **row2** appears in the **second** call, as expected. This matters because the first call should return only data from **row1**. If **row2** is prematurely included due to the `!outResult.isEmpty()` check, the RegionScanner level ends up mishandling the row boundary — possibly splitting a single row into multiple Results. Also, if I comment out [`flushStore(store, id++);`](https://github.com/apache/hbase/blob/d3869d757fbfc16c8775b9e13155a1a919cecaaa/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHStore.java#L1476) (i.e., no topChanged occurs), then [**row2** appears on the second `next()` call](https://github.com/apache/hbase/blob/d3869d757fbfc16c8775b9e13155a1a919cecaaa/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHStore.java#L1491-L1495) regardless of the condition. This confirms that the problem is related to the topChanged caused by the flush, and that the `!outResult.isEmpty()` check interferes with correctly detecting the row change after that flush. **Whether a flush and a topChanged happens or not** — the result of each `next()` call on `StoreScanner` should be the same. Specifically, the first call should return an empty result for row1, and the second call should return the cells for row2. However, when a flush causes a `topChanged`, the presence of `!outResult.isEmpty()` alters the behavior: [**the first** `next()`](https://github.com/apache/hbase/blob/d3869d757fbfc16c8775b9e13155a1a919cecaaa/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHStore.java#L1486-L1487) **call on the** `StoreScanner` **wrongly includes row2**, breaking the expected consistency. Let me know if it would be helpful to provide some dummy code to demonstrate the behavior at the RegionScanner level and show the full impact more clearly. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@hbase.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org