mwkang commented on code in PR #6900:
URL: https://github.com/apache/hbase/pull/6900#discussion_r2039655382


##########
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java:
##########
@@ -832,7 +845,7 @@ private void updateMetricsStore(boolean memstoreRead) {
    * @return null is the top cell doesn't change. Otherwise, the NextState to 
return
    */
   private NextState needToReturn(List<? super ExtendedCell> outResult) {
-    if (!outResult.isEmpty() && topChanged) {
+    if (topChanged) {

Review Comment:
   Let’s assume the data stored in HBase is as follows:
   
   (1) row0/family2:qf1/DeleteColumn
   (2) row0/family2:qf1/Put/value2
   
   (3) row1/family1:qf1/Put/value2  
   (4) row1/family2:qf1/Put/value2  
   
   Now, suppose a user starts scanning from **row0**.
   
   In 
[`RegionScannerImpl#nextInternal`](https://github.com/apache/hbase/blob/0b3c17302843d1f4d6f3c6b458f837cb9c274510/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RegionScannerImpl.java#L415),
 when the [current 
cell](https://github.com/apache/hbase/blob/0b3c17302843d1f4d6f3c6b458f837cb9c274510/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RegionScannerImpl.java#L446)’s
 row is `row0`, after reading entry (2) in `StoreScanner`, if a flush happens, 
a `topChanged` occurs ([`Storescanner.peek() is changed where before 
...`](https://github.com/apache/hbase/blob/0b3c17302843d1f4d6f3c6b458f837cb9c274510/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java#L1074)),
 and the value of `StoreScanner`’s `heap.peek()` becomes (4) 
`row1/family2:qf1/Put/value2`.
   
   Since it is the next row, `StoreScanner` *should* return at that point — but 
it fails to recognize that it has moved to the next row because 
[outResult](https://github.com/apache/hbase/blob/0b3c17302843d1f4d6f3c6b458f837cb9c274510/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java#L835)
 is empty, and ends up including the new row in the result.
   
   Then, in `RegionScannerImpl`, it sees that `nextKv`’s row is different from 
the current cell’s row, and returns (since it has moved to a different row).
   
   As a result, even though (3) and (4) belong to the same row (`row1`), they 
are returned to the client as if they were from different rows.
   
   (3) and (4) should be combined into a single 
[Result](https://github.com/apache/hbase/blob/0b3c17302843d1f4d6f3c6b458f837cb9c274510/hbase-client/src/main/java/org/apache/hadoop/hbase/client/Result.java),
 but they end up being returned as separate Result instances.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@hbase.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to