mwkang commented on code in PR #6900: URL: https://github.com/apache/hbase/pull/6900#discussion_r2053794175
########## hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java: ########## @@ -832,7 +845,7 @@ private void updateMetricsStore(boolean memstoreRead) { * @return null is the top cell doesn't change. Otherwise, the NextState to return */ private NextState needToReturn(List<? super ExtendedCell> outResult) { - if (!outResult.isEmpty() && topChanged) { + if (topChanged) { Review Comment: Sorry, I didn’t fully understand the question. If there is no data for `cf2` in `row1`, loading of `row2` is prevented by [`moreCellsInRow` in `RegionScannerImpl`](https://github.com/apache/hbase/blob/65a6d8ae39d63e2f205351c2ffb456deb9c00e56/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RegionScannerImpl.java#L348). This is because: - currentRowCell: `row1/cf1:cq1/1745317264666/Put/vlen=10/seqid=4` - nextKv: `row2/cf1:cq1/1745317264674/Put/vlen=10/seqid=5` By "What if there are no data for row1 in cf2? What is the current way to prevent loading row2?", are you asking about the current mechanism that prevents row2 from being included in the result while the StoreScanner is reading row1 and `topChanged` has not occurred? This is handled in [`ScanQueryMatcher.preCheck` using the `rowComparator`](https://github.com/apache/hbase/blob/65a6d8ae39d63e2f205351c2ffb456deb9c00e56/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/querymatcher/ScanQueryMatcher.java#L179-L181). ``` next, StoreScanner (org.apache.hadoop.hbase.regionserver) match, NormalUserScanQueryMatcher (org.apache.hadoop.hbase.regionserver.querymatcher) preCheck, ScanQueryMatcher (org.apache.hadoop.hbase.regionserver.querymatcher) ``` When the `currentRow` is `row1/cf1:cq1/1745314001315/DeleteColumn/vlen=0/seqid=5` and the `cell` is `row2/cf1:cq1/1745314001320/Put/vlen=10/seqid=7`, the MatchCode becomes DONE. However, when `topChanged` occurs (which is the problematic condition), the [`currentRow`](https://github.com/apache/hbase/blob/65a6d8ae39d63e2f205351c2ffb456deb9c00e56/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/querymatcher/ScanQueryMatcher.java#L131) becomes `row2/cf2:cq1/1745314108845/Put/vlen=10/seqid=7` and the cell is `row2/cf1:cq1/1745314001320/Put/vlen=10/seqid=7`. In [`ScanQueryMatcher.setToNewRow`](https://github.com/apache/hbase/blob/65a6d8ae39d63e2f205351c2ffb456deb9c00e56/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/querymatcher/ScanQueryMatcher.java#L266-L270), the `currentRow` is set to `row1/cf2:cq1/1745316126562/DeleteColumn/vlen=0/seqid=6`. ``` next, StoreScanner (org.apache.hadoop.hbase.regionserver) seekOrSkipToNextColumn, StoreScanner (org.apache.hadoop.hbase.regionserver) seekAsDirection, StoreScanner (org.apache.hadoop.hbase.regionserver) reseek, StoreScanner (org.apache.hadoop.hbase.regionserver) reopenAfterFlush, StoreScanner (org.apache.hadoop.hbase.regionserver) resetQueryMatcher, StoreScanner (org.apache.hadoop.hbase.regionserver) setToNewRow, ScanQueryMatcher (org.apache.hadoop.hbase.regionserver.querymatcher) ``` In this condition, "I mean we have delete row1 in cf1, and row2 in cf1 and cf2, how can we make sure that we do not return row2 when calling cf2.next at the first time?", it is determined as a different row by ScanQueryMatcher.preCheck. When using a `currentRow` that hasn't been flushed to disk yet, it may change after a flush when the scanner is reopened (`topChanged`). If `topChanged` occurs but there is an `outResult`, there is no issue. However, if there is no `outResult`, the return won't happen and the loop will iterate again, causing a problem. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@hbase.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org