bbeaudreault commented on code in PR #4967:
URL: https://github.com/apache/hbase/pull/4967#discussion_r1073484237


##########
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java:
##########
@@ -612,6 +612,9 @@ public boolean next(List<Cell> outResult, ScannerContext 
scannerContext) throws
           // here, we still need to scan all the qualifiers before returning...
           scannerContext.returnImmediately();
         }
+
+        scannerContext.incrementBlockProgress(heap.getCurrentBlockSizeOnce());

Review Comment:
   Correct. KeyValueHeap makes this tricky, this is why the API design is 
strange.  In the context of a single `StoreScanner.next` call, we can jump back 
and forth between different blocks in different StoreFiles. This approach here 
ensures that we always only count each block once.
   
   I considered a few different approaches, and I felt this one was the 
simplest:
   
   1. Similar to RSRpcServices.addSize, try tracking blocks here by checking 
`heap.getCurrentBlock() != lastBlock`. As mentioned above, this is error prone 
(as the comments in addSize mention as well).
   2. Try adding an `HFileReaderImpl.setBlockChangedCallback(newBlock -> 
scannerContext.incrementBlockSize(newBlock))`.  This would work, but opens up 
issue with how StoreScanner can re-open StoreFileScanners due to flushes or 
read type switches, etc. We'd need to update this callback in 
RegionScannerImpl.nextInternal, which would need to delegate out to all 
StoreScanners -> KeyValueHeaps -> StoreFileScanners.
   3. Add a `HFileReaderImpl.getBlockChanged()` and 
`HFileReader.getCurrentBlockSize()`. Then we could change this line to:
   
   ```java
   if (heap.getBlockChanged()) {
     scannerContext.incrementBlockProgress(heap.getCurrentBlockSize());
   }
   ```
   
   The implementation of HFileReaderImpl.getBlockChanged would be:
   
   ```java
   public boolean getBlockChanged() {
     if (blockChanged) {
       blockChanged = false;
       return true;
     }
     return false;
   }
   ```
   
   We'd reset blockChanged to true whenever we update curBlock in 
HFileReaderImpl.
   
   Option 3 would probably be the closest to what I have and most 
accurate/simplest, and it might be more intuitive API as well. The only 
downside would be 2 new interface methods instead of 1.
   
   Would you like me to do Option 3?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@hbase.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to