bbeaudreault commented on code in PR #4967: URL: https://github.com/apache/hbase/pull/4967#discussion_r1073484237
########## hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java: ########## @@ -612,6 +612,9 @@ public boolean next(List<Cell> outResult, ScannerContext scannerContext) throws // here, we still need to scan all the qualifiers before returning... scannerContext.returnImmediately(); } + + scannerContext.incrementBlockProgress(heap.getCurrentBlockSizeOnce()); Review Comment: Correct. KeyValueHeap makes this tricky, this is why the API design is strange. In the context of a single `StoreScanner.next` call, we can jump back and forth between different blocks in different StoreFiles. This approach here ensures that we always only count each block once. I considered a few different approaches, and I felt this one was the simplest: 1. Similar to RSRpcServices.addSize, try tracking blocks here by checking `heap.getCurrentBlock() != lastBlock`. As mentioned above, this is error prone (as the comments in addSize mention as well). 2. Try adding an `HFileReaderImpl.setBlockChangedCallback(newBlock -> scannerContext.incrementBlockSize(newBlock))`. This would work, but opens up issue with how StoreScanner can re-open StoreFileScanners due to flushes or read type switches, etc. We'd need to update this callback in RegionScannerImpl.nextInternal, which would need to delegate out to all StoreScanners -> KeyValueHeaps -> StoreFileScanners. 3. Add a `HFileReaderImpl.getBlockChanged()` and `HFileReader.getCurrentBlockSize()`. Then we could change this line to: ```java if (heap.getBlockChanged()) { scannerContext.incrementBlockProgress(heap.getCurrentBlockSize()); } ``` The implementation of HFileReaderImpl.getBlockChanged would be: ```java public boolean getBlockChanged() { if (blockChanged) { blockChanged = false; return true; } return false; } public int getCurrentBlockSize() { return curBlock.getUncompressedSizeWithoutHeader(); } ``` We'd reset blockChanged to true whenever we update curBlock in HFileReaderImpl. Option 3 would probably be the closest to what I have and most accurate/simplest, and it might be more intuitive API as well. The only downside would be 2 new interface methods instead of 1. Would you like me to do Option 3? ########## hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java: ########## @@ -612,6 +612,9 @@ public boolean next(List<Cell> outResult, ScannerContext scannerContext) throws // here, we still need to scan all the qualifiers before returning... scannerContext.returnImmediately(); } + + scannerContext.incrementBlockProgress(heap.getCurrentBlockSizeOnce()); Review Comment: Correct. KeyValueHeap makes this tricky, this is why the API design is strange. In the context of a single `StoreScanner.next` call, we can jump back and forth between different blocks in different StoreFiles. This approach here ensures that we always only count each block once. I considered a few different approaches, and I felt this one was the simplest: 1. Similar to RSRpcServices.addSize, try tracking blocks here by checking `heap.getCurrentBlock() != lastBlock`. As mentioned above, this is error prone (as the comments in addSize mention as well). 2. Try adding an `HFileReaderImpl.setBlockChangedCallback(newBlock -> scannerContext.incrementBlockSize(newBlock))`. This would work, but opens up issue with how StoreScanner can re-open StoreFileScanners due to flushes or read type switches, etc. We'd need to update this callback in RegionScannerImpl.nextInternal, which would need to delegate out to all StoreScanners -> KeyValueHeaps -> StoreFileScanners. 3. Add a `HFileReaderImpl.getBlockChanged()` and `HFileReader.getCurrentBlockSize()`. Then we could change this line to: ```java if (heap.getBlockChanged()) { scannerContext.incrementBlockProgress(heap.getCurrentBlockSize()); } ``` The implementation of HFileReaderImpl.getBlockChanged would be: ```java public boolean getBlockChanged() { if (blockChanged) { blockChanged = false; return true; } return false; } public int getCurrentBlockSize() { return curBlock.getUncompressedSizeWithoutHeader(); } ``` We'd reset blockChanged to true whenever we update curBlock in HFileReaderImpl. Option 3 would probably be the closest to what I have and most accurate/simplest, and it might be more intuitive API as well. The only downside would be 2 new interface methods instead of 1. Would you like me to do Option 3? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@hbase.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org