haridsv commented on code in PR #7136:
URL: https://github.com/apache/hbase/pull/7136#discussion_r2184703931
##########
hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:
##########
@@ -1754,6 +1758,9 @@ protected HFileBlock readBlockDataInternal(FSDataInputStream is, long offset,
headerBuf = HEAP.allocate(hdrSize);
readAtOffset(is, headerBuf, hdrSize, false, offset, pread);
headerBuf.rewind();
+ if (isScanMetricsEnabled) {
+ ThreadLocalServerSideScanMetrics.addBytesReadFromFs(hdrSize);
+ }
Review Comment:
Per the comment above, this block should execute very rarely, and when it does
it ends up being costly. I wonder if we should log a warning and monitor for
such occurrences. This is of course unrelated to the current PR, but starting a
discussion anyway.
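If we go the warning route, something along these lines could work. This is purely an illustrative sketch, not existing HBase code: the class, `recordSlowHeaderRead`, and `WARN_EVERY` are made-up names, and a real implementation would use `LOG.warn` rather than stdout. The idea is to rate-limit the warning so the rare-but-costly path cannot flood the log:

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative only: counts occurrences of the rare slow path and emits a
// warning on the 1st, 101st, 201st, ... occurrence so the log is not flooded.
public class SlowPathMonitorSketch {
    private static final AtomicLong slowHeaderReads = new AtomicLong();
    private static final long WARN_EVERY = 100;

    static boolean recordSlowHeaderRead() {
        long n = slowHeaderReads.incrementAndGet();
        boolean shouldWarn = n % WARN_EVERY == 1;
        if (shouldWarn) {
            // In HBase this would be LOG.warn(...) instead of stdout.
            System.out.println("WARN: separate header read from FS, occurrence #" + n);
        }
        return shouldWarn;
    }

    public static void main(String[] args) {
        System.out.println(recordSlowHeaderRead()); // first occurrence warns
        System.out.println(recordSlowHeaderRead()); // second does not
    }
}
```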
##########
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/SegmentScanner.java:
##########
@@ -336,22 +337,40 @@ private Segment getSegment() {
*/
protected void updateCurrent() {
ExtendedCell next = null;
+ boolean isScanMetricsEnabled = ThreadLocalServerSideScanMetrics.isScanMetricsEnabled();
+ int totalBytesRead = 0;
try {
while (iter.hasNext()) {
next = iter.next();
+ if (isScanMetricsEnabled) {
+ // Batch collect bytes to reduce method call overhead
+ totalBytesRead += Segment.getCellLength(next);
+ }
if (next.getSequenceId() <= this.readPoint) {
current = next;
+ // Add accumulated bytes before returning
+ if (isScanMetricsEnabled && totalBytesRead > 0) {
+ ThreadLocalServerSideScanMetrics.addBytesReadFromMemstore(totalBytesRead);
+ }
return;// skip irrelevant versions
}
// for backwardSeek() stay in the boundaries of a single row
if (stopSkippingKVsIfNextRow && segment.compareRows(next, stopSkippingKVsRow) > 0) {
current = null;
+ // Add accumulated bytes before returning
+ if (isScanMetricsEnabled && totalBytesRead > 0) {
+ ThreadLocalServerSideScanMetrics.addBytesReadFromMemstore(totalBytesRead);
+ }
return;
}
} // end of while
current = null; // nothing found
+ // Add accumulated bytes at the end
+ if (isScanMetricsEnabled && totalBytesRead > 0) {
+ ThreadLocalServerSideScanMetrics.addBytesReadFromMemstore(totalBytesRead);
+ }
Review Comment:
The loop continues and may accumulate more, so adding here seems incorrect,
especially since we are not resetting `totalBytesRead`; besides, it will get
added again anyway when the loop ends. Am I missing something?
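To make the concern concrete, here is a standalone toy (not the SegmentScanner code; `flushWithoutReset` is a made-up name) showing what happens when an accumulator is flushed mid-loop without being reset:

```java
// Illustration only: if an accumulator is flushed into a metric without being
// reset, the already-flushed portion is flushed again on every later flush.
public class DoubleCountSketch {
    // Flushes `total` into the metric on each iteration and once more at the
    // end, never resetting it, so earlier reads are counted multiple times.
    static int flushWithoutReset(int[] reads) {
        int metric = 0;
        int total = 0;
        for (int r : reads) {
            total += r;
            metric += total; // flush without reset
        }
        metric += total; // final flush repeats everything already flushed
        return metric;
    }

    public static void main(String[] args) {
        // Only 30 bytes were read, but the metric reports 70 (10 + 30 + 30).
        System.out.println(flushWithoutReset(new int[] { 10, 20 }));
    }
}
```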
##########
hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderImpl.java:
##########
@@ -1114,6 +1115,15 @@ private HFileBlock getCachedBlock(BlockCacheKey cacheKey, boolean cacheBlock, bo
compressedBlock.release();
}
}
+ boolean isScanMetricsEnabled = ThreadLocalServerSideScanMetrics.isScanMetricsEnabled();
+ if (isScanMetricsEnabled) {
+ int cachedBlockBytesRead = cachedBlock.getOnDiskSizeWithHeader();
+ // Account for the header size of the next block if it exists
+ if (cachedBlock.getNextBlockOnDiskSize() > 0) {
+ cachedBlockBytesRead += cachedBlock.headerSize();
+ }
+ ThreadLocalServerSideScanMetrics.addBytesReadFromBlockCache(cachedBlockBytesRead);
+ }
Review Comment:
A nit observation: the block may get evicted if its encoding doesn't match the
expected encoding, which I think will happen when the encoding format has
changed since the blocks were cached. The block then gets read from disk
anyway, so we could avoid counting these blocks here.
##########
hbase-client/src/main/java/org/apache/hadoop/hbase/client/metrics/ThreadLocalServerSideScanMetrics.java:
##########
@@ -0,0 +1,117 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hbase.client.metrics;
+
+import java.util.concurrent.atomic.AtomicInteger;
+import org.apache.yetus.audience.InterfaceAudience;
+
[email protected]
+public final class ThreadLocalServerSideScanMetrics {
+ private ThreadLocalServerSideScanMetrics() {
+ }
+
+ private static final ThreadLocal<Boolean> isScanMetricsEnabled = new ThreadLocal<>() {
+ @Override
+ protected Boolean initialValue() {
+ return false;
+ }
+ };
+
+ private static final ThreadLocal<AtomicInteger> bytesReadFromFs = new ThreadLocal<>() {
+ @Override
+ protected AtomicInteger initialValue() {
+ return new AtomicInteger(0);
+ }
+ };
+
+ private static final ThreadLocal<AtomicInteger> bytesReadFromBlockCache = new ThreadLocal<>() {
+ @Override
+ protected AtomicInteger initialValue() {
+ return new AtomicInteger(0);
+ }
+ };
+
+ private static final ThreadLocal<AtomicInteger> bytesReadFromMemstore = new ThreadLocal<>() {
+ @Override
+ protected AtomicInteger initialValue() {
+ return new AtomicInteger(0);
+ }
+ };
+
+ public static final void setScanMetricsEnabled(boolean enable) {
+ isScanMetricsEnabled.set(enable);
+ }
+
+ public static final int addBytesReadFromFs(int bytes) {
+ return bytesReadFromFs.get().addAndGet(bytes);
+ }
+
+ public static final int addBytesReadFromBlockCache(int bytes) {
+ return bytesReadFromBlockCache.get().addAndGet(bytes);
+ }
+
+ public static final int addBytesReadFromMemstore(int bytes) {
+ return bytesReadFromMemstore.get().addAndGet(bytes);
+ }
+
+ public static final boolean isScanMetricsEnabled() {
+ return isScanMetricsEnabled.get();
+ }
+
+ public static final AtomicInteger getBytesReadFromFsCounter() {
+ return bytesReadFromFs.get();
+ }
+
+ public static final AtomicInteger getBytesReadFromBlockCacheCounter() {
+ return bytesReadFromBlockCache.get();
+ }
+
+ public static final AtomicInteger getBytesReadFromMemstoreCounter() {
+ return bytesReadFromMemstore.get();
+ }
+
+ public static final int getBytesReadFromFsAndReset() {
+ return getBytesReadFromFsCounter().getAndSet(0);
+ }
+
+ public static final int getBytesReadFromBlockCacheAndReset() {
+ return getBytesReadFromBlockCacheCounter().getAndSet(0);
+ }
+
+ public static final int getBytesReadFromMemstoreAndReset() {
+ return getBytesReadFromMemstoreCounter().getAndSet(0);
+ }
+
+ public static final void reset() {
+ getBytesReadFromFsAndReset();
+ getBytesReadFromBlockCacheAndReset();
+ getBytesReadFromMemstoreAndReset();
Review Comment:
Are the threads used here coming from a pool? Do those threads also get used
for other purposes? Just wondering whether calling `remove()` would be a better
option, in case the thread-locals can lie around unused.
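For reference, a minimal standalone sketch (not the HBase class) of the difference: after `remove()`, the next `get()` re-runs the initial-value supplier, so a pooled thread does not keep the old counter object alive between unrelated tasks:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Minimal illustration of ThreadLocal.remove(): it drops this thread's entry,
// so the stale counter becomes collectable and the next get() starts fresh.
public class ThreadLocalRemoveSketch {
    static final ThreadLocal<AtomicInteger> bytesRead =
        ThreadLocal.withInitial(() -> new AtomicInteger(0));

    public static void main(String[] args) {
        bytesRead.get().addAndGet(128);
        AtomicInteger stale = bytesRead.get();
        bytesRead.remove();                    // drop the entry from this thread's map
        AtomicInteger fresh = bytesRead.get(); // withInitial supplier runs again
        System.out.println(stale != fresh);    // true: old counter no longer referenced
        System.out.println(fresh.get());       // 0
    }
}
```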
##########
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/handler/ParallelSeekHandler.java:
##########
@@ -46,12 +51,25 @@ public ParallelSeekHandler(KeyValueScanner scanner, ExtendedCell keyValue, long
this.keyValue = keyValue;
this.readPoint = readPoint;
this.latch = latch;
+ this.isScanMetricsEnabled = ThreadLocalServerSideScanMetrics.isScanMetricsEnabled();
+ this.bytesReadFromFs = ThreadLocalServerSideScanMetrics.getBytesReadFromFsCounter();
+ this.bytesReadFromBlockCache =
+ ThreadLocalServerSideScanMetrics.getBytesReadFromBlockCacheCounter();
}
@Override
public void process() {
try {
+ ThreadLocalServerSideScanMetrics.setScanMetricsEnabled(isScanMetricsEnabled);
+ if (isScanMetricsEnabled) {
+ ThreadLocalServerSideScanMetrics.reset();
+ }
scanner.seek(keyValue);
Review Comment:
Exactly what read activity does this incur?
##########
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RegionScannerImpl.java:
##########
@@ -145,7 +150,18 @@ private static boolean hasNonce(HRegion region, long nonce) {
} finally {
region.smallestReadPointCalcLock.unlock(ReadPointCalculationLock.LockType.RECORDING_LOCK);
}
+ boolean isScanMetricsEnabled = scan.isScanMetricsEnabled();
+ ThreadLocalServerSideScanMetrics.setScanMetricsEnabled(isScanMetricsEnabled);
+ if (isScanMetricsEnabled) {
+ ThreadLocalServerSideScanMetrics.reset();
+ }
initializeScanners(scan, additionalScanners);
+ if (isScanMetricsEnabled) {
+ bytesReadFromFs += ThreadLocalServerSideScanMetrics.getBytesReadFromFsAndReset();
Review Comment:
This corresponds to bytes read for the trailer and metadata, which only happens
when the file is read for the first time or when the cached information is
evicted, correct? That makes me wonder whether this should be tracked
separately instead of being added to `bytesReadFromFs`, since it is typically a
one-time cost. However, tracking it separately could also give insight into the
occasional latency spikes that result from these cache evictions.
##########
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/SegmentScanner.java:
##########
@@ -336,22 +339,39 @@ private Segment getSegment() {
*/
protected void updateCurrent() {
ExtendedCell next = null;
+ int totalBytesRead = 0;
try {
while (iter.hasNext()) {
next = iter.next();
+ if (isScanMetricsEnabled) {
+ // Batch collect bytes to reduce method call overhead
+ totalBytesRead += Segment.getCellLength(next);
+ }
if (next.getSequenceId() <= this.readPoint) {
current = next;
+ // Add accumulated bytes before returning
+ if (isScanMetricsEnabled && totalBytesRead > 0) {
+ ThreadLocalServerSideScanMetrics.addBytesReadFromMemstore(totalBytesRead);
+ }
return;// skip irrelevant versions
}
// for backwardSeek() stay in the boundaries of a single row
if (stopSkippingKVsIfNextRow && segment.compareRows(next, stopSkippingKVsRow) > 0) {
current = null;
+ // Add accumulated bytes before returning
+ if (isScanMetricsEnabled && totalBytesRead > 0) {
+ ThreadLocalServerSideScanMetrics.addBytesReadFromMemstore(totalBytesRead);
+ }
Review Comment:
Can't we move this into a `finally` block and avoid repeating it before both
`return` statements?
##########
hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/CompoundBloomFilter.java:
##########
@@ -120,7 +120,7 @@ private boolean containsInternal(byte[] key, int keyOffset, int keyLength, ByteB
return result;
}
- private HFileBlock getBloomBlock(int block) {
+ public HFileBlock getBloomBlock(int block) {
Review Comment:
You are making this and a few others public for the sake of testing? You can
add:
```
@InterfaceAudience.LimitedPrivate(HBaseInterfaceAudience.UNITTEST)
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]