d-c-manning commented on code in PR #5081:
URL: https://github.com/apache/hbase/pull/5081#discussion_r1125218280
##########
hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java:
##########
@@ -1123,12 +1127,26 @@ rpcControllerFactory, getMetaLookupPool(),
connectionConfig.getMetaReadRpcTimeou
}
}
- void takeUserRegionLock() throws IOException {
+ private long heldStartTime;
+
+ void takeUserRegionLock(int tries) throws IOException {
try {
long waitTime = connectionConfig.getMetaOperationTimeout();
+ long waitStartTime = 0;
+ if (metrics != null) {
+ metrics.updateUserRegionLockQueue(userRegionLock.getQueueLength());
+ waitStartTime = EnvironmentEdgeManager.currentTime();
+ }
if (!userRegionLock.tryLock(waitTime, TimeUnit.MILLISECONDS)) {
+ if (metrics != null) {
+ metrics.incrUserRegionLockTimeout();
+ }
throw new LockTimeoutException("Failed to get user region lock in" +
waitTime + " ms. "
+ " for accessing meta region server.");
+ } else if (metrics != null) {
+ // successfully grabbed the lock, start timer of holding the lock
+ heldStartTime = EnvironmentEdgeManager.currentTime();
+ metrics.updateUserRegionLockWaiting(tries * waitTime + heldStartTime -
waitStartTime);
Review Comment:
if we are here on a retry, it doesn't mean we had to wait `waitTime * tries`
to get here. The first try could have happened without any time waiting for the
lock. And if we waited `waitTime`, we would have thrown `LockTimeoutException`
and we would not retry.
```suggestion
metrics.updateUserRegionLockWaiting(heldStartTime - waitStartTime);
```
##########
hbase-client/src/main/java/org/apache/hadoop/hbase/client/MetricsConnection.java:
##########
@@ -443,6 +447,15 @@ protected Ratio getRatio() {
this.nsLookups = registry.counter(name(this.getClass(), NS_LOOKUPS,
scope));
this.nsLookupsFailed = registry.counter(name(this.getClass(),
NS_LOOKUPS_FAILED, scope));
+ this.userRegionLockTimeoutCount =
+ registry.counter(name(this.getClass(), "userRegionLockTimeoutCount",
scope));
+ this.userRegionLockWaitingTimer =
+ registry.timer(name(this.getClass(), "userRegionLockWaitingTimer",
scope));
+ this.userRegionLockHeldTimer =
+ registry.timer(name(this.getClass(), "userRegionLockHeldTimer", scope));
Review Comment:
nit: to match the names of the other timers, perhaps
```suggestion
registry.timer(name(this.getClass(), "userRegionLockWaitDurationMs",
scope));
this.userRegionLockHeldTimer =
registry.timer(name(this.getClass(), "userRegionLockHeldDurationMs",
scope));
```
(Although, I'm confused - is there anything that guarantees we get
Milliseconds when we read out the metric? Or is it in nanoseconds? Other timer
names in this file say `Ms` but now I have my doubts. 🤔 )
##########
hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java:
##########
@@ -1123,12 +1127,26 @@ rpcControllerFactory, getMetaLookupPool(),
connectionConfig.getMetaReadRpcTimeou
}
}
- void takeUserRegionLock() throws IOException {
+ private long heldStartTime;
Review Comment:
nit: this probably belongs at the top of the file with the other class
variables. Since it is a class variable, maybe we can give it a more
descriptive name to tie it to the lock
```suggestion
private long userRegionLockHeldStartTime;
```
##########
hbase-client/src/main/java/org/apache/hadoop/hbase/client/MetricsConnection.java:
##########
@@ -598,6 +611,24 @@ public void incrementServerOverloadedBackoffTime(long
time, TimeUnit timeUnit) {
overloadedBackoffTimer.update(time, timeUnit);
}
+ /** incr */
Review Comment:
is there a chance for a test which tests these metrics? Something like what
exists in `TestMetaCache#testUserRegionLockThrowsException`?
##########
hbase-client/src/main/java/org/apache/hadoop/hbase/client/MetricsConnection.java:
##########
@@ -443,6 +447,15 @@ protected Ratio getRatio() {
this.nsLookups = registry.counter(name(this.getClass(), NS_LOOKUPS,
scope));
this.nsLookupsFailed = registry.counter(name(this.getClass(),
NS_LOOKUPS_FAILED, scope));
+ this.userRegionLockTimeoutCount =
+ registry.counter(name(this.getClass(), "userRegionLockTimeoutCount",
scope));
+ this.userRegionLockWaitingTimer =
+ registry.timer(name(this.getClass(), "userRegionLockWaitingTimer",
scope));
+ this.userRegionLockHeldTimer =
+ registry.timer(name(this.getClass(), "userRegionLockHeldTimer", scope));
+ this.userRegionLockQueueHist =
+ registry.histogram(name(MetricsConnection.class,
"userRegionLockQueueHist", scope));
Review Comment:
consider not including `Hist` - since it will be appended with other
histogram related stuff - and call it length instead
```suggestion
registry.histogram(name(MetricsConnection.class,
"userRegionLockQueueLength", scope));
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]