d-c-manning commented on code in PR #5081:
URL: https://github.com/apache/hbase/pull/5081#discussion_r1125218280


##########
hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java:
##########
@@ -1123,12 +1127,26 @@ rpcControllerFactory, getMetaLookupPool(), 
connectionConfig.getMetaReadRpcTimeou
     }
   }
 
-  void takeUserRegionLock() throws IOException {
+  private long heldStartTime;
+
+  void takeUserRegionLock(int tries) throws IOException {
     try {
       long waitTime = connectionConfig.getMetaOperationTimeout();
+      long waitStartTime = 0;
+      if (metrics != null) {
+        metrics.updateUserRegionLockQueue(userRegionLock.getQueueLength());
+        waitStartTime = EnvironmentEdgeManager.currentTime();
+      }
       if (!userRegionLock.tryLock(waitTime, TimeUnit.MILLISECONDS)) {
+        if (metrics != null) {
+          metrics.incrUserRegionLockTimeout();
+        }
         throw new LockTimeoutException("Failed to get user region lock in" + 
waitTime + " ms. "
           + " for accessing meta region server.");
+      } else if (metrics != null) {
+        // successfully grabbed the lock, start timer of holding the lock
+        heldStartTime = EnvironmentEdgeManager.currentTime();
+        metrics.updateUserRegionLockWaiting(tries * waitTime + heldStartTime - 
waitStartTime);

Review Comment:
   if we are here on a retry, it doesn't mean we had to wait `waitTime * tries` 
to get here. The first try could have happened without any time waiting for the 
lock. And if we waited `waitTime`, we would have thrown `LockTimeoutException` 
and we would not retry.
   ```suggestion
           metrics.updateUserRegionLockWaiting(heldStartTime - waitStartTime);
   ```



##########
hbase-client/src/main/java/org/apache/hadoop/hbase/client/MetricsConnection.java:
##########
@@ -443,6 +447,15 @@ protected Ratio getRatio() {
     this.nsLookups = registry.counter(name(this.getClass(), NS_LOOKUPS, 
scope));
     this.nsLookupsFailed = registry.counter(name(this.getClass(), 
NS_LOOKUPS_FAILED, scope));
 
+    this.userRegionLockTimeoutCount =
+      registry.counter(name(this.getClass(), "userRegionLockTimeoutCount", 
scope));
+    this.userRegionLockWaitingTimer =
+      registry.timer(name(this.getClass(), "userRegionLockWaitingTimer", 
scope));
+    this.userRegionLockHeldTimer =
+      registry.timer(name(this.getClass(), "userRegionLockHeldTimer", scope));

Review Comment:
   nit: to match the names of the other timers, perhaps
   ```suggestion
         registry.timer(name(this.getClass(), "userRegionLockWaitDurationMs", 
scope));
       this.userRegionLockHeldTimer =
         registry.timer(name(this.getClass(), "userRegionLockHeldDurationMs", 
scope));
   ```
   
   (Although, I'm confused - is there anything that guarantees we get 
Milliseconds when we read out the metric? Or is it in nanoseconds? Other timer 
names in this file say `Ms` but now I have my doubts. 🤔 )



##########
hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java:
##########
@@ -1123,12 +1127,26 @@ rpcControllerFactory, getMetaLookupPool(), 
connectionConfig.getMetaReadRpcTimeou
     }
   }
 
-  void takeUserRegionLock() throws IOException {
+  private long heldStartTime;

Review Comment:
   nit: this probably belongs at the top of the file with the other class 
variables. Since it is a class variable, maybe we can give it a more 
descriptive name to tie it to the lock
   ```suggestion
     private long userRegionLockHeldStartTime;
   ```



##########
hbase-client/src/main/java/org/apache/hadoop/hbase/client/MetricsConnection.java:
##########
@@ -598,6 +611,24 @@ public void incrementServerOverloadedBackoffTime(long 
time, TimeUnit timeUnit) {
     overloadedBackoffTimer.update(time, timeUnit);
   }
 
+  /** incr */

Review Comment:
   is there a chance for a test which tests these metrics? Something like what 
exists in `TestMetaCache#testUserRegionLockThrowsException`? 



##########
hbase-client/src/main/java/org/apache/hadoop/hbase/client/MetricsConnection.java:
##########
@@ -443,6 +447,15 @@ protected Ratio getRatio() {
     this.nsLookups = registry.counter(name(this.getClass(), NS_LOOKUPS, 
scope));
     this.nsLookupsFailed = registry.counter(name(this.getClass(), 
NS_LOOKUPS_FAILED, scope));
 
+    this.userRegionLockTimeoutCount =
+      registry.counter(name(this.getClass(), "userRegionLockTimeoutCount", 
scope));
+    this.userRegionLockWaitingTimer =
+      registry.timer(name(this.getClass(), "userRegionLockWaitingTimer", 
scope));
+    this.userRegionLockHeldTimer =
+      registry.timer(name(this.getClass(), "userRegionLockHeldTimer", scope));
+    this.userRegionLockQueueHist =
+      registry.histogram(name(MetricsConnection.class, 
"userRegionLockQueueHist", scope));

Review Comment:
   consider not including `Hist` - since it will be appended with other 
histogram related stuff - and call it length instead
   ```suggestion
         registry.histogram(name(MetricsConnection.class, 
"userRegionLockQueueLength", scope));
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to