Re: [PR] [improve][broker] PIP-307: Add monitoring metrics for graceful closure of producers/consumers [pulsar]

via GitHub Thu, 04 Jan 2024 15:01:12 -0800


dragosvictor commented on code in PR #21854:
URL: https://github.com/apache/pulsar/pull/21854#discussion_r1442339444



##########
pulsar-broker/src/main/java/org/apache/pulsar/broker/loadbalance/extensions/manager/UnloadManager.java:
##########
@@ -92,17 +153,35 @@ public void handleEvent(String serviceUnit, 
ServiceUnitStateData data, Throwable
             if (log.isDebugEnabled()) {
                 log.debug("Handling {} for service unit {} with exception.", 
data, serviceUnit, t);
             }
-            this.complete(serviceUnit, t);
+            complete(serviceUnit, t);
             return;
         }
+
+        if (log.isDebugEnabled()) {
+            log.debug("Handling {} for service unit {}", data, serviceUnit);
+        }
         ServiceUnitState state = ServiceUnitStateData.state(data);
         switch (state) {
-            case Free, Owned -> this.complete(serviceUnit, t);
-            default -> {
-                if (log.isDebugEnabled()) {
-                    log.debug("Handling {} for service unit {}", data, 
serviceUnit);
-                }
-            }
+            case Free, Owned -> complete(serviceUnit, t);
+            case Releasing -> recordReleaseLatency(serviceUnit, data);
+            case Assigning -> recordAssigningLatency(serviceUnit, data);
+        }
+    }
+
+    private void recordReleaseLatency(String serviceUnit, ServiceUnitStateData 
data) {
+        if (lookupServiceAddress.equals(data.sourceBroker())) {
+            releaseLatency.beginMeasurement(serviceUnit);
+            unloadLatency.beginMeasurement(serviceUnit);

Review Comment:
   My concern with measuring these at the leader level is that the values will 
be slightly larger than reality. Hence, keeping them closer to the origin would 
yield more accurate values.
   
   Aggregating summaries over multiple brokers is not possible. If I put them 
into a `Histogram`, I'd have to choose the buckets, however. The aggregation 
would happen on the Prometheus server side then. The benefits here are the 
faster recording of the metric on the broker, aggregation capability, as well 
as visualizing this data over larger time periods (the current implementation 
resets these values every "stats interval", which is 60 seconds by default). 
With that in mind, I'm leaning into your suggestion, switching over to 
`Histogram`, with buckets for 1 ms, 10 ms, 100 ms, 200 ms, 500 ms and 1000 ms. 
What do you think?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] [improve][broker] PIP-307: Add monitoring metrics for graceful closure of producers/consumers [pulsar]

Reply via email to