sodonnel commented on code in PR #3781: URL: https://github.com/apache/ozone/pull/3781#discussion_r1004976856
##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/node/DatanodeAdminMonitorImpl.java:
##########

```diff
@@ -168,6 +232,55 @@
   public int getTrackedNodeCount() {
     return trackedNodes.size();
   }
+
+  synchronized void setMetricsToGauge() {
+    metrics.setTrackedContainersUnhealthyTotal(unhealthyContainers);
+    metrics.setTrackedRecommissionNodesTotal(trackedRecommission);
+    metrics.setTrackedDecommissioningMaintenanceNodesTotal(
+        trackedDecomMaintenance);
+    metrics.setTrackedContainersUnderReplicatedTotal(
+        underReplicatedContainers);
+    metrics.setTrackedContainersSufficientlyReplicatedTotal(
+        sufficientlyReplicatedContainers);
+    metrics.setTrackedPipelinesWaitingToCloseTotal(pipelinesWaitingToClose);
+    for (Map.Entry<String, ContainerStateInWorkflow> e :
+        containerStateByHost.entrySet()) {
+      metrics.metricRecordOfContainerStateByHost(e.getKey(),
+          e.getValue().sufficientlyReplicated,
+          e.getValue().underReplicatedContainers,
+          e.getValue().unhealthyContainers,
+          e.getValue().pipelinesWaitingToClose);
+    }
+  }
+
+  void resetContainerMetrics() {
+    pipelinesWaitingToClose = 0;
+    sufficientlyReplicatedContainers = 0;
+    unhealthyContainers = 0;
+    underReplicatedContainers = 0;
+
+    for (Map.Entry<String, ContainerStateInWorkflow> e :
```

Review Comment:
Does the implementation, as it stands now, protect against incomplete intermediate metrics? The metrics are snapshotted via the call to getMetrics, but they are set over several calls from the decommission monitor to the metrics class, with no synchronisation between them. Could we not set `trackedPipelinesWaitingToCloseTotal`, and then have `getMetrics` called before `trackedContainersUnderReplicatedTotal` is set, giving an inconsistent result? `getMetrics()` and the setters probably need to be synchronized, and even then everything needs to be set in a single synchronized call.
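A rough sketch of what a single-call publish could look like. All class, field, and method names below are illustrative stand-ins, not the actual Ozone code: the monitor accumulates counts into a working map, then hands a defensive copy to the metrics object in one synchronized call, so a concurrent reader sees either the old snapshot or the new one, never a mixture.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative stand-in for Ozone's ContainerStateInWorkflow; the real class
// tracks more counters. Fields are final, so a published snapshot is immutable.
class ContainerStateInWorkflow {
    final long sufficientlyReplicated;
    final long underReplicated;

    ContainerStateInWorkflow(long sufficientlyReplicated, long underReplicated) {
        this.sufficientlyReplicated = sufficientlyReplicated;
        this.underReplicated = underReplicated;
    }
}

// Hypothetical metrics holder: the whole per-host map is swapped atomically
// under one lock rather than updated field by field.
class SnapshotMetrics {
    private Map<String, ContainerStateInWorkflow> byHost = new HashMap<>();

    // Single synchronized publish: no intermediate state is ever visible.
    synchronized void setSnapshot(Map<String, ContainerStateInWorkflow> snapshot) {
        this.byHost = snapshot;
    }

    synchronized Map<String, ContainerStateInWorkflow> getSnapshot() {
        return byHost;
    }
}

// Hypothetical monitor skeleton showing one iteration of the workflow.
class DecommissionMonitorSketch {
    private final SnapshotMetrics metrics = new SnapshotMetrics();
    private final Map<String, ContainerStateInWorkflow> working = new HashMap<>();

    void recordHost(String host, long sufficient, long under) {
        working.put(host, new ContainerStateInWorkflow(sufficient, under));
    }

    // End of one monitor iteration: publish a copy, then clear the working map.
    // The published copy is never mutated again, and hosts absent from the next
    // iteration simply vanish on the next publish - no per-node expiry needed.
    void finishIteration() {
        metrics.setSnapshot(new HashMap<>(working));
        working.clear();
    }

    SnapshotMetrics getMetrics() {
        return metrics;
    }
}
```

Because the snapshot reference is replaced wholesale, the only shared mutable state is the single map reference, which both `setSnapshot` and `getSnapshot` guard with the same lock.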
We can build up a `Map<String, ContainerStateInWorkflow>` during each iteration and, at the end of the iteration, pass a copy of it to the metrics object via `new HashMap<>(mapJustBuiltUp)`. Then the original can be cleared; the copied Map and its ContainerStateInWorkflow objects are referenced only by the metrics class and will never be changed again. Replace them all on the next call, and we don't need to worry about expiring individual nodes.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org