sodonnel commented on code in PR #3781: URL: https://github.com/apache/ozone/pull/3781#discussion_r1004976856
##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/node/DatanodeAdminMonitorImpl.java:
##########

```diff
@@ -168,6 +232,55 @@
   public int getTrackedNodeCount() {
     return trackedNodes.size();
   }
+
+  synchronized void setMetricsToGauge() {
+    metrics.setTrackedContainersUnhealthyTotal(unhealthyContainers);
+    metrics.setTrackedRecommissionNodesTotal(trackedRecommission);
+    metrics.setTrackedDecommissioningMaintenanceNodesTotal(
+        trackedDecomMaintenance);
+    metrics.setTrackedContainersUnderReplicatedTotal(
+        underReplicatedContainers);
+    metrics.setTrackedContainersSufficientlyReplicatedTotal(
+        sufficientlyReplicatedContainers);
+    metrics.setTrackedPipelinesWaitingToCloseTotal(pipelinesWaitingToClose);
+    for (Map.Entry<String, ContainerStateInWorkflow> e :
+        containerStateByHost.entrySet()) {
+      metrics.metricRecordOfContainerStateByHost(e.getKey(),
+          e.getValue().sufficientlyReplicated,
+          e.getValue().underReplicatedContainers,
+          e.getValue().unhealthyContainers,
+          e.getValue().pipelinesWaitingToClose);
+    }
+  }
+
+  void resetContainerMetrics() {
+    pipelinesWaitingToClose = 0;
+    sufficientlyReplicatedContainers = 0;
+    unhealthyContainers = 0;
+    underReplicatedContainers = 0;
+
+    for (Map.Entry<String, ContainerStateInWorkflow> e :
```

Review Comment:
Does the implementation, as it stands now, protect against incomplete intermediate metrics? The metrics are snapshotted via the call to getMetrics, but they are set over several calls from the decommission monitor to the metrics class, with no synchronisation between them. Could we not set `trackedPipelinesWaitingToCloseTotal`, and then have `getMetrics` called before `trackedContainersUnderReplicatedTotal` is set, giving an inconsistent result? `getMetrics()` and the setters probably need to be synchronized, and even then everything needs to be set in a single synchronized call.
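A rough sketch of what a single-call publish could look like. All class, field, and method names below are illustrative stand-ins, not the actual Ozone code: the monitor accumulates counts into a working map, then hands a defensive copy to the metrics object in one synchronized call, so a concurrent reader sees either the old snapshot or the new one, never a mixture.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative stand-in for Ozone's ContainerStateInWorkflow; the real class
// tracks more counters. Fields are final, so a published snapshot is immutable.
class ContainerStateInWorkflow {
    final long sufficientlyReplicated;
    final long underReplicated;

    ContainerStateInWorkflow(long sufficientlyReplicated, long underReplicated) {
        this.sufficientlyReplicated = sufficientlyReplicated;
        this.underReplicated = underReplicated;
    }
}

// Hypothetical metrics holder: the whole per-host map is swapped atomically
// under one lock rather than updated field by field.
class SnapshotMetrics {
    private Map<String, ContainerStateInWorkflow> byHost = new HashMap<>();

    // Single synchronized publish: no intermediate state is ever visible.
    synchronized void setSnapshot(Map<String, ContainerStateInWorkflow> snapshot) {
        this.byHost = snapshot;
    }

    synchronized Map<String, ContainerStateInWorkflow> getSnapshot() {
        return byHost;
    }
}

// Hypothetical monitor skeleton showing one iteration of the workflow.
class DecommissionMonitorSketch {
    private final SnapshotMetrics metrics = new SnapshotMetrics();
    private final Map<String, ContainerStateInWorkflow> working = new HashMap<>();

    void recordHost(String host, long sufficient, long under) {
        working.put(host, new ContainerStateInWorkflow(sufficient, under));
    }

    // End of one monitor iteration: publish a copy, then clear the working map.
    // The published copy is never mutated again, and hosts absent from the next
    // iteration simply vanish on the next publish - no per-node expiry needed.
    void finishIteration() {
        metrics.setSnapshot(new HashMap<>(working));
        working.clear();
    }

    SnapshotMetrics getMetrics() {
        return metrics;
    }
}
```

Because the snapshot reference is replaced wholesale, the only shared mutable state is the single map reference, which both `setSnapshot` and `getSnapshot` guard with the same lock.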
We can build up a `Map<String, ContainerStateInWorkflow>` during each iteration and, at the end of the iteration, pass a copy of it to the metrics object via `new HashMap<>(mapJustBuiltUp)`. Then the original can be cleared; the copied Map and its ContainerStateInWorkflow objects are referenced only by the metrics class and will never be changed again. Replace them all on the next call, and we don't need to worry about expiring individual nodes.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org