sodonnel commented on PR #3781: URL: https://github.com/apache/ozone/pull/3781#issuecomment-1294032163
Why use threadLocal variables in the latest patch? Does this even work? If another thread reads the values set by the monitor thread, does it even get the values, as they should be local to the thread who set them? Lets take a step back. The Decommission Monitor thread is synchronised already, and at the end of a run, the accumulated values for the metrics can be sent to the gauge, while still under the lock. In that way, a consistent set of metrics can be sent to the gauge. However they are being sent to the gauge over a number of calls, so the gauge could be snapshot halfway through, giving inconsistent results. Instead, you can make the snapshot method in the metrics class synchronized. Then in the sendMetricsToGauge method, you can wrap all the calls to the metrics class in a synchronized block too: ``` private synchronized void setMetricsToGauge() { synchronized(metrics) { metrics.setTrackedContainersUnhealthyTotal(unhealthyContainers.get()); metrics.setTrackedRecommissionNodesTotal(trackedRecommission.get()); metrics.setTrackedDecommissioningMaintenanceNodesTotal( trackedDecomMaintenance.get()); metrics.setTrackedContainersUnderReplicatedTotal( underReplicatedContainers.get()); metrics.setTrackedContainersSufficientlyReplicatedTotal( sufficientlyReplicatedContainers.get()); metrics.setTrackedPipelinesWaitingToCloseTotal(pipelinesWaitingToClose .get()); metrics.metricRecordOfContainerStateByHost(containerStateByHost.get()); } } ``` That way, we get a consistent set of metrics into the metric class, and block the `getMetrics(...)` method from executing when the metrics are in the process of being updated, preventing any race conditions. I think you should revert that latest commit, and the synchronised part as I suggested here, and then we can see if we can clean up the reset side of things (which I think you already did here). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For additional commands, e-mail: issues-h...@ozone.apache.org