priyeshkaratha commented on PR #10564: URL: https://github.com/apache/ozone/pull/10564#issuecomment-4764847203
> > Fixes a datanode metrics lifecycle issue where VolumeInfoMetrics remained registered after failVolume(), which could keep triggering MethodMetric -> getCommitted and flood logs with NPEs in failed-volume scenarios.
>
> I believe this has been fixed in [ec2634d](https://github.com/apache/ozone/commit/ec2634d8d25bc8163c7c48fa869fc8bd584f0a6d).
>
> Can you please explain how `HddsVolume.committedBytes` can be `null` for failed volume in current code?

You are right, `committedBytes` won't be null in the current code. My idea is to unregister VolumeInfoMetrics when the volume fails.

The VolumeInfoMetrics source stays in the metrics registry after failure. The timer keeps calling registry.snapshot(), which in turn calls getCommitted(), getContainers(), getVolumeState(), etc. on the failed volume indefinitely, until shutdown() is eventually called.

Today those methods are safe: they return values backed by an AtomicLong, a ConcurrentSkipListSet, or an enum state. But the flooding mechanism is still structurally active. Any future change that makes one of those @Metric methods throw on a failed volume will immediately reproduce the same log flood pattern.
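To illustrate the lifecycle point, here is a minimal, self-contained sketch. It does not use the Hadoop metrics2 API or Ozone's real classes: `SketchRegistry`, `SketchVolume`, and every method on them are hypothetical stand-ins, assumed only for illustration. It shows that a source left registered keeps being polled on every snapshot, while unregistering it in `failVolume()` stops the polling.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

public class VolumeMetricsLifecycleSketch {

  /** Hypothetical stand-in for a metrics registry polled by a timer. */
  static final class SketchRegistry {
    private final Map<String, LongSupplier> sources = new ConcurrentHashMap<>();

    void register(String name, LongSupplier source) {
      sources.put(name, source);
    }

    void unregister(String name) {
      sources.remove(name);
    }

    /** Polls every registered source and returns how many were polled. */
    int snapshot() {
      int polled = 0;
      for (LongSupplier source : sources.values()) {
        source.getAsLong();
        polled++;
      }
      return polled;
    }
  }

  /** Hypothetical stand-in for a volume that exposes one metric. */
  static final class SketchVolume {
    private final String name;
    private final SketchRegistry registry;
    private final AtomicLong committedBytes = new AtomicLong();
    private volatile boolean failed;

    SketchVolume(String name, SketchRegistry registry) {
      this.name = name;
      this.registry = registry;
      // The metric getter is registered when the volume is created.
      registry.register(name, committedBytes::get);
    }

    boolean isFailed() {
      return failed;
    }

    /** Marks the volume failed and removes its metrics source. */
    void failVolume() {
      failed = true;
      registry.unregister(name);
    }
  }

  public static void main(String[] args) {
    SketchRegistry registry = new SketchRegistry();
    SketchVolume volume = new SketchVolume("vol-1", registry);

    System.out.println(registry.snapshot()); // prints 1: source is polled
    volume.failVolume();
    System.out.println(registry.snapshot()); // prints 0: no longer polled
  }
}
```

If `failVolume()` skipped the `unregister` call, the second snapshot would still poll the failed volume's getter, which is the structural risk described above.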
