errose28 commented on PR #8755: URL: https://github.com/apache/ozone/pull/8755#issuecomment-3090840155
> Our main goal here is to build a dashboard in Recon to show storage usage distribution across the cluster.

We should leave all dashboarding to Grafana instead of reinventing the wheel in Recon.

> Recon already uses StorageReport for other usage stats, so we thought it would be a good idea to extend it to include pending deletion info as well, keeping everything in one place

StorageReport is not a catch-all for implementing metrics collection in Recon; it is for reporting information about the disks within datanodes. In general, Recon displays current categorical information (keys, containers, pipelines, volumes/disks), while metrics and dashboards track numeric information (bytes, durations, or event counts) over time. Deletion progress falls cleanly into the second category.

> We considered using Prometheus metrics, but based on my understanding, in Ozone services these values might reset from the beginning and become inaccurate in case of service restarts. This can lead to wrong conclusions in the dashboard.

Metrics collection is decoupled from persistence. Yes, we will need to persist the counters in datanodes so they do not need to be recomputed on every restart, but this is independent of how the metrics are published and consumed by other services.

> Also I think we should avoid depending on a running Prometheus service in this case (most customers do not have it set up when they hit issues related to deletion or anything related to space reclamation).

If any Ozone users want dashboards, they will need Prometheus and Grafana. We do not have the bandwidth to build and maintain our own dashboarding setup when quality ones already exist.
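The following sketch illustrates the point that persistence can sit underneath a metric without affecting how that metric is published. It is not Ozone code. The class `PersistentCounter`, the file name `pendingDeletionBytes`, and the persist-on-demand policy are all hypothetical. In a real datanode the value would presumably be exposed through something like a Hadoop metrics2 source and scraped by Prometheus, and that wiring is omitted here. The publishing side only calls `get()` and knows nothing about the file.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical counter whose value survives restarts; not an Ozone or Hadoop API.
public class PersistentCounter {
    private final Path file;
    private final AtomicLong value;

    public PersistentCounter(Path file) throws IOException {
        this.file = file;
        // Reload the last persisted value so a restart does not reset the counter to zero.
        long initial = 0L;
        if (Files.exists(file)) {
            String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8).trim();
            if (!text.isEmpty()) {
                initial = Long.parseLong(text);
            }
        }
        this.value = new AtomicLong(initial);
    }

    // Called by the component doing the work (e.g. when blocks are marked for deletion).
    public long add(long delta) {
        return value.addAndGet(delta);
    }

    // Publishing side: a metrics source would read this and is unaware of persistence.
    public long get() {
        return value.get();
    }

    // Persistence side: write a snapshot to a temp file, then atomically move it into place.
    public synchronized void persist() throws IOException {
        Path tmp = file.resolveSibling(file.getFileName() + ".tmp");
        Files.write(tmp, Long.toString(value.get()).getBytes(StandardCharsets.UTF_8));
        Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING, StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("counter-demo");
        Path f = dir.resolve("pendingDeletionBytes");
        PersistentCounter counter = new PersistentCounter(f);
        counter.add(4096);
        counter.add(1024);
        counter.persist();
        // Simulate a service restart by constructing a new instance from the same file.
        PersistentCounter restarted = new PersistentCounter(f);
        System.out.println(restarted.get()); // prints 5120
    }
}
```

In this sketch, increments made after the last `persist()` call are lost on restart. How often to persist (periodically, on shutdown, or alongside other on-disk state) is a separate design choice, and it does not change how the metric is published.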
This should be a reasonably small change if it is done the way Ozone is designed to handle it:
- Add metrics, with persistence underneath if necessary (this PR)
- Add the metrics to a dashboard, like [this](https://github.com/apache/ozone/blob/master/hadoop-ozone/dist/src/main/compose/common/grafana/dashboards/Ozone%20-%20DeleteKey%20Metrics.json) (follow-up PR)
