Hi Chris,

Assuming the Prometheus scrape interval is set to 1 minute, you could simply be racing against the scrape. It's usually not a good idea to use a range vector whose range equals the scrape interval. Since you're using irate(), you could increase the range to [2m] or higher and still get the effect you want, because irate() always uses the two most recent points within the range for the rate calculation.
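To make the irate() point concrete, here is a minimal Python sketch of the behavior described above. It is not Prometheus's actual implementation: the function name, the sample data (a counter scraped every 60 seconds), and the left-open window are assumptions for illustration, and counter resets are ignored.

```python
def irate(samples, eval_time, window):
    """Rough model of PromQL irate(): per-second rate computed from the
    two most recent samples inside (eval_time - window, eval_time].

    samples: list of (timestamp_seconds, counter_value), sorted by time.
    Returns None when fewer than two samples fall inside the window.
    Assumption: left-open window, no counter-reset handling.
    """
    in_range = [(t, v) for t, v in samples
                if eval_time - window < t <= eval_time]
    if len(in_range) < 2:
        return None  # nothing to compute a rate from
    (t1, v1), (t2, v2) = in_range[-2], in_range[-1]
    return (v2 - v1) / (t2 - t1)


# Hypothetical counter scraped every 60 s.
samples = [(0, 0), (60, 6000), (120, 12000), (180, 24000)]

# Evaluated 10 s after the last scrape:
one_minute = irate(samples, eval_time=190, window=60)     # only one sample in range
two_minutes = irate(samples, eval_time=190, window=120)   # last two samples
five_minutes = irate(samples, eval_time=190, window=300)  # still the last two samples
```

In this model the [1m] window holds a single sample, so there is nothing to compute, while the [2m] and [5m] windows both use the same last two samples and give the same rate. That is why widening the range should not smooth out an irate() result.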
Josh

On Mon, Jul 28, 2025 at 5:40 AM Christopher James <christopher.jamesjr2...@gmail.com> wrote:
>
> Hi,
>
> We have encountered some issues regarding the RBD stats metrics.
>
> We have some Grafana panels that show the per-pool rate of changes in RBD
> IOPS and Throughput.
>
> We use these queries for IOPS
> round(sum(irate(ceph_rbd_write_ops[1m])) by (pool))
> round(sum(irate(ceph_rbd_read_ops[1m])) by (pool))
>
> and these queries for throughput
> round(sum(irate(ceph_rbd_write_bytes[1m])) by (pool))
> round(sum(irate(ceph_rbd_read_bytes[1m])) by (pool))
>
> Now, the problem we encounter here is that at some points in time, there
> are some odd outputs.
> In a way that the output has a lot of spikes, and at many points in time,
> there seems to be no change in the mentioned values, hence the irate()'s
> output would become zero, and the next data point would show that the data
> has somehow doubled from the last time.
>
> I investigated the mgr/prometheus module to see if it is related to the
> cache mechanism, and it doesn't seem to be about that because the metrics
> collection method is done in less than 10 seconds most of the time and
> there is no "collecting data took more than ..." log.
>
> I also investigated the raw time series data and saw that the two
> datapoints are exactly the same at the times that the irate() returns zero.
> I have put some screenshots of the panels in this read-only Google Doc:
> https://docs.google.com/document/d/1Rf0dl4qAWnOtG80BsxVY9cjljQqliG2JHxgzVxDjGo8/edit?usp=sharing
>
> It is somewhat odd, to be honest, that every image in every pool does not
> change in IOPS or throughput.
>
> Is this the natural way the metrics are exposed? Should I not use irate()?
> Has anybody else encountered this issue as well?
>
> Cheers,
> Chris
> _______________________________________________
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io