For our cluster we monitor write latency by running a short (10s) rados bench with one thread writing 64kB objects, every 5 minutes or so. rados bench reports the min, max, and average latency of those writes -- we plot all three. An example is attached.
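If it helps, the probe boils down to something like this rough Python sketch (untested, and only an illustration: it assumes the rados CLI is on the PATH, that a scratch pool named 'test' exists, and that your rados bench version prints "Min/Max/Average latency" summary lines -- adjust the pool name and the parsing if yours differ):

#!/usr/bin/env python
# Run a short single-threaded rados bench and pull the min/max/average
# write latency out of its summary output.
import re
import subprocess

POOL = 'test'        # scratch pool used only for benchmarking (assumption)
SECONDS = 10         # short run, as described above
THREADS = 1          # one writer thread
OBJECT_SIZE = 65536  # 64kB objects

def bench_write_latency():
    cmd = ['rados', '-p', POOL, 'bench', str(SECONDS), 'write',
           '-t', str(THREADS), '-b', str(OBJECT_SIZE)]
    out = subprocess.check_output(cmd).decode('utf-8', 'replace')
    latencies = {}
    for line in out.splitlines():
        m = re.match(r'\s*(Average|Max|Min)\s+latency\S*\s*:\s*([0-9.]+)', line, re.I)
        if m:
            latencies[m.group(1).lower()] = float(m.group(2))
    return latencies  # e.g. {'min': 0.02, 'max': 0.37, 'average': 0.04}

if __name__ == '__main__':
    print(bench_write_latency())

Run something like that from cron every few minutes and feed the three numbers to whatever draws your graphs.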
The latency and other metrics that we plot (including IOPS) are here, in this sensor:
https://github.com/cernceph/ceph-scripts/blob/master/cern-sls/ceph-sls.py
Unfortunately it is not directly usable by others, since it was written for our local monitoring system.

Cheers, Dan

________________________________
From: [email protected] [[email protected]] on behalf of Jason Villalta [[email protected]]
Sent: 12 April 2014 16:41
To: Greg Poirier
Cc: [email protected]
Subject: Re: [ceph-users] Useful visualizations / metrics

I know Ceph throws some warnings if there is high write latency, but I would be most interested in the delay for I/O requests, linking directly to IOPS. If IOPS start to drop because the disks are overwhelmed, then latency for requests would be increasing. This would tell me that I need to add more OSDs/nodes. I am not sure there is a specific metric in Ceph for this, but it would be awesome if there was.

On Sat, Apr 12, 2014 at 10:37 AM, Greg Poirier <[email protected]> wrote:
Curious as to how you define cluster latency.

On Sat, Apr 12, 2014 at 7:21 AM, Jason Villalta <[email protected]> wrote:
Hi, I have not done anything with metrics yet, but the only ones I personally would be interested in are total capacity utilization and cluster latency. Just my 2 cents.

On Sat, Apr 12, 2014 at 10:02 AM, Greg Poirier <[email protected]> wrote:
I'm in the process of building a dashboard for our Ceph nodes. I was wondering if anyone out there had instrumented their OSD / MON clusters and found particularly useful visualizations.

At first, I was trying to do ridiculous things (like graphing % used for every disk in every OSD host), but I realized quickly that that is simply too many metrics and far too visually dense to be useful.

I am attempting to put together a few simpler, more dense visualizations like... overall cluster utilization, aggregate CPU and memory utilization per OSD host, etc.

Just looking for some suggestions. Thanks!

--
Jason Villalta
Co-founder
800.799.4407x1230 | www.RubixTechnology.com
<<attachment: rrdgraph.png>>
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
