For our cluster we monitor write latency by running a short (10s) rados bench 
with one thread writing 64kB objects, every 5 minutes or so. rados bench 
reports the min, max, and average latency of those writes -- we plot them 
all. An example is attached.
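For anyone who wants to try the same probe, here is a minimal sketch of it in Python. The pool name "test" is an assumption, and the regexes for the summary lines are a guess at rados bench's output format, which varies a bit between Ceph versions -- check yours before relying on this.

```python
import re
import subprocess

def bench_write_latency(pool="test", seconds=10, threads=1, obj_size=65536):
    """Run a short rados bench write (1 thread, 64kB objects by default)
    and return its latency summary.  Pool name is an assumption."""
    out = subprocess.run(
        ["rados", "bench", "-p", pool, str(seconds), "write",
         "-t", str(threads), "-b", str(obj_size)],
        capture_output=True, text=True, check=True).stdout
    return parse_latency(out)

def parse_latency(text):
    """Pull the min/max/average latency lines (seconds) out of
    rados bench summary output."""
    stats = {}
    for key in ("Min", "Max", "Average"):
        m = re.search(r"%s latency\S*:\s*([\d.]+)" % key, text, re.IGNORECASE)
        if m:
            stats[key.lower()] = float(m.group(1))
    return stats
```

Run from cron every 5 minutes and feed the three numbers to whatever plotting system you use.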

The latency and other metrics that we plot (including IOPS) come from this 
sensor: 
https://github.com/cernceph/ceph-scripts/blob/master/cern-sls/ceph-sls.py
Unfortunately it is not directly usable by others, since it was written for 
our local monitoring system.

Cheers, Dan

________________________________
From: [email protected] [[email protected]] on 
behalf of Jason Villalta [[email protected]]
Sent: 12 April 2014 16:41
To: Greg Poirier
Cc: [email protected]
Subject: Re: [ceph-users] Useful visualizations / metrics

I know Ceph throws some warnings if there is high write latency, but I would 
be most interested in the delay for I/O requests, linked directly to IOPS. If 
IOPS start to drop because the disks are overwhelmed, then request latency 
would be increasing. That would tell me I need to add more OSDs/nodes. I am 
not sure there is a specific metric in Ceph for this, but it would be awesome 
if there was.
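The IOPS/latency relationship described above is essentially Little's law: with a fixed number of requests in flight, mean latency and throughput are inverses of each other. A quick sketch, with hypothetical numbers (120 IOPS is just an example, not a measured figure):

```python
def expected_latency(in_flight_ops, iops):
    """Little's law: mean latency = concurrency / throughput.
    With one op in flight (e.g. rados bench -t 1),
    latency in seconds is roughly 1 / IOPS."""
    return in_flight_ops / iops

# Hypothetical: a single-threaded workload sustaining 120 write IOPS
# implies about 8.3 ms per write.
print(round(expected_latency(1, 120) * 1000, 1))  # -> 8.3
```

So for a fixed client concurrency, a drop in IOPS and a rise in per-request latency are two views of the same saturation.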


On Sat, Apr 12, 2014 at 10:37 AM, Greg Poirier 
<[email protected]<mailto:[email protected]>> wrote:
Curious as to how you define cluster latency.


On Sat, Apr 12, 2014 at 7:21 AM, Jason Villalta 
<[email protected]<mailto:[email protected]>> wrote:
Hi, I have not done anything with metrics yet, but the only ones I would 
personally be interested in are total capacity utilization and cluster latency.

Just my 2 cents.


On Sat, Apr 12, 2014 at 10:02 AM, Greg Poirier 
<[email protected]<mailto:[email protected]>> wrote:
I'm in the process of building a dashboard for our Ceph nodes. I was wondering 
if anyone out there had instrumented their OSD / MON clusters and found 
particularly useful visualizations.

At first, I was trying to do ridiculous things (like graphing % used for every 
disk in every OSD host), but I quickly realized that that is simply too many 
metrics and far too visually dense to be useful. I am now attempting to put 
together a few simpler, more dense visualizations: overall cluster 
utilization, aggregate CPU and memory utilization per OSD host, etc.

Just looking for some suggestions.  Thanks!

_______________________________________________
ceph-users mailing list
[email protected]<mailto:[email protected]>
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




--
Jason Villalta
Co-founder
800.799.4407x1230 | www.RubixTechnology.com

<<attachment: rrdgraph.png>>
