Adrian,
Yes, it is single OSD oriented.
Like Haomai, we monitor perf dumps from individual OSD admin sockets. On
new enough versions of ceph, you can do 'ceph daemon osd.x perf dump',
which is a shorter way to ask for the same output as 'ceph
--admin-daemon /var/run/ceph/ceph-osd.x.asok perf dump'. Keep in mind,
either version has to be run locally on the host where osd.x is running.
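For example, a small helper along these lines can pull the counters into Python (a sketch, assuming the ceph CLI is installed on the OSD's host; the section names like 'osd' are whatever your perf dump actually contains):

```python
import json
import subprocess

def perf_dump(osd_id):
    """Run 'ceph daemon osd.<id> perf dump' on the local host and parse
    its JSON output into a dict. (Sketch: assumes the ceph CLI is
    installed and osd.<id> runs on this machine.)"""
    out = subprocess.check_output(
        ["ceph", "daemon", "osd.%d" % osd_id, "perf", "dump"])
    return json.loads(out)

def section(dump, name):
    """Pull one counter section (e.g. 'osd' or 'filestore') out of a
    parsed perf dump, defaulting to an empty dict if it is absent."""
    return dump.get(name, {})
```

From there it is easy to push individual counters into Sensu checks or any other collector.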
We use Sensu to take samples and push them to Graphite. We have the
ability to then build dashboards showing the whole cluster, units in our
CRUSH tree, hosts, or individual OSDs.
I have found that monitoring each OSD's admin daemon is critical. Often
a single OSD can affect the performance of the entire cluster. Without
individual data, these types of issues can be quite difficult to pinpoint.
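As a hypothetical illustration (the function and threshold here are made up, not part of any Ceph tooling), per-OSD samples let you flag the outlier directly:

```python
def find_outliers(latencies, factor=3.0):
    """Given a mapping of osd id -> latency sample (e.g. pulled from each
    OSD's perf dump), return the ids whose latency exceeds 'factor' times
    the cluster median. A single such outlier can drag down the whole
    cluster, and aggregate stats alone would hide it."""
    vals = sorted(latencies.values())
    median = vals[len(vals) // 2]
    return sorted(osd for osd, v in latencies.items() if v > factor * median)
```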
Also, note that Inktank has developed Calamari. There are rumors that it
may be open sourced at some point in the future.
Cheers,
Mike Dawson
On 5/13/2014 12:33 PM, Adrian Banasiak wrote:
Thanks for the suggestion about the admin daemon, but it looks
single-OSD oriented. I have used perf dump on the mon socket and it
outputs some interesting data for monitoring the whole cluster:
{ "cluster": { "num_mon": 4,
"num_mon_quorum": 4,
"num_osd": 29,
"num_osd_up": 29,
"num_osd_in": 29,
"osd_epoch": 1872,
"osd_kb": 20218112516,
"osd_kb_used": 5022202696,
"osd_kb_avail": 15195909820,
"num_pool": 4,
"num_pg": 3500,
"num_pg_active_clean": 3500,
"num_pg_active": 3500,
"num_pg_peering": 0,
"num_object": 400746,
"num_object_degraded": 0,
"num_object_unfound": 0,
"num_bytes": 1678788329609,
"num_mds_up": 0,
"num_mds_in": 0,
"num_mds_failed": 0,
"mds_epoch": 1},
Unfortunately, cluster-wide IO statistics are still missing.
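Those counters can still feed simple derived items, though; a quick sketch using the field names from the dump above:

```python
def cluster_items(dump):
    """Derive a few Zabbix-style items from the mon's 'perf dump'
    cluster section (field names as in the output above)."""
    c = dump["cluster"]
    return {
        "osds_down": c["num_osd"] - c["num_osd_up"],
        "osds_out": c["num_osd"] - c["num_osd_in"],
        "pct_used": 100.0 * c["osd_kb_used"] / c["osd_kb"],
        "pg_not_clean": c["num_pg"] - c["num_pg_active_clean"],
    }

# With the numbers above: 0 down, 0 out, ~24.8% used, 0 unclean PGs.
```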
2014-05-13 17:17 GMT+02:00 Haomai Wang <[email protected]>:
I'm not sure what you need.
I use "ceph --admin-daemon /var/run/ceph/ceph-osd.x.asok perf dump" to
get the monitoring info. The result can be parsed easily with
simplejson in Python.
On Tue, May 13, 2014 at 10:56 PM, Adrian Banasiak
<[email protected]> wrote:
> Hi, I am working with a test Ceph cluster and now I want to implement
> Zabbix monitoring with items such as:
>
> - whole cluster IO (for example ceph -s -> recovery io 143 MB/s, 35
> objects/s)
> - pg statistics
>
> I would like to create a single script in Python to retrieve values
> using the rados Python module, but there is only a little information
> in the documentation about module usage. I've created a single function
> which calculates all pools' current read/write statistics, but I can't
> find out how to add recovery IO usage and pg statistics:
>
> read = 0
> write = 0
> stats = {}
> for pool in conn.list_pools():
>     io = conn.open_ioctx(pool)
>     stats[pool] = io.get_stats()
>     read += int(stats[pool]['num_rd'])
>     write += int(stats[pool]['num_wr'])
>     io.close()
>
> Could someone share his knowledge about rados module for
retriving ceph
> statistics?
>
> BTW Ceph is awesome!
>
> --
> Best regards, Adrian Banasiak
> email: [email protected] <mailto:[email protected]>
>
> _______________________________________________
> ceph-users mailing list
> [email protected] <mailto:[email protected]>
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
--
Best Regards,
Wheat
--
Best regards, Adrian Banasiak
email: [email protected]