Hi Paul,
Quoting Paul Emmerich ([email protected]):
> https://static.croit.io/ceph-training-examples/ceph-training-example-admin-socket.pdf
Thanks for the link. So, what tool do you use to gather the metrics? We
are using the telegraf module of the Ceph manager. However, this module only
provides "sum" and not "avgtime", so I can't do the calculations. The
influx and zabbix mgr modules also only provide "sum". The only metrics
module that *does* send enough to derive "avgtime" is the prometheus
module, which exposes both a sum and a count:
ceph_mds_reply_latency_sum
ceph_mds_reply_latency_count
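For reference, a minimal sketch of the calculation I have in mind: the average reply latency over an interval is the change in the sum divided by the change in the count between two samples. The function name and the sample values are made up for illustration.

```python
def average_latency(prev_sum, prev_count, cur_sum, cur_count):
    """Average latency (seconds) between two samples of a sum/count pair."""
    delta_count = cur_count - prev_count
    if delta_count <= 0:
        # No new operations, or the counters were reset (e.g. daemon restart).
        return None
    return (cur_sum - prev_sum) / delta_count


# Hypothetical samples of ceph_mds_reply_latency_sum / ceph_mds_reply_latency_count
print(average_latency(10.0, 1000, 12.5, 1500))  # 0.005
```

With only the sum available there is no denominator, which is why the other modules are not enough for this.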
All modules use "self.get_all_perf_counters()" though:
~/git/ceph/src/pybind/mgr/ > grep -Ri get_all_perf_counters *
dashboard/controllers/perf_counters.py: return mgr.get_all_perf_counters()
diskprediction_cloud/agent/metrics/ceph_mon_osd.py: perf_data = obj_api.module.get_all_perf_counters(services=('mon', 'osd'))
influx/module.py: for daemon, counters in six.iteritems(self.get_all_perf_counters()):
mgr_module.py: def get_all_perf_counters(self, prio_limit=PRIO_USEFUL,
prometheus/module.py: for daemon, counters in self.get_all_perf_counters().items():
restful/api/perf.py: counters = context.instance.get_all_perf_counters()
telegraf/module.py: for daemon, counters in six.iteritems(self.get_all_perf_counters())
Besides the *ceph* telegraf module we also use the ceph plugin for
telegraf, but that plugin does not (yet?) provide mds metrics.
Ideally we would *only* use the ceph mgr telegraf module to collect *all
the things*.
I'm not sure what difference in the Python code between the modules could
explain this.
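To make the question concrete, here is a rough sketch of what I would expect a module to do if it wanted to send both values. It assumes (unverified on my side) that the per-counter dictionaries returned by get_all_perf_counters() carry a "count" field next to "value" for averaged counters. The helper name and the sample data are hypothetical, and this is not taken from any of the modules above.

```python
def flatten_counters(all_counters):
    """Yield (daemon, metric_name, value) tuples from a nested counter dict.

    Assumed shape (hypothetical): {daemon: {path: {"value": ..., "count": ...}}}
    where "count" is only present for averaged counters.
    """
    for daemon, counters in all_counters.items():
        for path, info in counters.items():
            if "count" in info:
                # Averaged counter: emit both parts so an average can be derived.
                yield daemon, path + "_sum", info["value"]
                yield daemon, path + "_count", info["count"]
            else:
                yield daemon, path, info["value"]


# Made-up sample data in the assumed shape
sample = {
    "mds.a": {
        "mds.reply_latency": {"value": 12.5, "count": 1500},
        "mds.request": {"value": 42},
    }
}
for row in flatten_counters(sample):
    print(row)
```

If the modules that only send "sum" just read "value" and ignore everything else, that would explain the difference, but that is a guess.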
Gr. Stefan
--
| BIT BV https://www.bit.nl/ Kamer van Koophandel 09090351
| GPG: 0xD14839C6 +31 318 648 688 / [email protected]