Hi there! Has anyone any experience with the Influx Ceph mgr module?
I am running Ceph 17.2.7 on CentOS 8 Stream. I configured the module on one of my clusters and tested it with "ceph influx send" (the official documentation, https://docs.ceph.com/en/quincy/mgr/influx/, mentions "ceph influx self-test", which does not exist), but nothing reaches the Influx database. Here is my config (password not shown):
mgr  advanced  mgr/influx/database    cephct                      *
mgr  advanced  mgr/influx/hostname    influxdb-dev.cloud.garr.it  *
mgr  advanced  mgr/influx/interval    300                         *
mgr  advanced  mgr/influx/password    ****                        *
mgr  advanced  mgr/influx/ssl         false                       *
mgr  advanced  mgr/influx/username    cephctusr                   *
mgr  advanced  mgr/influx/verify_ssl  false                       *

After enabling the module, after a while I see this in the MGR/MON logs:

2024-02-13T09:06:41.283+0100 7f5be9fff700 0 [influx ERROR root] Queue is full, failed to add chunk
and "ceph health detail" shows:
[WRN] MGR_INFLUX_QUEUE_FULL: Failed to chunk to InfluxDB Queue
    Queue is full. InfluxDB might be slow with processing data
(I searched a bit for "failed to chunk" but found nothing)
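
The wording of the health warning suggests a bounded queue that fills up when nothing drains it fast enough. The following is only a minimal Python sketch of that general pattern, not the module's actual code: the queue size, the chunk values and the helper name `enqueue_chunks` are all assumptions made for illustration.

```python
import queue


def enqueue_chunks(q, chunks):
    """Try to enqueue every chunk without blocking; return the chunks that were dropped."""
    dropped = []
    for chunk in chunks:
        try:
            q.put(chunk, block=False)
        except queue.Full:
            # A real producer would log an error such as "queue is full" here.
            dropped.append(chunk)
    return dropped


# Bounded queue with no consumer draining it (stand-in for a stalled sender).
q = queue.Queue(maxsize=3)
dropped = enqueue_chunks(q, range(5))
print(f"queued={q.qsize()} dropped={dropped}")
```

If this is indeed the mechanism, the warning would point at the sending side (the worker never delivering to InfluxDB) rather than at the amount of data being collected.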
The MGR is co-located with the MON, and I verified (by installing influxdb by hand) that, from the MON host, the command

  influx -database cephct -username cephctusr -password '****' -host influxdb-dev.cloud.garr.it

does indeed work.

Hmm, actually, while I was running my tests, something did arrive at the InfluxDB server at some point, but only for 5 minutes or so, yesterday morning. It is practically impossible for me now to reconstruct what the configuration was at the time... maybe during a server reboot?
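
The manual check with the influx CLI can also be repeated over plain HTTP from the MGR host, which is closer to what a Python client would do. This is a rough sketch, assuming an InfluxDB 1.x server reachable on the default port 8086 and its `/write` endpoint with `db`, `u` and `p` query parameters; the measurement name `mgr_connectivity_test` and both helper names are made up for this example.

```python
import urllib.request
from urllib.parse import urlencode


def write_url(hostname, database, username, password, port=8086, ssl=False):
    """Build the InfluxDB 1.x /write URL for the given connection settings."""
    scheme = "https" if ssl else "http"
    query = urlencode({"db": database, "u": username, "p": password})
    return f"{scheme}://{hostname}:{port}/write?{query}"


def send_test_point(url, timeout=10):
    """POST one throwaway point; InfluxDB 1.x answers 204 on a successful write."""
    body = b"mgr_connectivity_test value=1i"  # hypothetical measurement name
    req = urllib.request.Request(url, data=body, method="POST")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

Calling `send_test_point(write_url("influxdb-dev.cloud.garr.it", "cephct", "cephctusr", "<password>"))` from the MGR host would then show whether a write succeeds (an HTTP error raises `urllib.error.HTTPError`).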
In any case, only the following measurements
ceph_pg_summary_osd
ceph_pg_summary_pool
were populated, and they do not contain terribly exciting metrics:
only the status of the PGs for each pool and the number of PGs per OSD.
I guess the interesting metrics reported in the documentation (latency,
bytes, operations...) should end up in some other measurement.
I am not particularly fond of Influx; I am just looking for "something" (Influx? Telegraf?) to store metrics and eventually plot them in Grafana, to replace the current Zabbix-based solution. I experimented with Prometheus some time ago, with some satisfaction, although it requires a scraper, which I'd be happy to avoid, especially given the point below. An additional constraint is that I have at least 3 distinct Ceph production clusters to monitor, so I'd need a simple way to tell them apart.
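
One common way to tell several clusters apart in a single InfluxDB database is to attach a tag to every point. Below is a small sketch of how such a point looks in InfluxDB line protocol; the tag name `cluster`, the value `prod-a`, the field name `count` and the helper functions are illustrative assumptions, not what the mgr module actually emits.

```python
def escape_tag(value):
    """Escape commas, equals signs and spaces, as line protocol requires for tags and keys."""
    out = str(value)
    for ch in (",", "=", " "):
        out = out.replace(ch, "\\" + ch)
    return out


def format_field(value):
    """Render a field value: booleans as true/false, integers with an 'i' suffix, floats as-is."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    return repr(float(value))


def to_line(measurement, tags, fields):
    """Build one line-protocol record with tags sorted by key."""
    tag_part = "".join(
        f",{escape_tag(k)}={escape_tag(v)}" for k, v in sorted(tags.items())
    )
    field_part = ",".join(
        f"{escape_tag(k)}={format_field(v)}" for k, v in fields.items()
    )
    return f"{measurement}{tag_part} {field_part}"


line = to_line("ceph_pg_summary_pool", {"pool": "rbd", "cluster": "prod-a"}, {"count": 128})
print(line)
```

A Grafana query could then filter on that tag. An alternative that needs no extra tag is a separate database per cluster, selected through the mgr/influx/database setting.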
How are you dealing with these matters, namely storing configuration and metrics "somewhere"?
Thanks a lot! (for your patience in reading this, at least)
Fulvio
--
Fulvio Galeazzi
GARR-Net Department
tel.: +39-334-6533-250
skype: fgaleazzi70
