[ceph-users] Help with setting-up Influx MGR module: ERROR - queue is full

Fulvio Galeazzi Tue, 13 Feb 2024 07:04:04 -0800

Hi there!

Does anyone have experience with the Ceph Influx mgr module?

I am using 17.2.7 on CentOS 8 Stream. I configured one of my clusters and tested with "ceph influx send" (the official documentation at https://docs.ceph.com/en/quincy/mgr/influx/ mentions "ceph influx self-test", which does not exist), but nothing reaches the InfluxDB database. Here is my configuration (password not shown):

mgr  advanced  mgr/influx/database    cephct                      *
mgr  advanced  mgr/influx/hostname    influxdb-dev.cloud.garr.it  *
mgr  advanced  mgr/influx/interval    300                         *
mgr  advanced  mgr/influx/password    ****                        *
mgr  advanced  mgr/influx/ssl         false                       *
mgr  advanced  mgr/influx/username    cephctusr                   *
mgr  advanced  mgr/influx/verify_ssl  false                       *

After enabling the module, after a while I see this in the MGR/MON logs:

2024-02-13T09:06:41.283+0100 7f5be9fff700 0 [influx ERROR root] Queue is full, failed to add chunk


and "ceph health detail" shows:

[WRN] MGR_INFLUX_QUEUE_FULL: Failed to chunk to InfluxDB Queue
    Queue is full. InfluxDB might be slow with processing data

(I searched a bit for "failed to chunk" but found nothing)
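The log line and the health warning suggest a producer/consumer design: at each interval the collected points are split into chunks and put on a bounded queue, and worker threads write the chunks to InfluxDB. If the writers never drain the queue (for example because every write fails or hangs), a non-blocking put eventually fails. Below is a simplified sketch of that pattern, assuming the module behaves this way; it is not the module's actual code, and the queue and chunk sizes are hypothetical.

```python
import queue

# Hypothetical sizes, for illustration only; not the module's defaults.
QUEUE_SIZE = 4
CHUNK_SIZE = 10


def chunk(points, size):
    """Yield consecutive slices of `points` with at most `size` items each."""
    for i in range(0, len(points), size):
        yield points[i:i + size]


def enqueue_points(q, points, size=CHUNK_SIZE):
    """Put every chunk on the bounded queue without blocking.

    Returns True if all chunks were queued, False as soon as the queue is
    full (the situation reported as "Queue is full, failed to add chunk").
    """
    try:
        for c in chunk(points, size):
            q.put(c, block=False)
    except queue.Full:
        print("Queue is full, failed to add chunk")
        return False
    return True


if __name__ == "__main__":
    # No worker thread drains this queue, mimicking writers that are stuck
    # or failing: 100 points -> 10 chunks, but only 4 slots.
    q = queue.Queue(maxsize=QUEUE_SIZE)
    print(enqueue_points(q, list(range(100))))  # prints the error, then False
```

If this reading is right, the warning is a symptom rather than the cause: the real question is why the writer side is not emptying the queue (unreachable host, rejected credentials, missing database), and such write errors may show up in the mgr log before the first "Queue is full" message.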

The MGR is co-located with the MON, and I verified (by installing influxdb by hand) that running the following command from the MON host indeed works:

influx -database cephct -username cephctusr -password '****' -host influxdb-dev.cloud.garr.it
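The `influx` CLI test confirms the credentials, but it may be worth also exercising the HTTP API write path directly from the MGR host, since that is what a client library talks to. A minimal sketch, assuming InfluxDB 1.x on the default port 8086 over plain HTTP (matching `mgr/influx/ssl false`); the measurement name `mgr_influx_test` and tag `source=manual` are hypothetical.

```python
import urllib.error
import urllib.parse
import urllib.request


def influx_url(host, path, params=None, ssl=False, port=8086):
    """Build an InfluxDB 1.x HTTP API URL such as http://host:8086/write?db=x."""
    scheme = "https" if ssl else "http"
    query = "?" + urllib.parse.urlencode(params) if params else ""
    return f"{scheme}://{host}:{port}{path}{query}"


def write_test_point(host, db, user, password, ssl=False, port=8086, timeout=10):
    """POST one line-protocol point to /write and return the HTTP status."""
    url = influx_url(host, "/write", {"db": db, "u": user, "p": password},
                     ssl, port)
    body = b"mgr_influx_test,source=manual value=1i"  # hypothetical point
    req = urllib.request.Request(url, data=body, method="POST")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status  # 204 means the point was accepted
    except urllib.error.HTTPError as err:
        # e.g. 401 for bad credentials, 404 if the database does not exist
        print(err.code, err.read().decode(errors="replace"))
        return err.code


# Usage (network access required):
# write_test_point("influxdb-dev.cloud.garr.it", "cephct", "cephctusr", "****")
```

A 204 here while the module still reports a full queue would point away from network and credentials and towards the module itself.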

Hmm, actually, during my tests yesterday morning some data did reach the InfluxDB server, but only for 5 minutes or so. It is practically impossible for me now to reconstruct what the configuration was at the time... maybe during a server reboot?

In any case, only the following measurements
    ceph_pg_summary_osd
    ceph_pg_summary_pool

were populated, and they do not contain terribly exciting metrics: only the status of PGs in each pool and the number of PGs per OSD. I guess the more interesting metrics listed in the documentation (latency, bytes, operations...) should end up in some other measurement.
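To check which measurements actually exist in the database, one can run the InfluxQL statement `SHOW MEASUREMENTS`. A sketch that sends it to the InfluxDB 1.x `/query` endpoint and extracts the names; it assumes the 1.x JSON layout (`results`, then `series`, then `values`) and plain HTTP on port 8086, with host and credentials as in the configuration above.

```python
import json
import urllib.parse
import urllib.request


def measurement_names(response):
    """Extract measurement names from a decoded /query JSON response."""
    names = []
    for result in response.get("results", []):
        for series in result.get("series", []):
            for row in series.get("values", []):
                names.append(row[0])  # each row is [name]
    return names


def show_measurements(host, db, user, password, port=8086, timeout=10):
    """Run SHOW MEASUREMENTS against InfluxDB 1.x and return the names."""
    params = urllib.parse.urlencode(
        {"db": db, "u": user, "p": password, "q": "SHOW MEASUREMENTS"})
    url = f"http://{host}:{port}/query?{params}"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return measurement_names(json.load(resp))
```

Running it at intervals would show whether new measurements ever appear beyond the two PG summaries.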

I am not particularly attached to Influx; I am just looking for "something" (Influx? Telegraf?) to store metrics and eventually plot them in Grafana, replacing the current Zabbix-based solution. Some time ago I experimented with Prometheus with some satisfaction, although it requires a scraper, which I'd be happy to avoid, especially given the next point. An additional constraint is that I have at least 3 distinct Ceph production clusters to monitor, so I need a simple way to tell them apart.
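With Influx, one simple way to tell clusters apart is a separate database per cluster (a different `mgr/influx/database` on each); another is a tag identifying the cluster on every point, so a single Grafana query can filter or group by it. A sketch of encoding a point in InfluxDB line protocol with such a tag; the tag name `cluster`, the measurement and the field names are hypothetical, and this is not a claim about what the mgr module emits.

```python
def _escape_tag(s):
    """Escape commas, equals signs and spaces in tag keys/values and field keys."""
    return s.replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")


def _escape_measurement(s):
    """Escape commas and spaces in a measurement name."""
    return s.replace(",", r"\,").replace(" ", r"\ ")


def _field_value(v):
    """Format a field value: bool, integer (suffix i), float or quoted string."""
    if isinstance(v, bool):  # check bool first: bool is a subclass of int
        return "true" if v else "false"
    if isinstance(v, int):
        return f"{v}i"
    if isinstance(v, float):
        return repr(v)
    escaped = str(v).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_line(measurement, tags, fields, timestamp_ns=None):
    """Encode one point; tags are sorted by key, as InfluxDB recommends."""
    tag_part = "".join(f",{_escape_tag(k)}={_escape_tag(v)}"
                       for k, v in sorted(tags.items()))
    field_part = ",".join(f"{_escape_tag(k)}={_field_value(v)}"
                          for k, v in sorted(fields.items()))
    line = f"{_escape_measurement(measurement)}{tag_part} {field_part}"
    if timestamp_ns is not None:
        line += f" {timestamp_ns}"
    return line


# Hypothetical example point for a cluster named "prod-a":
# to_line("pool_stats", {"cluster": "prod-a", "pool": "rbd"},
#         {"bytes_used": 1024, "ratio": 0.5})
# -> "pool_stats,cluster=prod-a,pool=rbd bytes_used=1024i,ratio=0.5"
```

The per-database approach needs no extra tagging but makes cross-cluster dashboards harder; the tag approach keeps everything in one place at the cost of adding the tag on the writer side.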

How are you dealing with these matters, namely storing configuration and metrics "somewhere"?


Thanks a lot! (for your patience in reading this, at least)

                        Fulvio


--
Fulvio Galeazzi
GARR-Net Department
tel.: +39-334-6533-250
skype: fgaleazzi70


_______________________________________________
ceph-users mailing list -- [email protected]
