[ceph-users] Re: Manager carries wrong information until killing it

Nico Schottelius Wed, 12 May 2021 14:13:23 -0700

Reed Dier <[email protected]> writes:

> I don't have a solution to offer, but I've seen this for years with no 
> solution.
> Any time a MGR bounces, be it for upgrades, or a new daemon coming online, 
> etc, I'll see a scale spike like is reported below.

Interesting to read that we are not the only ones.

> Just out of curiosity, which MGR plugins are you using?

[22:11:05] black2.place6:~# ceph mgr module ls
{
    "always_on_modules": [
        "balancer",
        "crash",
        "devicehealth",
        "orchestrator_cli",
        "progress",
        "rbd_support",
        "status",
        "volumes"
    ],
    "enabled_modules": [
        "iostat",
        "pg_autoscaler",
        "prometheus",
        "restful"
    ],
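
For anyone wanting to compare module sets between clusters, the listing can be reduced to one sorted list. This is only a sketch: it assumes the JSON above has been captured into a string, and `SAMPLE` and `active_modules` are names made up for illustration, not part of Ceph.

```python
import json

# Sample in the shape pasted above, cut down to the two keys shown there.
# A real dump contains further keys that are ignored here.
SAMPLE = """
{
    "always_on_modules": [
        "balancer", "crash", "devicehealth", "orchestrator_cli",
        "progress", "rbd_support", "status", "volumes"
    ],
    "enabled_modules": [
        "iostat", "pg_autoscaler", "prometheus", "restful"
    ]
}
"""


def active_modules(dump: str) -> list[str]:
    """Return always-on and explicitly enabled modules as one sorted list."""
    data = json.loads(dump)
    always_on = set(data.get("always_on_modules", []))
    enabled = set(data.get("enabled_modules", []))
    return sorted(always_on | enabled)


if __name__ == "__main__":
    for name in active_modules(SAMPLE):
        print(name)
```

Diffing that list between an affected and an unaffected cluster would be one quick way to rule individual modules in or out.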

> I have historically used the influx plugin for stats exports, and it shows up 
> in those values as well, throwing everything off.

So the problem is unlikely to be related to the prometheus plugin, and
more likely a statistics error somewhere else.

> I don't see it in my Zabbix stats, albeit those are scraped at a
> longer interval that may not catch this.

For prometheus, we scrape every 10 or 15 seconds. But I wonder whether a
longer interval really flattens the spike out, or whether the logic on
the Zabbix side is actually different.
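
To illustrate the "longer interval may not catch this" idea, here is a toy sketch. All numbers are invented: a hypothetical gauge sits at 100 and jumps to 10000 for 20 seconds after a MGR restart. It only shows that a slow poller can miss a short spike entirely; it says nothing about what Zabbix or the MGR actually do.

```python
def gauge(t: int) -> int:
    """Hypothetical metric: flat at 100, bogus spike for 20 s from t=130."""
    return 10_000 if 130 <= t < 150 else 100


def scrape(interval: int, duration: int = 300) -> list[int]:
    """Sample the gauge every `interval` seconds, starting at t=0."""
    return [gauge(t) for t in range(0, duration, interval)]


fast = scrape(10)  # samples at t=130 and t=140 fall inside the spike
slow = scrape(60)  # samples at t=0, 60, 120, 180, 240 all miss it

print("10 s scrape sees spike:", max(fast) > 100)
print("60 s scrape sees spike:", max(slow) > 100)
```

Whether the slow poller misses the spike depends on its phase relative to the restart, so a long interval makes the spike less likely to show up rather than impossible to see.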

Out of curiosity from my side: the manager is a binary, but the plugins
are actually Python modules. I had a quick look at
/usr/share/ceph/mgr/prometheus/module.py, which seems to get the data
from a monitor, so I wonder if the problem lies more in the
architecture of Ceph than in the actual data export.

Cheers,

Nico

--
Sustainable and modern Infrastructures by ungleich.ch
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
