[ceph-users] Re: Repeated crashes of mgr module in Tentacle

Eugen Block via ceph-users Tue, 10 Feb 2026 05:34:57 -0800

Hi,

> even “ceph -s” was hanging


That command contacts the MONs; are you sure the cluster is healthy?
If 'ceph -s' hangs, I suspect either a network or a MON (quorum) issue.
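One way to test the quorum theory is to compare the MONs listed in the monmap with the ones actually in quorum. The sketch below assumes the JSON layout printed by `ceph quorum_status -f json` (a top-level `quorum_names` list and `monmap.mons[].name`); the helper name `mons_out_of_quorum` is hypothetical. Keep in mind that `ceph quorum_status` also has to reach the MONs, so if it hangs as well, that alone points in the same direction.

```python
import json


def mons_out_of_quorum(quorum_status_json: str) -> list[str]:
    """Return MON names present in the monmap but absent from the quorum.

    Assumes the layout of `ceph quorum_status -f json`: a top-level
    "quorum_names" list and a "monmap" object whose "mons" entries each
    carry a "name".
    """
    status = json.loads(quorum_status_json)
    in_quorum = set(status.get("quorum_names", []))
    all_mons = [m["name"] for m in status.get("monmap", {}).get("mons", [])]
    return [name for name in all_mons if name not in in_quorum]


if __name__ == "__main__":
    # Hypothetical sample: three MONs, one of them ("c") out of quorum.
    sample = json.dumps({
        "quorum_names": ["a", "b"],
        "monmap": {"mons": [{"name": "a"}, {"name": "b"}, {"name": "c"}]},
    })
    print(mons_out_of_quorum(sample))
```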

The default ceph UID/GID has been 167 for as long as I've been working with
Ceph (more than 10 years now), and it's the same within the containers. And
yes, there is a difference between cephadm setups and package-based
installs. Did you make sure that no ceph packages other than cephadm (and
maybe ceph-common) are installed? There can be issues when both deb
packages and containers are running for the same daemon. So if only the
MGR is affected, I recommend making sure that no ceph-mgr package is
installed.
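Both checks from the paragraph above can be scripted on a Debian/Ubuntu host. This is a minimal sketch, assuming that a cephadm host should carry only `cephadm` and possibly `ceph-common` among the `ceph*` packages, and that the containers expect UID/GID 167 as described in this thread. The function names and sample rows are made up for illustration; feed the functions the output of `dpkg -l` and the contents of `/etc/passwd`.

```python
# Assumptions taken from this thread, not from an official Ceph spec.
ALLOWED = {"cephadm", "ceph-common"}
CONTAINER_UID = 167


def unexpected_ceph_packages(dpkg_l_output: str) -> list[str]:
    """Installed packages named ceph* other than the allowed ones.

    Parses `dpkg -l` text: installed rows start with "ii", and the second
    column is the package name, possibly suffixed with ":<arch>".
    """
    found = []
    for line in dpkg_l_output.splitlines():
        fields = line.split()
        if len(fields) < 2 or fields[0] != "ii":
            continue  # header lines, removed ("rc") packages, etc.
        name = fields[1].split(":", 1)[0]
        if name.startswith("ceph") and name not in ALLOWED:
            found.append(name)
    return found


def ceph_uid_mismatch(passwd_text: str) -> bool:
    """True if a 'ceph' user exists with a UID other than CONTAINER_UID."""
    for line in passwd_text.splitlines():
        parts = line.split(":")
        if len(parts) >= 3 and parts[0] == "ceph":
            return int(parts[2]) != CONTAINER_UID
    return False


if __name__ == "__main__":
    # Made-up sample rows; version strings and descriptions are placeholders.
    sample_dpkg = (
        "ii  cephadm   1.0  arm64  sample\n"
        "ii  ceph-mgr  1.0  arm64  sample\n"
    )
    print(unexpected_ceph_packages(sample_dpkg))
    print(ceph_uid_mismatch("ceph:x:64045:64045::/var/lib/ceph:/usr/sbin/nologin\n"))
```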

It looks like the telemetry module is responsible for the crash; can you
turn it off? In recent releases you can force-disable a module:

ceph mgr module force disable telemetry

Can you check whether the MGR still crashes after that? Debug logs
(debug_mgr) might also be helpful for the devs. I couldn't find an existing
tracker issue for this, so I'd suggest creating a new one.
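For the tracker issue, the relevant fields can be pulled out of the crash report, which `ceph crash ls` lists and `ceph crash info <crash_id>` prints as JSON. The sketch below is a hypothetical helper: it only uses field names that appear in the crash report quoted in this thread, and keeping just the backtrace frames that start with "(" (demangled C++ frames with an offset) is a heuristic, not an official filter.

```python
import json


def summarize_crash(report_json: str) -> dict:
    """Extract the fields most useful for a bug report from a crash dump.

    Field names ("ceph_version", "entity_name", "assert_condition", ...)
    are the ones visible in the crash report from this thread.
    """
    r = json.loads(report_json)
    # Heuristic: keep frames that name a C++ function, i.e. start with "(".
    named_frames = [f for f in r.get("backtrace", []) if f.startswith("(")]
    return {
        "version": r.get("ceph_version"),
        "entity": r.get("entity_name"),
        "thread": r.get("assert_thread_name"),
        "assert": r.get("assert_condition"),
        "where": f'{r.get("assert_func")} ({r.get("assert_file")}:{r.get("assert_line")})',
        "stack_sig": r.get("stack_sig"),
        "frames": named_frames,
    }


if __name__ == "__main__":
    # Trimmed, hand-shortened copy of the crash report from this thread.
    sample = json.dumps({
        "ceph_version": "20.2.0",
        "entity_name": "mgr.hc-945901a5cad1b6e3.mtijbv",
        "assert_thread_name": "telemetry",
        "assert_condition": "cursor != root",
        "assert_func": "virtual void PyFormatter::close_section()",
        "assert_file": "/ceph/rpmbuild/BUILD/ceph-20.2.0/src/mgr/PyFormatter.h",
        "assert_line": 84,
        "backtrace": [
            "abort()",
            "(ActivePyModules::get_perf_schema_python(...)+0xf6c) [0xaaaacdf46ec0]",
            "_PyEval_EvalFrameDefault()",
        ],
    })
    for key, value in summarize_crash(sample).items():
        print(f"{key}: {value}")
```

Together with debug_mgr logs (for example after `ceph config set mgr debug_mgr 20`), this gives the devs the version, the failing assertion, the thread and the `stack_sig` in one place.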

Regards,
Eugen

On Sun, 8 Feb 2026 at 14:46, Daniel Brown via ceph-users
<[email protected]> wrote:

>
>
> Greetings —
>
> Have been seeing repeated crashes of my mgr. It seems to run for about
> 45 to 50 seconds and then boom. Cephadm setup here. I did (try to) enable
> a couple of modules lately (iostat, stats, diskprediction_local), but have
> toggled them back off; unfortunately that hasn't fixed the issue.
>
> I was seeing “no mgr” in ceph -s. To get around that I’ve lowered the
> “StartLimitInterval” setting in the ceph-[CLUSTER-UID]@.service file
> to 1m, and have set up 4 mgrs so that I can get a couple of commands run
> before one mgr crashes and another starts. The 30m default there seems…
> high, imo. I was having intervals with no mgr, which makes it tough to do
> much with the cluster; even “ceph -s” was hanging. Everything else in the
> cluster seems “normal” and is still serving data.
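The effect of the StartLimitInterval tweak can be illustrated with a small simulation. systemd allows at most StartLimitBurst starts within one StartLimitInterval window; once that is exceeded it refuses to start the unit again (it is left failed with a start-limit-hit result) until the window has passed or `systemctl reset-failed` is run. The sketch is a simplified fixed-window model, not systemd's code, and `restart_loop` is a hypothetical name. The burst of 5 and the 10 s restart delay are assumptions for illustration; only the ~50 s uptime and the 30m/1m intervals come from this thread.

```python
def restart_loop(uptime_s, restart_delay_s, interval_s, burst, horizon_s):
    """Simulate a unit that crashes after uptime_s and is auto-restarted.

    Returns (starts, gave_up_at): the start times that were allowed, and
    the time at which a start was refused (None if that never happened
    within horizon_s). Simplified fixed-window model of systemd's limit.
    """
    starts = []
    window_start, count = None, 0
    t = 0
    while t <= horizon_s:
        if window_start is None or t - window_start > interval_s:
            window_start, count = t, 0  # open a new rate-limit window
        count += 1
        if count > burst:
            return starts, t  # start limit hit: unit is left failed
        starts.append(t)
        t += uptime_s + restart_delay_s  # run until the crash, then wait
    return starts, None


if __name__ == "__main__":
    for interval in (1800, 60):  # the 30m default vs. the 1m tweak
        starts, gave_up = restart_loop(50, 10, interval, 5, 3600)
        print(f"interval={interval}s: {len(starts)} starts, gave up at {gave_up}")
```

With these assumed numbers, the 30-minute window gives up after five starts (at t=300 s), while the 1-minute window never trips within the simulated hour.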
>
>
> One other note, which I think is unrelated: I upgraded one of my cluster
> nodes from Ubuntu “Plucky Puffin” (25.04) to “Questing Quokka” (25.10).
> After the upgrade, the cephadm-managed containers wouldn't start. I
> tracked that down to the ceph user's UID in /etc/passwd being 64045,
> while the container seems to want UID 167. Most things under
> /var/lib/ceph/[CLUSTER UID]/ … appear to be owned by user/group 167:167;
> I assume this is a default inside the container. The workaround was to
> manually change the UID/GID for ceph in /etc/passwd and /etc/group. I
> imagine this is a collision between cephadm-managed deployments and how
> Ubuntu/apt installs cephadm.
>
>
>
> The aforementioned mgr crashes look like this:
>
>
> {
>     "assert_condition": "cursor != root",
>     "assert_file": "/ceph/rpmbuild/BUILD/ceph-20.2.0/src/mgr/PyFormatter.h",
>     "assert_func": "virtual void PyFormatter::close_section()",
>     "assert_line": 84,
>     "assert_msg": "/ceph/rpmbuild/BUILD/ceph-20.2.0/src/mgr/PyFormatter.h: In function 'virtual void PyFormatter::close_section()' thread ffff34e55700 time 2026-02-08T13:10:07.526894+0000\n/ceph/rpmbuild/BUILD/ceph-20.2.0/src/mgr/PyFormatter.h: 84: FAILED ceph_assert(cursor != root)\n",
>     "assert_thread_name": "telemetry",
>     "backtrace": [
>         "__kernel_rt_sigreturn()",
>         "/lib64/libc.so.6(+0x82a78) [0xffff83603a78]",
>         "raise()",
>         "abort()",
>         "(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x190) [0xffff84039874]",
>         "/usr/bin/ceph-mgr(+0xcf540) [0xaaaacdf2f540]",
>         "(ActivePyModules::get_perf_schema_python(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0xf6c) [0xaaaacdf46ec0]",
>         "/usr/bin/ceph-mgr(+0x105528) [0xaaaacdf65528]",
>         "/lib64/libpython3.9.so.1.0(+0xf7bfc) [0xffff84ae9bfc]",
>         "_PyEval_EvalFrameDefault()",
>         "/lib64/libpython3.9.so.1.0(+0xda3f0) [0xffff84acc3f0]",
>         "_PyEval_EvalFrameDefault()",
>         "/lib64/libpython3.9.so.1.0(+0xc47e8) [0xffff84ab67e8]",
>         "_PyFunction_Vectorcall()",
>         "_PyEval_EvalFrameDefault()",
>         "/lib64/libpython3.9.so.1.0(+0xc47e8) [0xffff84ab67e8]",
>         "_PyFunction_Vectorcall()",
>         "_PyEval_EvalFrameDefault()",
>         "/lib64/libpython3.9.so.1.0(+0xc47e8) [0xffff84ab67e8]",
>         "_PyFunction_Vectorcall()",
>         "_PyEval_EvalFrameDefault()",
>         "/lib64/libpython3.9.so.1.0(+0xc47e8) [0xffff84ab67e8]",
>         "_PyFunction_Vectorcall()",
>         "_PyEval_EvalFrameDefault()",
>         "/lib64/libpython3.9.so.1.0(+0xda3f0) [0xffff84acc3f0]",
>         "/lib64/libpython3.9.so.1.0(+0xea93c) [0xffff84adc93c]",
>         "/lib64/libpython3.9.so.1.0(+0xcf304) [0xffff84ac1304]",
>         "/lib64/libpython3.9.so.1.0(+0x197d78) [0xffff84b89d78]",
>         "_PyObject_CallMethod_SizeT()",
>         "(PyModuleRunner::serve()+0x6c) [0xaaaacdfdf6cc]",
>         "(PyModuleRunner::PyModuleRunnerThread::entry()+0x148) [0xaaaacdfdff08]"
>     ],
>     "ceph_version": "20.2.0",
>     "crash_id": "2026-02-08T13:10:07.528739Z_f18e6b74-438b-47db-9438-5a3861fdef2d",
>     "entity_name": "mgr.hc-945901a5cad1b6e3.mtijbv",
>     "os_id": "centos",
>     "os_name": "CentOS Stream",
>     "os_version": "9",
>     "os_version_id": "9",
>     "process_name": "ceph-mgr",
>     "stack_sig": "319d76a0d71a4644f9d65f592f6e621cca918d9205fc759ab7acf6944bc77fdd",
>     "timestamp": "2026-02-08T13:10:07.528739Z",
>     "utsname_hostname": "hc-945901a5cad1b6e3",
>     "utsname_machine": "aarch64",
>     "utsname_release": "6.14.0-1010-raspi",
>     "utsname_sysname": "Linux",
>     "utsname_version": "#10-Ubuntu SMP PREEMPT_DYNAMIC Tue Jul 15 19:09:05 UTC 2025"
> }
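The failed check, `cursor != root` in `PyFormatter::close_section()`, suggests an unbalanced close: the formatter was asked to close a section while already at the top level, reached here via `ActivePyModules::get_perf_schema_python` from the telemetry thread. The toy model below is not Ceph's implementation; it only assumes a formatter that keeps a cursor to the currently open section, starting at the root, and shows how one `close_section()` too many trips the same invariant. The class and section names are hypothetical.

```python
class ToyFormatter:
    """Toy nested formatter with a cursor, loosely mimicking the idea."""

    def __init__(self):
        self.root = {}
        self.cursor = self.root  # currently open section
        self._stack = []         # parents of the open sections

    def open_section(self, name):
        child = {}
        self.cursor[name] = child
        self._stack.append(self.cursor)
        self.cursor = child

    def dump(self, key, value):
        self.cursor[key] = value

    def close_section(self):
        # Same invariant as the failed ceph_assert: never "close" the root.
        # Identity check, since an empty section would compare equal to {}.
        if self.cursor is self.root:
            raise AssertionError("cursor != root")
        self.cursor = self._stack.pop()


if __name__ == "__main__":
    f = ToyFormatter()
    f.open_section("perf_schema")
    f.dump("ops", 1)
    f.close_section()  # balanced: back at the root
    print(f.root)
    try:
        f.close_section()  # one close too many
    except AssertionError as e:
        print("FAILED:", e)
```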
> _______________________________________________
> ceph-users mailing list -- [email protected]
> To unsubscribe send an email to [email protected]
>
