Greetings,
I have been seeing repeated crashes of my mgr daemon. It seems to run for about
45 to 50 seconds and then boom. This is a cephadm setup. I did (try to) enable a
couple of modules lately (iostat, stats, diskprediction_local), but have toggled
them back off. Unfortunately that hasn’t fixed the issue.
I was seeing “no mgr” with ceph -s. To get around that I’ve lowered the
“StartLimitInterval” setting in the ceph-[CLUSTER-UID]@.service file to
1m, and have 4 mgrs set up so that I can get a couple of commands run before one
mgr crashes and another starts. The 30m default there seems… high, imo. I was
having intervals with no mgr at all, which makes it tough to do much with the
cluster; even “ceph -s” was hanging. Everything else in the cluster seems
“normal” and is still serving data.
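A back-of-the-envelope sketch of why the interval matters, assuming a simplified model of systemd start rate limiting (a unit started more than the burst count within one interval is not started again). The burst of 5 and the 10 s restart delay are assumptions for illustration, not values taken from this cluster; check the actual unit file.

```python
import math

def starts_in_window(interval_s: float, run_s: float, restart_delay_s: float) -> int:
    """Count daemon starts that fall inside one rate-limit window.

    Starts are assumed to happen at t = 0, cycle, 2*cycle, ... where
    cycle = run_s + restart_delay_s.
    """
    cycle = run_s + restart_delay_s
    return math.ceil(interval_s / cycle)

def hits_start_limit(interval_s: float, burst: int, run_s: float, restart_delay_s: float) -> bool:
    """True if more than `burst` starts land in a single window."""
    return starts_in_window(interval_s, run_s, restart_delay_s) > burst

# Assumed values (not from the post): burst of 5, 10 s restart delay.
BURST = 5
RESTART_DELAY_S = 10
RUN_S = 50  # upper end of the 45 to 50 s run time described above

print(hits_start_limit(30 * 60, BURST, RUN_S, RESTART_DELAY_S))  # 30m window -> True
print(hits_start_limit(60, BURST, RUN_S, RESTART_DELAY_S))       # 1m window  -> False
```

Under these assumptions a mgr that dies every minute or so exhausts the limit well inside a 30m window, while a 1m window only ever sees one start.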
One other note, which I think is generally unrelated: I did upgrade one of my
cluster nodes from Ubuntu 25.04 “Plucky Puffin” to Ubuntu 25.10 “Questing
Quokka”. After the upgrade, cephadm-managed containers didn’t want to start. I
tracked that down to the ceph user’s UID in /etc/passwd being set to 64045,
while the container seems to want UID 167. Most things under
/var/lib/ceph/[CLUSTER UID]/ … appear to be owned by user/group 167:167; I
assume this is a default inside the container. The workaround here was to
manually change the UID/GID for ceph in /etc/passwd and /etc/group. I’m guessing
this is some collision between cephadm-managed deployments and how Ubuntu /
apt installs cephadm.
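A minimal sketch of how the mismatch described above could be checked on a host before touching /etc/passwd. The function names are hypothetical, and 167 is simply the owner observed on the data directory, assumed here to be what the container expects.

```python
import os
import pwd

CONTAINER_UID = 167  # owner seen on the cluster data directory (assumed container default)

def host_uid(user: str = "ceph"):
    """Return the host's UID for `user`, or None if no such account exists."""
    try:
        return pwd.getpwnam(user).pw_uid
    except KeyError:
        return None

def mismatched_paths(root: str, expected_uid: int = CONTAINER_UID, limit: int = 20):
    """Walk `root` and return up to `limit` paths not owned by `expected_uid`."""
    bad = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # lstat so that symlinks are checked themselves, not their targets
            if os.lstat(path).st_uid != expected_uid:
                bad.append(path)
                if len(bad) >= limit:
                    return bad
    return bad
```

Comparing `host_uid()` against `CONTAINER_UID`, and running `mismatched_paths()` on the cluster directory under /var/lib/ceph, shows whether the host account and the on-disk ownership disagree.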
The aforementioned mgr crashes look like:
{
"assert_condition": "cursor != root",
"assert_file": "/ceph/rpmbuild/BUILD/ceph-20.2.0/src/mgr/PyFormatter.h",
"assert_func": "virtual void PyFormatter::close_section()",
"assert_line": 84,
"assert_msg": "/ceph/rpmbuild/BUILD/ceph-20.2.0/src/mgr/PyFormatter.h: In
function 'virtual void PyFormatter::close_section()' thread ffff34e55700 time
2026-02-08T13:10:07.526894+0000\n/ceph/rpmbuild/BUILD/ceph-20.2.0/src/mgr/PyFormatter.h:
84: FAILED ceph_assert(cursor != root)\n",
"assert_thread_name": "telemetry",
"backtrace": [
"__kernel_rt_sigreturn()",
"/lib64/libc.so.6(+0x82a78) [0xffff83603a78]",
"raise()",
"abort()",
"(ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x190) [0xffff84039874]",
"/usr/bin/ceph-mgr(+0xcf540) [0xaaaacdf2f540]",
"(ActivePyModules::get_perf_schema_python(std::__cxx11::basic_string<char,
std::char_traits<char>, std::allocator<char> > const&,
std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >
const&)+0xf6c) [0xaaaacdf46ec0]",
"/usr/bin/ceph-mgr(+0x105528) [0xaaaacdf65528]",
"/lib64/libpython3.9.so.1.0(+0xf7bfc) [0xffff84ae9bfc]",
"_PyEval_EvalFrameDefault()",
"/lib64/libpython3.9.so.1.0(+0xda3f0) [0xffff84acc3f0]",
"_PyEval_EvalFrameDefault()",
"/lib64/libpython3.9.so.1.0(+0xc47e8) [0xffff84ab67e8]",
"_PyFunction_Vectorcall()",
"_PyEval_EvalFrameDefault()",
"/lib64/libpython3.9.so.1.0(+0xc47e8) [0xffff84ab67e8]",
"_PyFunction_Vectorcall()",
"_PyEval_EvalFrameDefault()",
"/lib64/libpython3.9.so.1.0(+0xc47e8) [0xffff84ab67e8]",
"_PyFunction_Vectorcall()",
"_PyEval_EvalFrameDefault()",
"/lib64/libpython3.9.so.1.0(+0xc47e8) [0xffff84ab67e8]",
"_PyFunction_Vectorcall()",
"_PyEval_EvalFrameDefault()",
"/lib64/libpython3.9.so.1.0(+0xda3f0) [0xffff84acc3f0]",
"/lib64/libpython3.9.so.1.0(+0xea93c) [0xffff84adc93c]",
"/lib64/libpython3.9.so.1.0(+0xcf304) [0xffff84ac1304]",
"/lib64/libpython3.9.so.1.0(+0x197d78) [0xffff84b89d78]",
"_PyObject_CallMethod_SizeT()",
"(PyModuleRunner::serve()+0x6c) [0xaaaacdfdf6cc]",
"(PyModuleRunner::PyModuleRunnerThread::entry()+0x148) [0xaaaacdfdff08]"
],
"ceph_version": "20.2.0",
"crash_id":
"2026-02-08T13:10:07.528739Z_f18e6b74-438b-47db-9438-5a3861fdef2d",
"entity_name": "mgr.hc-945901a5cad1b6e3.mtijbv",
"os_id": "centos",
"os_name": "CentOS Stream",
"os_version": "9",
"os_version_id": "9",
"process_name": "ceph-mgr",
"stack_sig":
"319d76a0d71a4644f9d65f592f6e621cca918d9205fc759ab7acf6944bc77fdd",
"timestamp": "2026-02-08T13:10:07.528739Z",
"utsname_hostname": "hc-945901a5cad1b6e3",
"utsname_machine": "aarch64",
"utsname_release": "6.14.0-1010-raspi",
"utsname_sysname": "Linux",
"utsname_version": "#10-Ubuntu SMP PREEMPT_DYNAMIC Tue Jul 15 19:09:05 UTC
2025"
}
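For anyone skimming, a small sketch that pulls the key fields out of a crash report like the one above. The function name and the choice of fields are illustrative only; `strict=False` is used so that the parser also tolerates line breaks inside string values, which hard-wrapped mail can introduce.

```python
import json

def summarize_crash(text: str) -> dict:
    """Extract the assert, the crashing thread, and the named C++ frames."""
    # strict=False allows raw control characters (such as newlines) in strings.
    report = json.loads(text, strict=False)
    # Keep only frames that carry a C++ symbol name, and collapse any
    # whitespace runs left over from line wrapping.
    frames = [" ".join(f.split()) for f in report.get("backtrace", []) if "::" in f]
    return {
        "version": report.get("ceph_version"),
        "thread": report.get("assert_thread_name"),
        "assert": report.get("assert_condition"),
        "func": report.get("assert_func"),
        "ceph_frames": frames,
    }
```

Applied to the report above, the summary shows the assert `cursor != root` being hit on the thread named `telemetry`, with `ActivePyModules::get_perf_schema_python` among the named frames.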
_______________________________________________
ceph-users mailing list -- [email protected]