Dear Team,
I keep getting the error below every few days:
Error:
Module 'devicehealth' has failed: unknown operation
The actual error from the MGR log:
2026-01-30T05:50:59.338+0530 7f314da26640 0 [devicehealth ERROR root] Caught fatal database error: unknown operation
2026-01-30T05:50:59.338+0530 7f314da26640 -1 log_channel(cluster) log [ERR] : Unhandled exception from module 'devicehealth' while running on mgr.controller3: unknown operation
2026-01-30T05:50:59.338+0530 7f314da26640 -1 devicehealth.serve:
2026-01-30T05:50:59.338+0530 7f314da26640 -1 Traceback (most recent call last):
  File "/usr/share/ceph/mgr/devicehealth/module.py", line 408, in serve
    self._do_serve()
  File "/usr/share/ceph/mgr/mgr_module.py", line 513, in check
    return func(self, *args, **kwargs)
  File "/usr/share/ceph/mgr/devicehealth/module.py", line 399, in _do_serve
    self.scrape_all()
  File "/usr/share/ceph/mgr/devicehealth/module.py", line 449, in scrape_all
    self.put_device_metrics(device, data)
  File "/usr/share/ceph/mgr/devicehealth/module.py", line 525, in put_device_metrics
    self._create_device(devid)
  File "/usr/share/ceph/mgr/devicehealth/module.py", line 511, in _create_device
    cursor = self.db.execute(SQL, (devid,))
sqlite3.InternalError: unknown operation
2026-01-30T05:50:59.366+0530 7f3155335640 0 log_channel(cluster) log [DBG] : pgmap v116150: 6977 pgs: 92 peering, 40 stale+active+clean, 191 active+clean+laggy, 30 active+undersized, 1026 active+undersized+degraded, 5598 active+clean; 48 TiB data, 153 TiB used, 1.2 PiB / 1.4 PiB avail; 44 MiB/s wr, 1.00k op/s; 8043806/154077270 objects degraded (5.221%); 4.4 GiB/s, 8 keys/s, 530 objects/s recovering
I also get a health warning about an mgr daemon crash, but the MGR
service itself is not actually failed or restarted automatically.
I suspect it is related to the sqlite3 database that the
devicehealth module uses internally.
This does not seem to cause any real issue, and the error disappears
after restarting the ceph-mgr service or failing the MGR over to
another controller.
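For reference, this is roughly how I clear it today (the hostname is
from my setup, and the systemd unit name will differ depending on how
your cluster was deployed, e.g. cephadm uses ceph-<fsid>@mgr.<name>):

```shell
# Fail the active mgr so a standby takes over; the devicehealth
# error clears until it eventually recurs.
ceph mgr fail controller3

# Alternatively, restart the mgr daemon on the affected node
# (adjust the unit name to match your deployment).
systemctl restart ceph-mgr@controller3
```

Neither of these is permanent, though; the error comes back after a
few days.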
Does anyone know how we can resolve this issue permanently?
Similar issue:
https://github.com/rook/rook/issues/12349
Regards,
Danish
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]