Dear Team,
I keep getting the error below every few days:
Error:
Module 'devicehealth' has failed: unknown operation
The actual error from the MGR log:
2026-01-30T05:50:59.338+0530 7f314da26640 0 [devicehealth ERROR root] Caught fatal database error: unknown operation
2026-01-30T05:50:59.338+0530 7f314da26640 -1 log_channel(cluster) log [ERR] : Unhandled exception from module 'devicehealth' while running on mgr.controller3: unknown operation
2026-01-30T05:50:59.338+0530 7f314da26640 -1 devicehealth.serve:
2026-01-30T05:50:59.338+0530 7f314da26640 -1 Traceback (most recent call last):
  File "/usr/share/ceph/mgr/devicehealth/module.py", line 408, in serve
    self._do_serve()
  File "/usr/share/ceph/mgr/mgr_module.py", line 513, in check
    return func(self, *args, **kwargs)
  File "/usr/share/ceph/mgr/devicehealth/module.py", line 399, in _do_serve
    self.scrape_all()
  File "/usr/share/ceph/mgr/devicehealth/module.py", line 449, in scrape_all
    self.put_device_metrics(device, data)
  File "/usr/share/ceph/mgr/devicehealth/module.py", line 525, in put_device_metrics
    self._create_device(devid)
  File "/usr/share/ceph/mgr/devicehealth/module.py", line 511, in _create_device
    cursor = self.db.execute(SQL, (devid,))
sqlite3.InternalError: unknown operation
2026-01-30T05:50:59.366+0530 7f3155335640 0 log_channel(cluster) log [DBG] : pgmap v116150: 6977 pgs: 92 peering, 40 stale+active+clean, 191 active+clean+laggy, 30 active+undersized, 1026 active+undersized+degraded, 5598 active+clean; 48 TiB data, 153 TiB used, 1.2 PiB / 1.4 PiB avail; 44 MiB/s wr, 1.00k op/s; 8043806/154077270 objects degraded (5.221%); 4.4 GiB/s, 8 keys/s, 530 objects/s recovering
I also get a health warning about an mgr daemon crash, but the MGR
service itself is not actually failed or restarted automatically.
I suspect it is related to the sqlite3 database that the
devicehealth module uses internally.
This does not seem to cause any real issue, and the error disappears
after restarting the ceph-mgr service or failing the MGR over to
another controller.
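For reference, this is roughly how I clear it today (the hostname is
from my setup, and the systemd unit name will differ depending on how
your cluster was deployed, e.g. cephadm uses ceph-<fsid>@mgr.<name>):

```shell
# Fail the active mgr so a standby takes over; the devicehealth
# error clears until it eventually recurs.
ceph mgr fail controller3

# Alternatively, restart the mgr daemon on the affected node
# (adjust the unit name to match your deployment).
systemctl restart ceph-mgr@controller3
```

Neither of these is permanent, though; the error comes back after a
few days.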
Does anyone know how we can resolve this issue permanently?
Similar issue:
https://github.com/rook/rook/issues/12349
Regards,
Danish
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]