Hi Danish,

You seem to be hitting this bug [1] that was fixed in Quincy v17.2.8 by [2].

If you're using packages (.rpm, .deb), you can upgrade to v17.2.9, which has
the fix. Do not use v17.2.8 itself due to a critical BlueStore bug.
If you're using cephadm and containers, you'll have to upgrade to Reef, as
no container images were published for v17.2.9.
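
For the cephadm route, the upgrade would look roughly like this. This is only
a sketch: v18.2.4 is an example Reef tag, so substitute whichever Reef release
you've validated for your cluster.

    # start a rolling upgrade to a Reef release (18.2.4 is just an example tag)
    ceph orch upgrade start --ceph-version 18.2.4
    # watch progress
    ceph orch upgrade status
    # pause the upgrade if something looks wrong
    ceph orch upgrade pause

Until you can upgrade, failing over the active mgr clears the error, as you've
already observed, e.g.:

    ceph mgr fail controller3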

Best regards,
Frédéric.

[1] https://tracker.ceph.com/issues/55606
[2] https://github.com/ceph/ceph/pull/52461

--
Frédéric Nass
Ceph Ambassador France | Senior Ceph Engineer @ CLYSO

Squishing Squids - A Ceph Compression Guide
<https://www.eventbrite.com/e/squishing-squids-a-ceph-compression-guide-tickets-1981347673227>,
February 25th, 9am PST.
https://clyso.com | [email protected]


On Sat, Jan 31, 2026 at 09:09, Danish Khan via ceph-users <
[email protected]> wrote:

> Dear Team,
>
> I keep getting the error below every few days:
>
> Error :
> Module 'devicehealth' has failed: unknown operation
>
> Actual error from MGR log:
>
> 2026-01-30T05:50:59.338+0530 7f314da26640  0 [devicehealth ERROR root]
> Caught fatal database error: unknown operation
> 2026-01-30T05:50:59.338+0530 7f314da26640 -1 log_channel(cluster) log [ERR]
> : Unhandled exception from module 'devicehealth' while running on
> mgr.controller3: unknown operation
> 2026-01-30T05:50:59.338+0530 7f314da26640 -1 devicehealth.serve:
> 2026-01-30T05:50:59.338+0530 7f314da26640 -1 Traceback (most recent call
> last):
>   File "/usr/share/ceph/mgr/devicehealth/module.py", line 408, in serve
>     self._do_serve()
>   File "/usr/share/ceph/mgr/mgr_module.py", line 513, in check
>     return func(self, *args, **kwargs)
>   File "/usr/share/ceph/mgr/devicehealth/module.py", line 399, in _do_serve
>     self.scrape_all()
>   File "/usr/share/ceph/mgr/devicehealth/module.py", line 449, in
> scrape_all
>     self.put_device_metrics(device, data)
>   File "/usr/share/ceph/mgr/devicehealth/module.py", line 525, in
> put_device_metrics
>     self._create_device(devid)
>   File "/usr/share/ceph/mgr/devicehealth/module.py", line 511, in
> _create_device
>     cursor = self.db.execute(SQL, (devid,))
> sqlite3.InternalError: unknown operation
>
> 2026-01-30T05:50:59.366+0530 7f3155335640  0 log_channel(cluster) log [DBG]
> : pgmap v116150: 6977 pgs: 92 peering, 40 stale+active+clean, 191
> active+clean+laggy, 30 active+undersized, 1026 active+undersized+degraded,
> 5598 active+clean; 48 TiB data, 153 TiB used, 1.2 PiB / 1.4 PiB avail; 44
> MiB/s wr, 1.00k op/s; 8043806/154077270 objects degraded (5.221%); 4.4
> GiB/s, 8 keys/s, 530 objects/s recovering
>
> I also get a warning about an mgr daemon crash, but the MGR services
> don't actually fail or restart automatically.
>
> I guess it is something related to sqlite3, which is used internally for
> the health checks.
>
> This is not causing any real issue, and the error disappears after
> restarting the ceph-mgr service or failing over the MGR to another
> controller.
>
> Does anyone know how we can resolve this issue permanently?
>
> Similar issue:
> https://github.com/rook/rook/issues/12349
>
> Regards,
> Danish
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
