Hi Danish,

You seem to be hitting this bug [1], which was fixed in Quincy v17.2.8 by [2].
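As a stopgap until you can upgrade, and for the cephadm upgrade path, something like the following should work (a sketch only; the Reef version number below is an example, substitute the release you intend to run):

```shell
# Temporary workaround: fail the active mgr over to a standby.
# This clears the stuck devicehealth error until it recurs.
ceph mgr fail

# With cephadm, start the orchestrated upgrade to a Reef release
# (example version -- pick the Reef point release you want):
ceph orch upgrade start --ceph-version 18.2.4

# Watch upgrade progress:
ceph orch upgrade status
```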
If you're using packages (.rpm, .deb), you could upgrade to v17.2.9, which has the fix. Do not use v17.2.8, due to a critical BlueStore bug. If you're using cephadm and containers, you'll have to upgrade to Reef, as no container images were published for v17.2.9.

Best regards,
Frédéric

[1] https://tracker.ceph.com/issues/55606
[2] https://github.com/ceph/ceph/pull/52461

--
Frédéric Nass
Ceph Ambassador France | Senior Ceph Engineer @ CLYSO
Squishing Squids - A Ceph Compression Guide <https://www.eventbrite.com/e/squishing-squids-a-ceph-compression-guide-tickets-1981347673227>, February 25th, 9am PST.
https://clyso.com | [email protected]

On Sat, Jan 31, 2026 at 09:09, Danish Khan via ceph-users <[email protected]> wrote:

> Dear Team,
>
> I keep getting the below error every few days:
>
> Error:
> Module 'devicehealth' has failed: unknown operation
>
> Actual error from the MGR log:
>
> 2026-01-30T05:50:59.338+0530 7f314da26640  0 [devicehealth ERROR root] Caught fatal database error: unknown operation
> 2026-01-30T05:50:59.338+0530 7f314da26640 -1 log_channel(cluster) log [ERR] : Unhandled exception from module 'devicehealth' while running on mgr.controller3: unknown operation
> 2026-01-30T05:50:59.338+0530 7f314da26640 -1 devicehealth.serve:
> 2026-01-30T05:50:59.338+0530 7f314da26640 -1 Traceback (most recent call last):
>   File "/usr/share/ceph/mgr/devicehealth/module.py", line 408, in serve
>     self._do_serve()
>   File "/usr/share/ceph/mgr/mgr_module.py", line 513, in check
>     return func(self, *args, **kwargs)
>   File "/usr/share/ceph/mgr/devicehealth/module.py", line 399, in _do_serve
>     self.scrape_all()
>   File "/usr/share/ceph/mgr/devicehealth/module.py", line 449, in scrape_all
>     self.put_device_metrics(device, data)
>   File "/usr/share/ceph/mgr/devicehealth/module.py", line 525, in put_device_metrics
>     self._create_device(devid)
>   File "/usr/share/ceph/mgr/devicehealth/module.py", line 511, in _create_device
>     cursor = self.db.execute(SQL, (devid,))
> sqlite3.InternalError: unknown operation
>
> 2026-01-30T05:50:59.366+0530 7f3155335640  0 log_channel(cluster) log [DBG] : pgmap v116150: 6977 pgs: 92 peering, 40 stale+active+clean, 191 active+clean+laggy, 30 active+undersized, 1026 active+undersized+degraded, 5598 active+clean; 48 TiB data, 153 TiB used, 1.2 PiB / 1.4 PiB avail; 44 MiB/s wr, 1.00k op/s; 8043806/154077270 objects degraded (5.221%); 4.4 GiB/s, 8 keys/s, 530 objects/s recovering
>
> I also get one warning about a mgr daemon crash, but no MGR service actually fails or restarts automatically.
>
> I guess it is something related to sqlite3, which checks Ceph health internally.
>
> This is not causing any real issue, though, and the error disappears after restarting the ceph-mgr service or failing the MGR over to another controller.
>
> Does anyone know how we can resolve this issue permanently?
>
> Similar issue:
> https://github.com/rook/rook/issues/12349
>
> Regards,
> Danish
>
> _______________________________________________
> ceph-users mailing list -- [email protected]
> To unsubscribe send an email to [email protected]
