Hello, we are running a Ceph cluster at version: ceph version 19.2.2 (0eceb0defba60152a8182f7bd87d164b639885b8) squid (stable).

For a few weeks now the orchestrator has been misbehaving, and so far we have not been able to identify a root cause, so I am fishing in the community to see if there are any hints.
Problem: an OSD removal (for a disk replacement) gets stuck in the 'purge' step:

    root@aadm01:~# ceph orch osd rm 406 --replace
    root@aadm01:~# ceph orch osd rm status
    OSD  HOST   STATE                    PGS  REPLACE  FORCE  ZAP   DRAIN STARTED AT
    406  acn07  done, waiting for purge  0    True     False  True   2025-06-25 09:18:07.650734+00:00
It has now been in this state for more than 24h. At the same time the orchestrator is not restarting OSD daemons, i.e. a 'ceph orch daemon restart osd.xxx' claims it is queuing up the restart, but it never happens. Other services continue to be controlled correctly via 'ceph orch ...'.
If anyone has an idea where to poke around, or can match this to a known problem, I would appreciate any pointers.
Regards,
Holger

--
Dr. Holger Naundorf
Christian-Albrechts-Universität zu Kiel
Rechenzentrum / HPC / Server und Storage
Tel: +49 431 880-1990
Fax: +49 431 880-1523
naund...@rz.uni-kiel.de
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io