Hi,

the docs [0] contain the OSD removal process:

ceph orch osd rm <osd_id(s)> [--replace] [--force] [--zap]

So in your case I'd just remove and zap the faulty OSDs:

ceph orch osd rm 0 3 --force --zap
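You can monitor the removal progress with:

ceph orch osd rm status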

If you have a managed OSD spec matching your setup, the orchestrator will simply redeploy OSDs on the wiped disks. Of course, in a production environment you'd need to be sure it's safe to wipe an OSD, so maybe try without --force first to see whether it would result in inactive PGs. Since those two OSDs are already dead, there's no real danger here, but I wanted to mention it.
You can also start with one OSD and see if the process works for you.
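Something like this (an untested sketch, using the OSD IDs from your output) could be used to check first and then try a single OSD:

ceph orch ls osd --export      # show the managed OSD spec(s) the orchestrator applies
ceph osd safe-to-destroy 0     # check that destroying osd.0 won't leave PGs without enough copies
ceph orch osd rm 0 --zap       # remove and zap only osd.0
ceph orch osd rm status        # follow the removal progress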

Regards,
Eugen

[0] https://docs.ceph.com/en/latest/cephadm/services/osd/#remove-an-osd

Quoting lejeczek <pelj...@yahoo.co.uk>:

Hi guys.

I've been browsing the net in search of a relatively clear "howto" but failed to find one. There are rather many, sometimes differing notes/thoughts on how to deal with such/similar situations. I have a 3-node containerized cluster which lost an OSD - it crashed; there is nothing wrong with the node, nothing wrong with the disk, but never mind that.
Is there a howto which covers containerized environment?
One example I followed is: https://docs.redhat.com/en/documentation/red_hat_ceph_storage/1.2.3/html/red_hat_ceph_administration_guide/setting_unsetting_overrides
but it is not clear - to me - what to do with the "broken" containers.
This is where I've got to:
-> $ ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME          STATUS  REWEIGHT  PRI-AFF
-1         0.68359  root default
-3               0      host podster1
-7         0.34180      host podster2
 2    hdd  0.04880          osd.2          up   1.00000  1.00000
 4    hdd  0.29300          osd.4          up   1.00000  1.00000
-5         0.34180      host podster3
 1    hdd  0.04880          osd.1          up   1.00000  1.00000
 5    hdd  0.29300          osd.5          up   1.00000  1.00000

yet:
-> $ ceph orch ps --daemon-type=osd
NAME   HOST                PORTS  STATUS         REFRESHED  AGE  MEM USE  MEM LIM  VERSION    IMAGE ID      CONTAINER ID
osd.0  podster1.mine.priv         error          7m ago     3w   -        4096M    <unknown>  <unknown>     <unknown>
osd.1  podster3.mine.priv         running (25h)  7m ago     3w   942M     4096M    19.2.3     aade1b12b8e6  d71051ea79dc
osd.2  podster2.mine.priv         running (6d)   7m ago     3w   1192M    4096M    19.2.3     aade1b12b8e6  e8d05142a73a
osd.3  podster1.mine.priv         error          7m ago     2w   -        4096M    <unknown>  <unknown>     <unknown>
osd.4  podster2.mine.priv         running (6d)   7m ago     2w   3293M    4096M    19.2.3     aade1b12b8e6  6116277f69d1
osd.5  podster3.mine.priv         running (25h)  7m ago     2w   2963M    4096M    19.2.3     aade1b12b8e6  d671bf73cc01

What would be the next bits needed to complete such a removal & reuse/re-creation of the OSD(s)? p.s. This is a 'lab' setup so I'm not worried, but it'd be great to complete this process in a healthy manner.
many thanks, L.
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

