We had a faulty disk which was causing many errors, and the replacement took
a while, so we had to try to stop Ceph from using the OSD during this time.
However, I think we must have done that wrong: after the disk replacement,
our ceph orch seems to have picked up /dev/sdp and automatically added a new
OSD (588), without a separate DB device (since the DB LV was maybe still
taken by the old OSD 31? I'm not sure).
This led to issues where osd.31 of course wouldn't start, and some actions
were attempted to clear this up, which might have just caused more harm.
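(In hindsight, I suspect the proper way to take the failing OSD out of
service while waiting for the new disk would have been something along these
lines, so this is probably where we went wrong, but please correct me if not:

  # mark osd.31 as "destroyed" so its ID is kept for the replacement disk,
  # instead of letting the orchestrator create a brand new OSD
  ceph orch osd rm 31 --replace

and then swap the disk once the draining was done.)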
Long story short, we are currently in an odd position where ceph-volume lvm
list still shows an osd.31 with only a [db] section:
====== osd.31 ======

  [db]          /dev/ceph-1b309b1e-a4a6-4861-b16c-7c06ecde1a3d/osd-db-fb09a714-f955-4418-99f2-6bccd8c6220e

      block device              /dev/ceph-48f7dbd8-4a7c-4f7e-8962-104e756ae864/osd-block-33538b36-52b3-421d-bf66-6c729a057707
      block uuid                bykFYi-z8T6-OWXp-i1OB-H7CE-uLDm-Td6QTI
      cephx lockbox secret
      cluster fsid              5406fed0-d52b-11ec-beff-7ed30a54847b
      cluster name              ceph
      crush device class        None
      db device                 /dev/ceph-1b309b1e-a4a6-4861-b16c-7c06ecde1a3d/osd-db-fb09a714-f955-4418-99f2-6bccd8c6220e
      db uuid                   Vy3aOA-qseQ-RIDT-741e-z7o0-y376-kKTXRE
      encrypted                 0
      osd fsid                  33538b36-52b3-421d-bf66-6c729a057707
      osd id                    31
      osdspec affinity          osd_spec
      type                      db
      vdo                       0
      devices                   /dev/nvme0n1
and a separate extra osd.588 (which is running) which has taken only the
[block] device:
====== osd.588 ======

  [block]       /dev/ceph-f63ef837-3b18-47a4-be55-d5c2c0db8927/osd-block-58b33b8f-9623-46b3-a86a-3061602a76b5

      block device              /dev/ceph-f63ef837-3b18-47a4-be55-d5c2c0db8927/osd-block-58b33b8f-9623-46b3-a86a-3061602a76b5
      block uuid                KYHzBq-zgJJ-Nw93-j7Jx-Oz5i-BMuU-ndtTCH
      cephx lockbox secret
      cluster fsid              5406fed0-d52b-11ec-beff-7ed30a54847b
      cluster name              ceph
      crush device class
      encrypted                 0
      osd fsid                  58b33b8f-9623-46b3-a86a-3061602a76b5
      osd id                    588
      osdspec affinity          all-available-devices
      type                      block
      vdo                       0
      devices                   /dev/sdp
I figured the best course of action was to clear out both of these faulty
OSDs via the orchestrator ("ceph orch osd rm XX"), but osd.31 isn't
recognized:
[ceph: root@mimer-osd01 /]# ceph orch osd rm 31
Unable to find OSDs: ['31']
Deleting 588, on the other hand, is recognized. Should I attempt to clear
out osd.31 from ceph-volume manually?
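If manual cleanup is the way to go, I assume it would look roughly like the
following, but I haven't run any of it yet and would appreciate a sanity
check first (the LV path is the leftover DB LV from the listing above):

  # remove the accidental new OSD through the orchestrator (which does
  # recognize 588), zapping its device so it can be reused
  ceph orch osd rm 588 --zap

  # purge the old osd.31 from the cluster maps, if it is still in there,
  # since the orchestrator no longer knows about it
  ceph osd purge 31 --yes-i-really-mean-it

  # then clean up the leftover DB LV on the host
  ceph-volume lvm zap --destroy \
    /dev/ceph-1b309b1e-a4a6-4861-b16c-7c06ecde1a3d/osd-db-fb09a714-f955-4418-99f2-6bccd8c6220e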
I'd really like to get back to a situation where I have osd.31 with an osd
fsid that matches the device names, using /dev/sdp and /dev/nvme0n1, but I'm
really afraid of just breaking things even more.
From what I can see from files lying around, the OSD spec we have is
simply:
placement:
  host_pattern: "mimer-osd01"
service_id: osd_spec
service_type: osd
spec:
  data_devices:
    rotational: 1
  db_devices:
    rotational: 0
in case this matters.
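If the cleanup goes through, my rough plan would then be to just let this
spec recreate the OSD, re-applying it after a dry run first (osd_spec.yaml
here being simply a file containing the spec above):

  ceph orch apply -i osd_spec.yaml --dry-run
  ceph orch apply -i osd_spec.yaml

I appreciate any help or guidance.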
Best regards, Mikael