Can you share your osd tree and the current ceph status?
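That is, the output of:

  ceph osd tree
  ceph -s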


Quoting Kári Bertilsson <[email protected]>:

Hello

I had an incident where 3 OSDs crashed at once completely and won't power
up. During recovery, 3 OSDs in another host somehow became corrupted. I am
running erasure coding with an 8+2 setup, using a CRUSH rule that takes 2
OSDs per host, and after losing the other 2 OSDs I have a few PGs down.
Unfortunately these PGs seem to overlap almost all data in the pool, so I
believe the entire pool is mostly lost with only these 2% of PGs down.
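
For context, a CRUSH rule doing that kind of placement (2 OSDs from each of
5 hosts for k=8, m=2) looks roughly like the sketch below; the rule name and
id here are placeholders, not copied from my actual map:

  rule ec_8_2_host_2osd {
          id 2
          type erasure
          min_size 3
          max_size 10
          step set_chooseleaf_tries 5
          step set_choose_tries 100
          step take default
          step choose indep 5 type host
          step choose indep 2 type osd
          step emit
  }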

I am running ceph 14.2.9.

OSD 92 log https://pastebin.com/5aq8SyCW
OSD 97 log https://pastebin.com/uJELZxwr

ceph-bluestore-tool repair without --deep reported "success", but the OSDs
still fail with the logs above.
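
(The repair was run roughly like this, once per affected OSD, e.g. for OSD 97:

  ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-97)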

Here is the log from trying ceph-bluestore-tool repair --deep, which is still
running; I'm not sure it will actually fix anything, and the log looks pretty bad.
https://pastebin.com/gkqTZpY3

Trying "ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-97 --op
list" gave me an input/output error. But everything in SMART looks OK, and I
see no indication of a hardware read error in any logs. The same goes for both OSDs.

The OSDs with corruption have absolutely no bad sectors and likely have
only minor corruption, but at important locations.

Any ideas on how to recover from this kind of scenario? Any tips would be
highly appreciated.

Best regards,
Kári Bertilsson
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]

