Hey,

I have a machine with 5 drives passed through to a VM and 5 drives on the
same host machine. I've made this mistake once before: running
ceph-volume activate --all for the host machine's drives takes over the
5 drives belonging to the VM as well and corrupts them.
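
In hindsight I should have activated only the host's own OSDs one at a time
instead of everything ceph-volume could see; roughly (the id and fsid come
from ceph-volume lvm list, shown here only as placeholders):

# see which OSDs ceph-volume knows about on this host
ceph-volume lvm list

# activate one specific OSD instead of --all
ceph-volume lvm activate <osd id> <osd fsid>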

I've actually lost data this time. The pool is erasure coded 6+3, but after
losing 5 drives I lost a small number of PGs (6). Repair gives this message:

$ ceph-bluestore-tool repair --deep true --path /var/lib/ceph/osd/ceph-0

/build/ceph-13.2.6/src/os/bluestore/BlueFS.cc: In function 'int
BlueFS::_replay(bool, bool)' thread 7f21c3c6d980 time 2019-07-25
23:19:44.820537
/build/ceph-13.2.6/src/os/bluestore/BlueFS.cc: 848: FAILED assert(r !=
q->second->file_map.end())
 ceph version 13.2.6 (7b695f835b03642f85998b2ae7b6dd093d9fbce4) mimic
(stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x14e) [0x7f21c56c4b5e]
 2: (()+0x2c4cb7) [0x7f21c56c4cb7]
 3: (BlueFS::_replay(bool, bool)+0x4082) [0x56432ef954a2]
 4: (BlueFS::mount()+0xff) [0x56432ef958ef]
 5: (BlueStore::_open_db(bool, bool)+0x81c) [0x56432eff1a1c]
 6: (BlueStore::_fsck(bool, bool)+0x337) [0x56432f00e0a7]
 7: (main()+0xf0a) [0x56432eea7dca]
 8: (__libc_start_main()+0xeb) [0x7f21c4b7109b]
 9: (_start()+0x2a) [0x56432ef700fa]
*** Caught signal (Aborted) **
 in thread 7f21c3c6d980 thread_name:ceph-bluestore-
2019-07-25 23:19:44.817 7f21c3c6d980 -1
/build/ceph-13.2.6/src/os/bluestore/BlueFS.cc: In function 'int
BlueFS::_replay(bool, bool)' thread 7f21c3c6d980 time 2019-07-25
23:19:44.820537
/build/ceph-13.2.6/src/os/bluestore/BlueFS.cc: 848: FAILED assert(r !=
q->second->file_map.end())

I have two OSDs that don't start but at least make it further into the
repair:

$ ceph-bluestore-tool repair --deep true --path /var/lib/ceph/osd/ceph-8
2019-07-25 21:59:17.314 7f1a03dfb980 -1 bluestore(/var/lib/ceph/osd/ceph-8)
_verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0xf139a661,
expected 0x9344f85e, device location [0x10000~1000], logical extent
0x0~1000, object #-1:7b3f43c4:::osd_superblock:0#
2019-07-25 21:59:17.314 7f1a03dfb980 -1 bluestore(/var/lib/ceph/osd/ceph-8)
_verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0xf139a661,
expected 0x9344f85e, device location [0x10000~1000], logical extent
0x0~1000, object #-1:7b3f43c4:::osd_superblock:0#
2019-07-25 21:59:17.314 7f1a03dfb980 -1 bluestore(/var/lib/ceph/osd/ceph-8)
_verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0xf139a661,
expected 0x9344f85e, device location [0x10000~1000], logical extent
0x0~1000, object #-1:7b3f43c4:::osd_superblock:0#
2019-07-25 21:59:17.314 7f1a03dfb980 -1 bluestore(/var/lib/ceph/osd/ceph-8)
_verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0xf139a661,
expected 0x9344f85e, device location [0x10000~1000], logical extent
0x0~1000, object #-1:7b3f43c4:::osd_superblock:0#
2019-07-25 21:59:17.314 7f1a03dfb980 -1 bluestore(/var/lib/ceph/osd/ceph-8)
fsck error: #-1:7b3f43c4:::osd_superblock:0# error during read:  0~21a (5)
Input/output error
... still running ....

I've read through the archives, and unlike others who have come across this,
I'm not able to recover the content without these lost OSDs.
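
What I'm considering next is trying to export the affected PGs from the
broken OSDs with ceph-objectstore-tool and importing them into a healthy
OSD. A rough sketch of the idea (the PG id and the target OSD are
placeholders, both OSDs stopped; I realise the export may well hit the same
BlueFS assert as the repair does):

# see which PGs are still present on the broken OSD
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-8 --op list-pgs

# export one of the missing PGs to a file...
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-8 \
    --pgid 2.1a --op export --file /root/pg-2.1a.export

# ...and import it into a healthy, stopped OSD, then start that OSD
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 \
    --op import --file /root/pg-2.1a.export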

These PGs are backing a CephFS instance, so ideally:

1. I'd be able to recover the 6 missing PGs from 3 of the 5 OSDs that are in
a broken state...

or, less desirable,

2. Figure out how to map PGs to the CephFS files I lost, so that I can work
out what's lost and what remains (a rough sketch of what I have in mind is
below).
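
For (2), I believe cephfs-data-scan pg_files is meant for roughly this, but
in case it doesn't apply here, the manual fallback I'm considering is to walk
the mounted filesystem and map each file's first data object to its PG. A
minimal sketch, assuming the default file layout, a data pool named
cephfs_data, the fs mounted at /mnt/cephfs, and jq available (all of those
are my assumptions, not details from this cluster):

# For each file, build the name of its first data object
# (<inode in hex>.00000000) and ask ceph which PG it maps to.
# Only covers the first object; bigger files also have
# <inode>.00000001, 00000002, ... that would need checking.
POOL=cephfs_data
find /mnt/cephfs -type f | while IFS= read -r f; do
    ino=$(printf '%x' "$(stat -c %i "$f")")
    pg=$(ceph osd map "$POOL" "${ino}.00000000" -f json | jq -r .pgid)
    echo "$pg  $f"
done

Any file whose PG lands in the lost set is one I'd have to restore or give
up on; everything else should still be intact.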