Hello,
On Wed, 1 Nov 2017 09:30:06 +0100 Michael wrote:
> Hello everyone,
>
> I've conducted some crash tests (unplugging drives, the machine,
Your exact system configuration (HW, drives, controller, settings, etc.)
would be interesting, as I can think of plenty of scenarios that corrupt
things which normally shouldn't be affected by such actions.
> terminating and restarting ceph systemd services) with Ceph 12.2.0 on
Now that bit is quite disconcerting, though you're one release behind the
curve, and from what I read 12.2.2 has plenty more bug fixes coming.
Christian
> Ubuntu and quite easily managed to corrupt what appears to be rocksdb's
> log replay on a bluestore OSD:
>
> # ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-2/
> [...]
> 4 rocksdb:
> [/build/ceph-pKGC1D/ceph-12.2.0/src/rocksdb/db/version_set.cc:2859]
> Recovered from manifest file:db/MANIFEST-000975
> succeeded,manifest_file_number is 975, next_file_number is 1008,
> last_sequence is 51965907, log_number is 0,prev_log_number is
> 0,max_column_family is 0
> 4 rocksdb:
> [/build/ceph-pKGC1D/ceph-12.2.0/src/rocksdb/db/version_set.cc:2867]
> Column family [default] (ID 0), log number is 1005
> 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1509298585082794, "job": 1,
> "event": "recovery_started", "log_files": [1003, 1005]}
> 4 rocksdb:
> [/build/ceph-pKGC1D/ceph-12.2.0/src/rocksdb/db/db_impl_open.cc:482]
> Recovering log #1003 mode 0
> 4 rocksdb:
> [/build/ceph-pKGC1D/ceph-12.2.0/src/rocksdb/db/db_impl_open.cc:482]
> Recovering log #1005 mode 0
> 3 rocksdb:
> [/build/ceph-pKGC1D/ceph-12.2.0/src/rocksdb/db/db_impl_open.cc:424]
> db/001005.log: dropping 3225 bytes; Corruption: missing start of
> fragmented record(2)
> 4 rocksdb:
> [/build/ceph-pKGC1D/ceph-12.2.0/src/rocksdb/db/db_impl.cc:217] Shutdown:
> canceling all background work
> 4 rocksdb:
> [/build/ceph-pKGC1D/ceph-12.2.0/src/rocksdb/db/db_impl.cc:343] Shutdown
> complete
> -1 rocksdb: Corruption: missing start of fragmented record(2)
> -1 bluestore(/var/lib/ceph/osd/ceph-2/) _open_db erroring opening db:
> 1 bluefs umount
> 1 bdev(0x557f5b6a4240 /var/lib/ceph/osd/ceph-2//block) close
>
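For anyone skimming the output above: the "mode 0" in the "Recovering log"
lines is, as far as I know, RocksDB's WALRecoveryMode enum value, and 0 is
kTolerateCorruptedTailRecords. Below is a minimal Python sketch that pulls
the relevant facts out of such output. The name summarize_recovery and the
regexes are my own and only assume the line shapes seen in this one log;
it expects the raw tool output, without the "> " quoting of this mail.

```python
import re

# RocksDB WALRecoveryMode enum values (rocksdb/options.h), to my knowledge.
WAL_RECOVERY_MODES = {
    0: "kTolerateCorruptedTailRecords",
    1: "kAbsoluteConsistency",
    2: "kPointInTimeRecovery",
    3: "kSkipAnyCorruptedRecords",
}

# Patterns are assumptions based on the log lines quoted in this thread.
RECOVERING = re.compile(r"Recovering log #(\d+) mode (\d+)")
DROPPING = re.compile(
    r"(\S+\.log): dropping (\d+) bytes; Corruption: (.+?)(?= -?\d+ rocksdb:|$)"
)


def summarize_recovery(text):
    """Summarize which WAL files were replayed and where replay gave up."""
    # Long log lines get wrapped, so collapse all whitespace first.
    flat = " ".join(text.split())
    replayed = RECOVERING.findall(flat)
    summary = {
        "logs": [int(num) for num, _mode in replayed],
        "mode": None,
        "corrupt_file": None,
        "dropped_bytes": None,
        "message": None,
    }
    if replayed:
        mode = int(replayed[0][1])
        summary["mode"] = WAL_RECOVERY_MODES.get(mode, "unknown(%d)" % mode)
    hit = DROPPING.search(flat)
    if hit:
        summary["corrupt_file"] = hit.group(1)
        summary["dropped_bytes"] = int(hit.group(2))
        summary["message"] = hit.group(3)
    return summary
```

Feeding it the fsck output from this mail should report logs 1003 and 1005
as replayed and db/001005.log as the file where replay stopped.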
> If I understand this right, rocksdb is just trying to replay WAL type
> logs, of which presumably "001005.log" is corrupted. It then throws an
> error that stops everything.
>
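To make the "missing start of fragmented record" message a bit more
concrete: a RocksDB WAL is written in 32 KiB blocks, and each physical
record carries a small header (CRC, 2-byte length, 1-byte type). Large
entries are split into first/middle/last fragments, and the reader
complains when it meets a middle or last fragment without having seen a
first one. If I remember the reader code correctly, the "(2)" suffix is the
"last fragment" case, but treat that as an assumption. The sketch below
walks such a file and lists orphaned fragments. It assumes you already have
a copy of the WAL as a regular file (which is exactly the part that is hard
here), it does not verify checksums, and the function names walk_wal and
orphan_fragments are mine.

```python
import struct

BLOCK_SIZE = 32768       # WAL block size used by RocksDB
LEGACY_HEADER = 7        # crc32c (4) + length (2) + type (1)
RECYCLABLE_HEADER = 11   # legacy header + log number (4)

# Types 1-4 are the legacy records, 5-8 the recyclable variants.
TYPE_NAMES = {1: "full", 2: "first", 3: "middle", 4: "last",
              5: "full", 6: "first", 7: "middle", 8: "last"}


def walk_wal(data):
    """Yield (offset, kind, length) per physical record; no CRC checks."""
    offset = 0
    while offset + LEGACY_HEADER <= len(data):
        block_left = BLOCK_SIZE - offset % BLOCK_SIZE
        if block_left < LEGACY_HEADER:
            offset += block_left          # block trailer is padding
            continue
        _crc, length, rtype = struct.unpack_from("<IHB", data, offset)
        if rtype == 0:
            offset += block_left          # zero type: skip preallocated space
            continue
        if rtype not in TYPE_NAMES:
            yield (offset, "unknown", length)
            return                        # cannot trust offsets past this point
        header = RECYCLABLE_HEADER if rtype >= 5 else LEGACY_HEADER
        yield (offset, TYPE_NAMES[rtype], length)
        offset += header + length


def orphan_fragments(data):
    """Return middle/last fragments that have no preceding first fragment."""
    orphans = []
    in_fragment = False
    for offset, kind, length in walk_wal(data):
        if kind == "first":
            in_fragment = True
        elif kind in ("middle", "last"):
            if not in_fragment:
                orphans.append((offset, kind, length))
            if kind == "last":
                in_fragment = False
        else:
            in_fragment = False           # full or unknown record resets state
    return orphans
```

Usage would be along the lines of
orphan_fragments(open("001005.log", "rb").read()), where the path is
hypothetical. This only inspects the file; it does not repair anything.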
> I did try to mount the bluestore, as I assumed that is probably where
> I'd find rocksdb's files, but that doesn't seem possible either:
>
> # ceph-objectstore-tool --op fsck --data-path /var/lib/ceph/osd/ceph-2/
> --mountpoint /mnt/bluestore-repair/
> fsck failed: (5) Input/output error
> # ceph-objectstore-tool --op fuse --data-path /var/lib/ceph/osd/ceph-2
> --mountpoint /mnt/bluestore-repair/
> Mount failed with '(5) Input/output error'
> # ceph-objectstore-tool --op fuse --force --skip-journal-replay
> --data-path /var/lib/ceph/osd/ceph-2 --mountpoint /mnt/bluestore-repair/
> Mount failed with '(5) Input/output error'
>
> Adding --debug shows the ultimate culprit is just the above rocksdb
> error again.
>
> Q: Is there some way in which I can tell rocksdb to truncate or delete /
> skip the respective log entries? Or can I get access to rocksdb('s
> files) in some other way to just manipulate it or delete corrupted WAL
> files manually?
>
> -Michael
>
--
Christian Balzer Network/Systems Engineer
[email protected] Rakuten Communications
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com