I have a hammer cluster (0.94.9) that died a while ago, consisting of 3
monitors and 630 OSDs spread across 21 storage hosts. The cluster's monitors
all died due to leveldb corruption and the cluster was shut down. I was
finally given word that I could try to revive the cluster this week!

https://github.com/ceph/ceph/blob/hammer/doc/rados/troubleshooting/troubleshooting-mon.rst#recovery-using-osds

I see that the latest hammer code on GitHub has the ceph-monstore-tool
rebuild backport, and that is what I am running on the cluster now (ceph
version 0.94.9-4530-g83af8cd (83af8cdaaa6d94404e6146b68e532a784e3cc99c)). I
was able to scrape all 630 of the OSDs and am left with a 1.1G store.db
directory. Using Python I was able to list all of the keys and values, which
was very promising. That said, I cannot run the final command in the
recovery-using-osds article (ceph-monstore-tool rebuild) successfully.
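
For what it's worth, the Python check was nothing fancy; a minimal sketch of
that kind of key/value walk, assuming the plyvel leveldb binding and an
example store.db path (neither of which is meant to be exact), looks like
this:

    import plyvel

    # Open the store.db directory that was rebuilt from the OSDs.
    db = plyvel.DB('/root/mon-store/store.db', create_if_missing=False)

    # Walk every key/value pair; if leveldb cannot read its sstables,
    # this iteration is where it would fail.
    count = 0
    for key, value in db.iterator():
        print(key.decode('utf-8', 'replace'), len(value))
        count += 1

    print('total keys: %d' % count)
    db.close()

If that loop completes, the database is at least readable from userspace.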

Whenever I run the tool (with the newly created admin keyring or with my
existing one) it errors with the following:


     0> 2017-02-17 15:00:47.516901 7f8b4d7408c0 -1 ./mon/MonitorDBStore.h: In
   function 'KeyValueDB::Iterator MonitorDBStore::get_iterator(const string&)'
   thread 7f8b4d7408c0 time 2017-02-07 15:00:47.516319


The complete trace is here: http://pastebin.com/NQE8uYiG

Can anyone lend a hand and tell me what may be wrong? I am able to iterate
over the leveldb database in Python, so the structure should be at least
somewhat intact, right? Am I SOL at this point? The cluster is no longer in
production, and while I don't have months of time, I would really like to
recover it just to see whether it is at all possible.
-- 
- Sean:  I wrote this. -
