A coworker patched leveldb and we were able to export quite a bit of data from kh08's leveldb database. At this point I think I need to reconstruct a new leveldb store with whatever values I can. Is it the same leveldb database across all 3 monitors? I.e., will keys exported from one work in the others? All should have the same keys/values, although constructed differently, right? I can't blindly copy /var/lib/ceph/mon/ceph-$(hostname)/store.db/ from one host to another, right? But can I copy the keys/values from one to another?
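For the copy-keys question, the mechanical part is just iterating the source store and putting each pair into a fresh store. Below is a minimal sketch of that pattern; in practice the opens would use leveldb bindings such as plyvel, but Python's stdlib dbm module stands in here so the sketch runs anywhere, and the key/value shown is completely made up (I am not assuming anything about the mon store's real key names):

```python
# Sketch of the export/import pattern: copy every key/value pair from one
# store into a fresh one. Real mon stores are leveldb, so in practice you
# would open them with leveldb bindings (e.g. plyvel); the stdlib dbm.dumb
# module stands in here only so the sketch is runnable. Keys are invented.
import dbm.dumb
import os
import tempfile

def copy_kv(src_path, dst_path):
    """Copy every key/value pair from src store into a (new) dst store."""
    with dbm.dumb.open(src_path, "r") as src, dbm.dumb.open(dst_path, "c") as dst:
        for key in src.keys():
            dst[key] = src[key]

# Throwaway demonstration with a hypothetical key:
tmp = tempfile.mkdtemp()
src_path = os.path.join(tmp, "src-store")
dst_path = os.path.join(tmp, "dst-store")
with dbm.dumb.open(src_path, "c") as db:
    db[b"osdmap:full_1234"] = b"<binary osdmap blob>"  # made-up key/value
copy_kv(src_path, dst_path)
with dbm.dumb.open(dst_path, "r") as db:
    print(db[b"osdmap:full_1234"])  # b'<binary osdmap blob>'
```

This is only the mechanical copy, of course; whether values exported from one monitor are actually valid in another monitor's store is exactly the open question above.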
On Fri, Aug 12, 2016 at 12:45 PM, Sean Sullivan <[email protected]> wrote:

> ceph-monstore-tool? Is that the same as monmaptool? Oops! NM, found it in the
> ceph-test package.
>
> I can't seem to get it working :-( dump monmap or any of the commands.
> They all bomb out with the same message:
>
> root@kh10-8:/var/lib/ceph/mon/ceph-kh10-8/store.db# ceph-monstore-tool /var/lib/ceph/mon/ceph-kh10-8 dump-trace -- /tmp/test.trace
> Corruption: 1 missing files; e.g.: /var/lib/ceph/mon/ceph-kh10-8/store.db/10882319.ldb
> root@kh10-8:/var/lib/ceph/mon/ceph-kh10-8/store.db# ceph-monstore-tool /var/lib/ceph/mon/ceph-kh10-8 dump-keys
> Corruption: 1 missing files; e.g.: /var/lib/ceph/mon/ceph-kh10-8/store.db/10882319.ldb
>
> I need to clarify, as I originally had two clusters with this issue; I now have one with all 3 monitors dead, and one that I was able to repair. I am about to recap everything I know about the issue at hand. Should I start a new email thread about this instead?
>
> The cluster that is currently having issues is on hammer (0.94.7), and the monitor specs are the same:
>
> root@kh08-8:~# cat /proc/cpuinfo | grep -iE "model name" | uniq -c
>      24 model name : Intel(R) Xeon(R) CPU E5-2630 v2 @ 2.60GHz
> ext4 volume comprised of 4x 300GB 10k drives in RAID 10.
> Ubuntu 14.04
>
> root@kh08-8:~# uname -a
> Linux kh08-8 3.13.0-76-generic #120-Ubuntu SMP Mon Jan 18 15:59:10 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
> root@kh08-8:~# ceph --version
> ceph version 0.94.7 (d56bdf93ced6b80b07397d57e3fa68fe68304432)
>
> From here, these are the errors I am getting when starting each of the monitors:
>
> ---------------
> root@kh08-8:~# /usr/bin/ceph-mon --cluster=ceph -i kh08-8 -d
> 2016-08-11 22:15:23.731550 7fe5ad3e98c0  0 ceph version 0.94.7 (d56bdf93ced6b80b07397d57e3fa68fe68304432), process ceph-mon, pid 317309
> Corruption: error in middle of record
> 2016-08-11 22:15:28.274340 7fe5ad3e98c0 -1 error opening mon data directory at '/var/lib/ceph/mon/ceph-kh08-8': (22) Invalid argument
> --
> root@kh09-8:~# /usr/bin/ceph-mon --cluster=ceph -i kh09-8 -d
> 2016-08-11 22:14:28.252370 7f7eaab908c0  0 ceph version 0.94.7 (d56bdf93ced6b80b07397d57e3fa68fe68304432), process ceph-mon, pid 308888
> Corruption: 14 missing files; e.g.: /var/lib/ceph/mon/ceph-kh09-8/store.db/10845998.ldb
> 2016-08-11 22:14:35.094237 7f7eaab908c0 -1 error opening mon data directory at '/var/lib/ceph/mon/ceph-kh09-8': (22) Invalid argument
> --
> root@kh10-8:/var/lib/ceph/mon/ceph-kh10-8/store.db# /usr/bin/ceph-mon --cluster=ceph -i kh10-8 -d
> 2016-08-11 22:17:54.632762 7f80bf34d8c0  0 ceph version 0.94.7 (d56bdf93ced6b80b07397d57e3fa68fe68304432), process ceph-mon, pid 292620
> Corruption: 1 missing files; e.g.: /var/lib/ceph/mon/ceph-kh10-8/store.db/10882319.ldb
> 2016-08-11 22:18:01.207749 7f80bf34d8c0 -1 error opening mon data directory at '/var/lib/ceph/mon/ceph-kh10-8': (22) Invalid argument
> ---------------
>
> For kh08, a coworker patched leveldb to print and skip on the first error, and that one is also missing a bunch of files. As such, I think kh10-8 is my most likely candidate to recover, but either way recovery is probably not an option.
> I see leveldb has a repair.cc (https://github.com/google/leveldb/blob/master/db/repair.cc), but I do not see repair mentioned in the monitor with respect to the dbstore. I tried using the leveldb Python module (plyvel) to attempt a repair, but my REPL just ends up dying.
>
> I understand two things:
> 1.) Without rebuilding the monitor backend leveldb store (the cluster map, as I understand it), all of the data in the cluster is essentially lost (right?)
> 2.) It is possible to rebuild this database via some form of magic or (source)ry, as all of this data is essentially held throughout the cluster as well.
>
> We only use radosgw / S3 for this cluster. If there is a way to recover my data that is easier/more likely than rebuilding the leveldb of a monitor and starting a single-monitor cluster up, I would like to switch gears and focus on that.
>
> Looking at the dev docs:
> http://docs.ceph.com/docs/hammer/architecture/#cluster-map
> the cluster map has 5 main parts:
>
> ```
> The Monitor Map: Contains the cluster fsid, the position, name, address and port of each monitor. It also indicates the current epoch, when the map was created, and the last time it changed. To view a monitor map, execute ceph mon dump.
> The OSD Map: Contains the cluster fsid, when the map was created and last modified, a list of pools, replica sizes, PG numbers, a list of OSDs and their status (e.g., up, in). To view an OSD map, execute ceph osd dump.
> The PG Map: Contains the PG version, its time stamp, the last OSD map epoch, the full ratios, and details on each placement group such as the PG ID, the Up Set, the Acting Set, the state of the PG (e.g., active + clean), and data usage statistics for each pool.
> The CRUSH Map: Contains a list of storage devices, the failure domain hierarchy (e.g., device, host, rack, row, room, etc.), and rules for traversing the hierarchy when storing data.
> To view a CRUSH map, execute ceph osd getcrushmap -o {comp-crushmap-filename}; then, decompile it by executing crushtool -d {comp-crushmap-filename} -o {decomp-crushmap-filename}. You can view the decompiled map in a text editor or with cat.
> The MDS Map: Contains the current MDS map epoch, when the map was created, and the last time it changed. It also contains the pool for storing metadata, a list of metadata servers, and which metadata servers are up and in. To view an MDS map, execute ceph mds dump.
> ```
>
> As we don't use CephFS, the MDS map can essentially be blank (right?), so I am left with 4 valid maps needed to get a working cluster again. I don't see auth mentioned in there, but I need that too. Then I just need to rebuild the leveldb database somehow with the right information and I should be good. So a long, long, long journey ahead.
>
> I don't think that the data is stored as strings or JSON, right? Am I going down the wrong path here? Is there a shorter/simpler path to retrieve the data from a cluster that lost all 3 monitors in a power failure? If I am going down the right path, is there any advice on how I can assemble/repair the database?
>
> I see that there is an RBD recovery tool for a dead cluster. Is it possible to do the same with S3 objects?
>
> On Thu, Aug 11, 2016 at 11:15 AM, Wido den Hollander <[email protected]> wrote:
>
>>
>> > On 11 August 2016 at 15:17, Sean Sullivan <[email protected]> wrote:
>> >
>> >
>> > Hello Wido,
>> >
>> > Thanks for the advice. While the data center has A/B circuits and
>> > redundant power, etc., if a ground fault happens it travels outside and
>> > fails, causing the whole building to fail (apparently).
>> >
>> > The monitors are each the same, with
>> > 2x E5 CPUs
>> > 64GB of RAM
>> > 4x 300GB 10k SAS drives in RAID 10 (write-through mode).
>> > Ubuntu 14.04 with the latest updates prior to the power failure (2016/Aug/10 - 3am CST)
>> > Ceph hammer LTS 0.94.7
>> >
>> > (We are still working on our jewel test cluster, so it is planned but not in place yet.)
>> >
>> > The only thing that seems to be corrupt is the monitors' leveldb store. I see multiple issues on Google's leveldb GitHub from March 2016 about fsync and power failure, so I assume this is an issue with leveldb.
>> >
>> > I have backed up /var/lib/ceph/mon on all of my monitors before trying to proceed with any form of recovery.
>> >
>> > Is there any way to reconstruct the leveldb, or replace the monitors and recover the data?
>> >
>> I don't know. I have never done it. Other people might know this better than me.
>>
>> Maybe 'ceph-monstore-tool' can help you?
>>
>> Wido
>>
>> > I found the following post in which Sage says it is tedious but possible (http://www.spinics.net/lists/ceph-devel/msg06662.html). Tedious is fine if I have any chance of doing it. I have the fsid, the mon key map, and all of the OSDs look to be fine, so all of the previous OSD maps are there.
>> >
>> > I just don't understand what keys/values I need inside.
>> >
>> > On Aug 11, 2016 1:33 AM, "Wido den Hollander" <[email protected]> wrote:
>> >
>> > >
>> > > > On 11 August 2016 at 0:10, Sean Sullivan <[email protected]> wrote:
>> > > >
>> > > >
>> > > > I think it just got worse:
>> > > >
>> > > > All three monitors on my other cluster say that ceph-mon can't open /var/lib/ceph/mon/$(hostname). Is there any way to recover if you lose all 3 monitors? I saw a post by Sage saying that the data can be recovered, as all of the data is held on other servers. Is this possible? If so, has anyone had any experience doing so?
>> > >
>> > > I have never done so, so I couldn't tell you.
>> > >
>> > > However, it is weird that on all three it got corrupted.
>> > > What hardware are you using? Was it properly protected against power failure?
>> > >
>> > > If your mon store is corrupted, I'm not sure what might happen.
>> > >
>> > > However, make a backup of ALL monitors right now before doing anything.
>> > >
>> > > Wido
>> > >
>> > > > _______________________________________________
>> > > > ceph-users mailing list
>> > > > [email protected]
>> > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> > >
>
> --
> - Sean: I wrote this. -

--
- Sean: I wrote this. -
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
