A coworker patched leveldb and we were able to export quite a bit of data
from kh08's leveldb database. At this point I think I need to reconstruct
a new leveldb store with whatever values I can. Is it the same leveldb
database across all 3 monitors, i.e., will keys exported from one work in
the others? They should all hold the same keys/values, just constructed
differently on disk, right? I know I can't blindly copy
/var/lib/ceph/mon/ceph-$(hostname)/store.db/ from one host to another, but
can I copy the keys/values from one store to another?
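
If copying the values over is sane, here is roughly what I have in mind: a
minimal sketch using plyvel against offline backups of the stores (the
paths are made up for illustration)::

```python
import plyvel

# Work only on offline copies, never on the live mon directories.
src = plyvel.DB('/root/backup/kh10-8/store.db')            # most intact store
dst = plyvel.DB('/root/rebuild/store.db', create_if_missing=True)

# Stream every surviving key/value pair into the fresh store.
with dst.write_batch() as batch:
    for key, value in src:
        batch.put(key, value)

src.close()
dst.close()
```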

On Fri, Aug 12, 2016 at 12:45 PM, Sean Sullivan <[email protected]>
wrote:

> ceph-monstore-tool? Is that the same as monmaptool? Oops, never mind, I
> found it in the ceph-test package::
>
> I can't seem to get it working, though :-( dump monmap, or any of the
> other commands: they all bomb out with the same message::
>
> root@kh10-8:/var/lib/ceph/mon/ceph-kh10-8/store.db# ceph-monstore-tool
> /var/lib/ceph/mon/ceph-kh10-8 dump-trace -- /tmp/test.trace
> Corruption: 1 missing files; e.g.: /var/lib/ceph/mon/ceph-kh10-8/store.db/10882319.ldb
> root@kh10-8:/var/lib/ceph/mon/ceph-kh10-8/store.db# ceph-monstore-tool
> /var/lib/ceph/mon/ceph-kh10-8 dump-keys
> Corruption: 1 missing files; e.g.: /var/lib/ceph/mon/ceph-kh10-8/store.db/10882319.ldb
>
>
> I need to clarify: I originally had 2 clusters with this issue; one I was
> able to repair successfully, and the other now has all 3 monitors dead. I
> am about to recap everything I know about the issue at hand. Should I
> start a new email thread about this instead?
>
> The cluster that is currently having issues is on hammer (0.94.7), and
> the monitor specs are all the same::
> root@kh08-8:~# cat /proc/cpuinfo | grep -iE "model name" | uniq -c
>      24 model name : Intel(R) Xeon(R) CPU E5-2630 v2 @ 2.60GHz
>      ext4 volume comprised of 4x 300GB 10k drives in RAID 10
>      Ubuntu 14.04
>
> root@kh08-8:~# uname -a
> Linux kh08-8 3.13.0-76-generic #120-Ubuntu SMP Mon Jan 18 15:59:10 UTC
> 2016 x86_64 x86_64 x86_64 GNU/Linux
> root@kh08-8:~# ceph --version
> ceph version 0.94.7 (d56bdf93ced6b80b07397d57e3fa68fe68304432)
>
>
> From here, these are the errors I am getting when starting each of the
> monitors::
>
>
> ---------------
> root@kh08-8:~# /usr/bin/ceph-mon --cluster=ceph -i kh08-8 -d
> 2016-08-11 22:15:23.731550 7fe5ad3e98c0  0 ceph version 0.94.7
> (d56bdf93ced6b80b07397d57e3fa68fe68304432), process ceph-mon, pid 317309
> Corruption: error in middle of record
> 2016-08-11 22:15:28.274340 7fe5ad3e98c0 -1 error opening mon data
> directory at '/var/lib/ceph/mon/ceph-kh08-8': (22) Invalid argument
> --
> root@kh09-8:~# /usr/bin/ceph-mon --cluster=ceph -i kh09-8 -d
> 2016-08-11 22:14:28.252370 7f7eaab908c0  0 ceph version 0.94.7
> (d56bdf93ced6b80b07397d57e3fa68fe68304432), process ceph-mon, pid 308888
> Corruption: 14 missing files; e.g.: /var/lib/ceph/mon/ceph-kh09-8/store.db/10845998.ldb
> 2016-08-11 22:14:35.094237 7f7eaab908c0 -1 error opening mon data
> directory at '/var/lib/ceph/mon/ceph-kh09-8': (22) Invalid argument
> --
> root@kh10-8:/var/lib/ceph/mon/ceph-kh10-8/store.db# /usr/bin/ceph-mon
> --cluster=ceph -i kh10-8 -d
> 2016-08-11 22:17:54.632762 7f80bf34d8c0  0 ceph version 0.94.7
> (d56bdf93ced6b80b07397d57e3fa68fe68304432), process ceph-mon, pid 292620
> Corruption: 1 missing files; e.g.: /var/lib/ceph/mon/ceph-kh10-8/store.db/10882319.ldb
> 2016-08-11 22:18:01.207749 7f80bf34d8c0 -1 error opening mon data
> directory at '/var/lib/ceph/mon/ceph-kh10-8': (22) Invalid argument
> ---------------
>
>
> For kh08-8, a coworker patched leveldb to print and skip on the first
> error, and that one is also missing a bunch of files. As such, I think
> kh10-8 is my most likely candidate for recovery, but either way recovery
> is probably not an option. I see leveldb has a repair.cc
> (https://github.com/google/leveldb/blob/master/db/repair.cc) but I do not
> see repair mentioned anywhere in the monitor code with respect to the db
> store. I tried using the leveldb python module (plyvel) to attempt a
> repair, but my repl just ends up dying.
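>
> In case it helps, this is roughly what I ran; plyvel wraps leveldb's
> RepairDB, and I pointed it at a scratch copy since it rewrites the store
> files in place (the path below is just an example)::
>
> ```python
> import plyvel
>
> # RepairDB rewrites manifests/tables in place, so only ever point it
> # at a throwaway copy of the store, never at the original backup.
> plyvel.repair_db('/root/scratch/kh10-8/store.db')
> ```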
>
> I understand two things::
>
> 1.) Without rebuilding the monitor backend leveldb store (the cluster
> map, as I understand it), all of the data in the cluster is essentially
> lost (right?)
> 2.) It is possible to rebuild this database via some form of magic or
> (source)ry, as all of this data is essentially held throughout the rest
> of the cluster as well.
>
> We only use radosgw / S3 on this cluster. If there is a way to recover my
> data that is easier/more likely to succeed than rebuilding the leveldb
> store of a monitor and starting up a single-monitor cluster, I would like
> to switch gears and focus on that.
>
> Looking at the dev docs (
> http://docs.ceph.com/docs/hammer/architecture/#cluster-map), the cluster
> map has 5 main parts::
>
> ```
> The Monitor Map: Contains the cluster fsid, the position, name, address
> and port of each monitor. It also indicates the current epoch, when the
> map was created, and the last time it changed. To view a monitor map,
> execute ceph mon dump.
>
> The OSD Map: Contains the cluster fsid, when the map was created and last
> modified, a list of pools, replica sizes, PG numbers, a list of OSDs and
> their status (e.g., up, in). To view an OSD map, execute ceph osd dump.
>
> The PG Map: Contains the PG version, its time stamp, the last OSD map
> epoch, the full ratios, and details on each placement group such as the
> PG ID, the Up Set, the Acting Set, the state of the PG (e.g., active +
> clean), and data usage statistics for each pool.
>
> The CRUSH Map: Contains a list of storage devices, the failure domain
> hierarchy (e.g., device, host, rack, row, room, etc.), and rules for
> traversing the hierarchy when storing data. To view a CRUSH map, execute
> ceph osd getcrushmap -o {filename}; then, decompile it by executing
> crushtool -d {comp-crushmap-filename} -o {decomp-crushmap-filename}. You
> can view the decompiled map in a text editor or with cat.
>
> The MDS Map: Contains the current MDS map epoch, when the map was
> created, and the last time it changed. It also contains the pool for
> storing metadata, a list of metadata servers, and which metadata servers
> are up and in. To view an MDS map, execute ceph mds dump.
> ```
>
> As we don't use CephFS, the MDS map can essentially be blank (right?),
> so I am left with 4 valid maps needed to get a working cluster again. I
> don't see auth mentioned in there, but I will need that too. Then I just
> need to rebuild the leveldb database somehow with the right information
> and I should be good. So a long, long journey ahead.
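>
> As a first step, I figure I can at least inventory which of those maps
> actually survive in my best copy of the store. A rough sketch with plyvel
> (I'm assuming the store keys are a map-name prefix and a key joined by a
> NUL byte; if the separator is different, the split just needs adjusting)::
>
> ```python
> import plyvel
> from collections import Counter
>
> db = plyvel.DB('/root/backup/kh10-8/store.db')   # offline copy
>
> # Tally keys by prefix (expecting things like osdmap, monmap, pgmap, auth).
> prefixes = Counter(key.split(b'\x00', 1)[0] for key, _ in db)
> for prefix, count in prefixes.most_common():
>     print(prefix.decode('utf-8', 'replace'), count)
>
> db.close()
> ```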
>
> I don't think the data is stored as strings or JSON, right? Am I going
> down the wrong path here? Is there a shorter/simpler path to retrieve the
> data from a cluster that lost all 3 monitors in a power failure? If I am
> going down the right path, is there any advice on how I can
> assemble/repair the database?
>
> I see that there is a tool for recovering RBD images from a dead cluster.
> Is it possible to do the same with S3 objects?
>
> On Thu, Aug 11, 2016 at 11:15 AM, Wido den Hollander <[email protected]>
> wrote:
>
>>
>> > Op 11 augustus 2016 om 15:17 schreef Sean Sullivan <
>> [email protected]>:
>> >
>> >
>> > Hello Wido,
>> >
>> > Thanks for the advice.  While the data center has A/B circuits and
>> > redundant power, etc., apparently if a ground fault happens it travels
>> > outside and causes the whole building to fail.
>> >
>> > The monitors are each the same, with:
>> > 2x E5 CPUs
>> > 64GB of RAM
>> > 4x 300GB 10k SAS drives in RAID 10 (write-through mode)
>> > Ubuntu 14.04 with the latest updates prior to the power failure
>> > (2016/Aug/10, 3am CST)
>> > Ceph hammer LTS 0.94.7
>> >
>> > (we are still working on our jewel test cluster so it is planned but
>> > not in place yet)
>> >
>> > The only thing that seems to be corrupt is the monitors' leveldb
>> > store. I see multiple issues on the Google leveldb GitHub from March
>> > 2016 about fsync and power failure, so I assume this is an issue with
>> > leveldb.
>> >
>> > I have backed up /var/lib/ceph/mon on all of my monitors before trying
>> > to proceed with any form of recovery.
>> >
>> > Is there any way to reconstruct the leveldb store or replace the
>> > monitors and recover the data?
>> >
>> I don't know. I have never done it. Other people might know this better
>> than me.
>>
>> Maybe 'ceph-monstore-tool' can help you?
>>
>> Wido
>>
>> > I found the following post in which Sage says it is tedious but
>> > possible (http://www.spinics.net/lists/ceph-devel/msg06662.html).
>> > Tedious is fine if I have any chance of doing it.  I have the fsid and
>> > the mon key map, and all of the osds look to be fine, so all of the
>> > previous osd maps are there.
>> >
>> > I just don't understand what key/values I need inside.
>> >
>> > On Aug 11, 2016 1:33 AM, "Wido den Hollander" <[email protected]> wrote:
>> >
>> > >
>> > > > Op 11 augustus 2016 om 0:10 schreef Sean Sullivan <
>> > > [email protected]>:
>> > > >
>> > > >
>> > > > I think it just got worse::
>> > > >
>> > > > All three monitors on my other cluster say that ceph-mon can't
>> > > > open /var/lib/ceph/mon/$(hostname). Is there any way to recover if
>> > > > you lose all 3 monitors? I saw a post by Sage saying that the data
>> > > > can be recovered, as all of the data is held on other servers. Is
>> > > > this possible? If so, has anyone had any experience doing so?
>> > >
>> > > I have never done so, so I couldn't tell you.
>> > >
>> > > However, it is weird that it got corrupted on all three. What
>> > > hardware are you using? Was it properly protected against power
>> > > failure?
>> > >
>> > > If your mon store is corrupted, I'm not sure what might happen.
>> > >
>> > > However, make a backup of ALL monitors right now before doing
>> > > anything.
>> > >
>> > > Wido
>> > >
>> > >
>>
>
>
>
> --
> - Sean:  I wrote this. -
>



-- 
- Sean:  I wrote this. -
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
