ceph-monstore-tool? Is that the same as monmaptool? Oops, never mind -- found it in the ceph-test package.

I can't seem to get it working, though :-( Dumping the monmap, or any of the other commands -- they all bomb out with the same message::

root@kh10-8:/var/lib/ceph/mon/ceph-kh10-8/store.db# ceph-monstore-tool /var/lib/ceph/mon/ceph-kh10-8 dump-trace -- /tmp/test.trace
Corruption: 1 missing files; e.g.: /var/lib/ceph/mon/ceph-kh10-8/store.db/10882319.ldb
root@kh10-8:/var/lib/ceph/mon/ceph-kh10-8/store.db# ceph-monstore-tool /var/lib/ceph/mon/ceph-kh10-8 dump-keys
Corruption: 1 missing files; e.g.: /var/lib/ceph/mon/ceph-kh10-8/store.db/10882319.ldb


I need to clarify: I originally had 2 clusters with this issue; I was able to successfully repair one, and I now have one with all 3 monitors dead. I am about to recap everything I know about the issue at hand. Should I start a new email thread about this instead?

The cluster that is currently having issues is on hammer (0.94.7), and the monitor specs are all the same::

root@kh08-8:~# cat /proc/cpuinfo | grep -iE "model name" | uniq -c
     24 model name : Intel(R) Xeon(R) CPU E5-2630 v2 @ 2.60GHz

Each mon stores its data on an ext4 volume comprised of 4x300GB 10k drives in RAID 10, and runs Ubuntu 14.04.

root@kh08-8:~# uname -a
Linux kh08-8 3.13.0-76-generic #120-Ubuntu SMP Mon Jan 18 15:59:10 UTC 2016
x86_64 x86_64 x86_64 GNU/Linux
root@kh08-8:~# ceph --version
ceph version 0.94.7 (d56bdf93ced6b80b07397d57e3fa68fe68304432)


From here: here are the errors I am getting when starting each of the monitors::


---------------
root@kh08-8:~# /usr/bin/ceph-mon --cluster=ceph -i kh08-8 -d
2016-08-11 22:15:23.731550 7fe5ad3e98c0  0 ceph version 0.94.7 (d56bdf93ced6b80b07397d57e3fa68fe68304432), process ceph-mon, pid 317309
Corruption: error in middle of record
2016-08-11 22:15:28.274340 7fe5ad3e98c0 -1 error opening mon data directory at '/var/lib/ceph/mon/ceph-kh08-8': (22) Invalid argument
--
root@kh09-8:~# /usr/bin/ceph-mon --cluster=ceph -i kh09-8 -d
2016-08-11 22:14:28.252370 7f7eaab908c0  0 ceph version 0.94.7 (d56bdf93ced6b80b07397d57e3fa68fe68304432), process ceph-mon, pid 308888
Corruption: 14 missing files; e.g.: /var/lib/ceph/mon/ceph-kh09-8/store.db/10845998.ldb
2016-08-11 22:14:35.094237 7f7eaab908c0 -1 error opening mon data directory at '/var/lib/ceph/mon/ceph-kh09-8': (22) Invalid argument
--
root@kh10-8:/var/lib/ceph/mon/ceph-kh10-8/store.db# /usr/bin/ceph-mon --cluster=ceph -i kh10-8 -d
2016-08-11 22:17:54.632762 7f80bf34d8c0  0 ceph version 0.94.7 (d56bdf93ced6b80b07397d57e3fa68fe68304432), process ceph-mon, pid 292620
Corruption: 1 missing files; e.g.: /var/lib/ceph/mon/ceph-kh10-8/store.db/10882319.ldb
2016-08-11 22:18:01.207749 7f80bf34d8c0 -1 error opening mon data directory at '/var/lib/ceph/mon/ceph-kh10-8': (22) Invalid argument
---------------


For kh08-8, a coworker patched leveldb to print and skip on the first error, and that store is also missing a bunch of files. As such, I think kh10-8 is my most likely candidate to recover, but either way recovery is probably not an option. I see leveldb has a repair.cc (https://github.com/google/leveldb/blob/master/db/repair.cc), but I do not see repair mentioned anywhere in the monitor code with respect to the db store. I tried using the leveldb python module (plyvel) to attempt a repair, but my REPL just ends up dying.
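For reference, this is roughly the repair I attempted (a sketch only: plyvel's repair_db() wraps the RepairDB() implemented in that repair.cc, and /root/store.db.copy is just an illustrative scratch path -- I only ran this against a copy, never the original stores, which are backed up)::

# work on a throwaway copy of the store, never the original
cp -a /var/lib/ceph/mon/ceph-kh10-8/store.db /root/store.db.copy
# plyvel.repair_db() calls leveldb's RepairDB(), which salvages whatever it
# can from the .ldb/.log files and rebuilds the manifest
python -c "import plyvel; plyvel.repair_db('/root/store.db.copy')"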

I understand two things::

1.) Without rebuilding the monitor backend leveldb store (the cluster map, as I understand it), all of the data in the cluster is essentially lost (right?)
2.) It is possible to rebuild this database via some form of magic or (source)ry, as all of this data is essentially held throughout the cluster as well (see the sketch below).
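On point 2, I have seen hints that post-hammer Ceph has a supported way to rebuild the mon store from the OSDs themselves (ceph-objectstore-tool with --op update-mon-db). I have no idea whether anything like that exists for 0.94.x, so treat this as an assumption on my part, not something I have actually run::

# ASSUMPTION: only in releases newer than hammer; paths are illustrative.
# Run against each (stopped) OSD in turn, accumulating maps into one
# scratch mon store:
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 \
    --op update-mon-db --mon-store-path /root/mon-store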

We only use radosgw / S3 on this cluster. If there is a way to recover my data that is easier or more likely to succeed than rebuilding a monitor's leveldb store and starting up a single-monitor cluster, I would like to switch gears and focus on that.

Looking at the dev docs (http://docs.ceph.com/docs/hammer/architecture/#cluster-map), the cluster map has 5 main parts::

```
The Monitor Map: Contains the cluster fsid, the position, name, address and port of each monitor. It also indicates the current epoch, when the map was created, and the last time it changed. To view a monitor map, execute ceph mon dump.

The OSD Map: Contains the cluster fsid, when the map was created and last modified, a list of pools, replica sizes, PG numbers, a list of OSDs and their status (e.g., up, in). To view an OSD map, execute ceph osd dump.

The PG Map: Contains the PG version, its time stamp, the last OSD map epoch, the full ratios, and details on each placement group such as the PG ID, the Up Set, the Acting Set, the state of the PG (e.g., active + clean), and data usage statistics for each pool.

The CRUSH Map: Contains a list of storage devices, the failure domain hierarchy (e.g., device, host, rack, row, room, etc.), and rules for traversing the hierarchy when storing data. To view a CRUSH map, execute ceph osd getcrushmap -o {filename}; then, decompile it by executing crushtool -d {comp-crushmap-filename} -o {decomp-crushmap-filename}. You can view the decompiled map in a text editor or with cat.

The MDS Map: Contains the current MDS map epoch, when the map was created, and the last time it changed. It also contains the pool for storing metadata, a list of metadata servers, and which metadata servers are up and in. To view an MDS map, execute ceph mds dump.
```
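For reference, the view commands the docs mention would look like this (they need a live mon quorum, so they won't run against this cluster right now, but they are what I would use on the repaired cluster to sanity-check each map)::

ceph mon dump                                # monitor map
ceph osd dump                                # osd map
ceph pg dump                                 # pg map
ceph osd getcrushmap -o /tmp/crushmap        # crush map (compiled)
crushtool -d /tmp/crushmap -o /tmp/crushmap.txt
ceph mds dump                                # mds map (unused here)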

As we don't use CephFS, the MDS map can essentially be blank (right?), so I am left with 4 valid maps needed to get a working cluster again. I don't see auth mentioned in there, but I will need that too. Then I just need to rebuild the leveldb database somehow with the right information and I should be good. So a long, long journey ahead.

I don't think the data is stored as plain strings or JSON, right? Am I going down the wrong path here? Is there a shorter/simpler path to retrieving the data from a cluster that lost all 3 monitors in a power failure? If I am going down the right path, is there any advice on how I can assemble/repair the database?

I see that there is a tool for recovering RBD images from a dead cluster. Is it possible to do the same with S3 objects?

On Thu, Aug 11, 2016 at 11:15 AM, Wido den Hollander <[email protected]> wrote:

>
> > On 11 August 2016 at 15:17, Sean Sullivan <[email protected]> wrote:
> >
> >
> > Hello Wido,
> >
> > Thanks for the advice. While the data center has a/b circuits and
> > redundant power, etc., if a ground fault happens it travels outside and
> > fails, causing the whole building to fail (apparently).
> >
> > The monitors are each the same with
> > 2x e5 cpus
> > 64gb of ram
> > 4x 300gb 10k SAS drives in raid 10 (write through mode).
> > Ubuntu 14.04 with the latest updates prior to power failure (2016/Aug/10
> > - 3am CST)
> > Ceph hammer LTS 0.94.7
> >
> > (we are still working on our jewel test cluster, so it is planned but not
> > in place yet)
> >
> > The only thing that seems to be corrupt is the monitors' leveldb store. I
> > see multiple issues on the Google leveldb GitHub from March 2016 about
> > fsync and power failure, so I assume this is an issue with leveldb.
> >
> > I have backed up /var/lib/ceph/mon on all of my monitors before trying to
> > proceed with any form of recovery.
> >
> > Is there any way to reconstruct the leveldb or replace the monitors and
> > recover the data?
> >
> I don't know. I have never done it. Other people might know this better
> than me.
>
> Maybe 'ceph-monstore-tool' can help you?
>
> Wido
>
> > I found the following post in which Sage says it is tedious but possible
> > (http://www.spinics.net/lists/ceph-devel/msg06662.html). Tedious is fine
> > if I have any chance of doing it. I have the fsid, the mon key map, and
> > all of the osds look to be fine, so all of the previous osd maps are there.
> >
> > I just don't understand what key/values I need inside.
> >
> > On Aug 11, 2016 1:33 AM, "Wido den Hollander" <[email protected]> wrote:
> >
> > >
> > > > On 11 August 2016 at 0:10, Sean Sullivan <[email protected]> wrote:
> > > >
> > > >
> > > > I think it just got worse::
> > > >
> > > > all three monitors on my other cluster say that ceph-mon can't open
> > > > /var/lib/ceph/mon/$(hostname). Is there any way to recover if you lose
> > > > all 3 monitors? I saw a post by Sage saying that the data can be
> > > > recovered, as all of the data is held on other servers. Is this
> > > > possible? If so, has anyone had any experience doing so?
> > >
> > > I have never done so, so I couldn't tell you.
> > >
> > > However, it is weird that on all three it got corrupted. What hardware
> > > are you using? Was it properly protected against power failure?
> > >
> > > If your mon store is corrupted, I'm not sure what might happen.
> > >
> > > However, make a backup of ALL monitors right now before doing anything.
> > >
> > > Wido
> > >
> > >
>



-- 
- Sean:  I wrote this. -