So with a patched leveldb that skips errors, I now have a store.db from
which I can extract the pg, mon, and osd maps. That said, when I try to
start kh10-8 it bombs out::

---------------------------------------
---------------------------------------
root@kh10-8:/var/lib/ceph/mon/ceph-kh10-8# ceph-mon -i $(hostname) -d
2016-08-13 22:30:54.596039 7fa8b9e088c0  0 ceph version 0.94.7
(d56bdf93ced6b80b07397d57e3fa68fe68304432), process ceph-mon, pid 708653
starting mon.kh10-8 rank 2 at 10.64.64.125:6789/0 mon_data
/var/lib/ceph/mon/ceph-kh10-8 fsid e452874b-cb29-4468-ac7f-f8901dfccebf
2016-08-13 22:30:54.608150 7fa8b9e088c0  0 starting mon.kh10-8 rank 2 at
10.64.64.125:6789/0 mon_data /var/lib/ceph/mon/ceph-kh10-8 fsid
e452874b-cb29-4468-ac7f-f8901dfccebf
2016-08-13 22:30:54.608395 7fa8b9e088c0  1 mon.kh10-8@-1(probing) e1
preinit fsid e452874b-cb29-4468-ac7f-f8901dfccebf
2016-08-13 22:30:54.608617 7fa8b9e088c0  1
mon.kh10-8@-1(probing).paxosservice(pgmap
0..35606392) refresh upgraded, format 0 -> 1
2016-08-13 22:30:54.608629 7fa8b9e088c0  1 mon.kh10-8@-1(probing).pg v0
on_upgrade discarding in-core PGMap
terminate called after throwing an instance of 'ceph::buffer::end_of_buffer'
  what():  buffer::end_of_buffer
*** Caught signal (Aborted) **
 in thread 7fa8b9e088c0
 ceph version 0.94.7 (d56bdf93ced6b80b07397d57e3fa68fe68304432)
 1: ceph-mon() [0x9b25ea]
 2: (()+0x10330) [0x7fa8b8f0b330]
 3: (gsignal()+0x37) [0x7fa8b73a8c37]
 4: (abort()+0x148) [0x7fa8b73ac028]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7fa8b7cb3535]
 6: (()+0x5e6d6) [0x7fa8b7cb16d6]
 7: (()+0x5e703) [0x7fa8b7cb1703]
 8: (()+0x5e922) [0x7fa8b7cb1922]
 9: ceph-mon() [0x853c39]
 10:
(object_stat_collection_t::decode(ceph::buffer::list::iterator&)+0x167)
[0x894227]
 11: (pg_stat_t::decode(ceph::buffer::list::iterator&)+0x5ff) [0x894baf]
 12: (PGMap::update_pg(pg_t, ceph::buffer::list&)+0xa3) [0x91a8d3]
 13: (PGMonitor::read_pgmap_full()+0x1d8) [0x68b9b8]
 14: (PGMonitor::update_from_paxos(bool*)+0xbf7) [0x6977b7]
 15: (PaxosService::refresh(bool*)+0x19a) [0x605b5a]
 16: (Monitor::refresh_from_paxos(bool*)+0x1db) [0x5b1ffb]
 17: (Monitor::init_paxos()+0x85) [0x5b2365]
 18: (Monitor::preinit()+0x7d7) [0x5b6f87]
 19: (main()+0x230c) [0x57853c]
 20: (__libc_start_main()+0xf5) [0x7fa8b7393f45]
 21: ceph-mon() [0x59a3c7]
2016-08-13 22:30:54.611791 7fa8b9e088c0 -1 *** Caught signal (Aborted) **
 in thread 7fa8b9e088c0

 ceph version 0.94.7 (d56bdf93ced6b80b07397d57e3fa68fe68304432)
 1: ceph-mon() [0x9b25ea]
 2: (()+0x10330) [0x7fa8b8f0b330]
 3: (gsignal()+0x37) [0x7fa8b73a8c37]
 4: (abort()+0x148) [0x7fa8b73ac028]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7fa8b7cb3535]
 6: (()+0x5e6d6) [0x7fa8b7cb16d6]
 7: (()+0x5e703) [0x7fa8b7cb1703]
 8: (()+0x5e922) [0x7fa8b7cb1922]
 9: ceph-mon() [0x853c39]
 10:
(object_stat_collection_t::decode(ceph::buffer::list::iterator&)+0x167)
[0x894227]
 11: (pg_stat_t::decode(ceph::buffer::list::iterator&)+0x5ff) [0x894baf]
 12: (PGMap::update_pg(pg_t, ceph::buffer::list&)+0xa3) [0x91a8d3]
 13: (PGMonitor::read_pgmap_full()+0x1d8) [0x68b9b8]
 14: (PGMonitor::update_from_paxos(bool*)+0xbf7) [0x6977b7]
 15: (PaxosService::refresh(bool*)+0x19a) [0x605b5a]
 16: (Monitor::refresh_from_paxos(bool*)+0x1db) [0x5b1ffb]
 17: (Monitor::init_paxos()+0x85) [0x5b2365]
 18: (Monitor::preinit()+0x7d7) [0x5b6f87]
 19: (main()+0x230c) [0x57853c]
 20: (__libc_start_main()+0xf5) [0x7fa8b7393f45]
 21: ceph-mon() [0x59a3c7]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
to interpret this.

--- begin dump of recent events ---
   -33> 2016-08-13 22:30:54.593450 7fa8b9e088c0  5 asok(0x36a20f0)
register_command perfcounters_dump hook 0x365a050
   -32> 2016-08-13 22:30:54.593480 7fa8b9e088c0  5 asok(0x36a20f0)
register_command 1 hook 0x365a050
   -31> 2016-08-13 22:30:54.593486 7fa8b9e088c0  5 asok(0x36a20f0)
register_command perf dump hook 0x365a050
   -30> 2016-08-13 22:30:54.593496 7fa8b9e088c0  5 asok(0x36a20f0)
register_command perfcounters_schema hook 0x365a050
   -29> 2016-08-13 22:30:54.593499 7fa8b9e088c0  5 asok(0x36a20f0)
register_command 2 hook 0x365a050
   -28> 2016-08-13 22:30:54.593501 7fa8b9e088c0  5 asok(0x36a20f0)
register_command perf schema hook 0x365a050
   -27> 2016-08-13 22:30:54.593503 7fa8b9e088c0  5 asok(0x36a20f0)
register_command perf reset hook 0x365a050
   -26> 2016-08-13 22:30:54.593505 7fa8b9e088c0  5 asok(0x36a20f0)
register_command config show hook 0x365a050
   -25> 2016-08-13 22:30:54.593508 7fa8b9e088c0  5 asok(0x36a20f0)
register_command config set hook 0x365a050
   -24> 2016-08-13 22:30:54.593510 7fa8b9e088c0  5 asok(0x36a20f0)
register_command config get hook 0x365a050
   -23> 2016-08-13 22:30:54.593512 7fa8b9e088c0  5 asok(0x36a20f0)
register_command config diff hook 0x365a050
   -22> 2016-08-13 22:30:54.593513 7fa8b9e088c0  5 asok(0x36a20f0)
register_command log flush hook 0x365a050
   -21> 2016-08-13 22:30:54.593557 7fa8b9e088c0  5 asok(0x36a20f0)
register_command log dump hook 0x365a050
   -20> 2016-08-13 22:30:54.593561 7fa8b9e088c0  5 asok(0x36a20f0)
register_command log reopen hook 0x365a050
   -19> 2016-08-13 22:30:54.596039 7fa8b9e088c0  0 ceph version 0.94.7
(d56bdf93ced6b80b07397d57e3fa68fe68304432), process ceph-mon, pid 708653
   -18> 2016-08-13 22:30:54.597587 7fa8b9e088c0  5 asok(0x36a20f0) init
/var/run/ceph/ceph-mon.kh10-8.asok
   -17> 2016-08-13 22:30:54.597601 7fa8b9e088c0  5 asok(0x36a20f0)
bind_and_listen /var/run/ceph/ceph-mon.kh10-8.asok
   -16> 2016-08-13 22:30:54.597767 7fa8b9e088c0  5 asok(0x36a20f0)
register_command 0 hook 0x36560c0
   -15> 2016-08-13 22:30:54.597775 7fa8b9e088c0  5 asok(0x36a20f0)
register_command version hook 0x36560c0
   -14> 2016-08-13 22:30:54.597778 7fa8b9e088c0  5 asok(0x36a20f0)
register_command git_version hook 0x36560c0
   -13> 2016-08-13 22:30:54.597781 7fa8b9e088c0  5 asok(0x36a20f0)
register_command help hook 0x365a150
   -12> 2016-08-13 22:30:54.597783 7fa8b9e088c0  5 asok(0x36a20f0)
register_command get_command_descriptions hook 0x365a140
   -11> 2016-08-13 22:30:54.597860 7fa8b5181700  5 asok(0x36a20f0) entry
start
   -10> 2016-08-13 22:30:54.608150 7fa8b9e088c0  0 starting mon.kh10-8 rank
2 at 10.64.64.125:6789/0 mon_data /var/lib/ceph/mon/ceph-kh10-8 fsid
e452874b-cb29-4468-ac7f-f8901dfccebf
    -9> 2016-08-13 22:30:54.608210 7fa8b9e088c0  1 -- 10.64.64.125:6789/0
learned my addr 10.64.64.125:6789/0
    -8> 2016-08-13 22:30:54.608214 7fa8b9e088c0  1 accepter.accepter.bind
my_inst.addr is 10.64.64.125:6789/0 need_addr=0
    -7> 2016-08-13 22:30:54.608279 7fa8b9e088c0  5 adding auth protocol:
cephx
    -6> 2016-08-13 22:30:54.608282 7fa8b9e088c0  5 adding auth protocol:
cephx
    -5> 2016-08-13 22:30:54.608311 7fa8b9e088c0 10 log_channel(cluster)
update_config to_monitors: true to_syslog: false syslog_facility: daemon
prio: info)
    -4> 2016-08-13 22:30:54.608317 7fa8b9e088c0 10 log_channel(audit)
update_config to_monitors: true to_syslog: false syslog_facility: local0
prio: info)
    -3> 2016-08-13 22:30:54.608395 7fa8b9e088c0  1 mon.kh10-8@-1(probing)
e1 preinit fsid e452874b-cb29-4468-ac7f-f8901dfccebf
    -2> 2016-08-13 22:30:54.608617 7fa8b9e088c0  1
mon.kh10-8@-1(probing).paxosservice(pgmap
0..35606392) refresh upgraded, format 0 -> 1
    -1> 2016-08-13 22:30:54.608629 7fa8b9e088c0  1 mon.kh10-8@-1(probing).pg
v0 on_upgrade discarding in-core PGMap
     0> 2016-08-13 22:30:54.611791 7fa8b9e088c0 -1 *** Caught signal
(Aborted) **
 in thread 7fa8b9e088c0

 ceph version 0.94.7 (d56bdf93ced6b80b07397d57e3fa68fe68304432)
 1: ceph-mon() [0x9b25ea]
 2: (()+0x10330) [0x7fa8b8f0b330]
 3: (gsignal()+0x37) [0x7fa8b73a8c37]
 4: (abort()+0x148) [0x7fa8b73ac028]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7fa8b7cb3535]
 6: (()+0x5e6d6) [0x7fa8b7cb16d6]
 7: (()+0x5e703) [0x7fa8b7cb1703]
 8: (()+0x5e922) [0x7fa8b7cb1922]
 9: ceph-mon() [0x853c39]
 10:
(object_stat_collection_t::decode(ceph::buffer::list::iterator&)+0x167)
[0x894227]
 11: (pg_stat_t::decode(ceph::buffer::list::iterator&)+0x5ff) [0x894baf]
 12: (PGMap::update_pg(pg_t, ceph::buffer::list&)+0xa3) [0x91a8d3]
 13: (PGMonitor::read_pgmap_full()+0x1d8) [0x68b9b8]
 14: (PGMonitor::update_from_paxos(bool*)+0xbf7) [0x6977b7]
 15: (PaxosService::refresh(bool*)+0x19a) [0x605b5a]
 16: (Monitor::refresh_from_paxos(bool*)+0x1db) [0x5b1ffb]
 17: (Monitor::init_paxos()+0x85) [0x5b2365]
 18: (Monitor::preinit()+0x7d7) [0x5b6f87]
 19: (main()+0x230c) [0x57853c]
 20: (__libc_start_main()+0xf5) [0x7fa8b7393f45]
 21: ceph-mon() [0x59a3c7]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 rbd_replay
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 keyvaluestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/10 civetweb
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
   0/ 0 refs
   1/ 5 xio
  -2/-2 (syslog threshold)
  99/99 (stderr threshold)
  max_recent     10000
  max_new         1000
  log_file
--- end dump of recent events ---
Aborted (core dumped)
---------------------------------------
---------------------------------------

I feel like I am so close and yet so far. Can anyone give me a nudge as to
what I can do next? It looks like it is bombing out while refreshing the
PGMap from paxos: the backtrace ends in pg_stat_t::decode throwing
buffer::end_of_buffer, so I am guessing at least one of the per-PG entries
in my reconstructed store is truncated.
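
In case it helps with the nudge: this is roughly how I have been poking at
the recovered store.db to see which prefixes made it through the
skip-on-error export (a plyvel sketch; it assumes the MonitorDBStore layout
of "<prefix>\0<key>" for the raw leveldb keys, which I have not verified
against the hammer source):

```
import collections
import sys

import plyvel

path = sys.argv[1]  # e.g. /var/lib/ceph/mon/ceph-kh10-8/store.db (a copy)
db = plyvel.DB(path, paranoid_checks=False)

counts = collections.defaultdict(int)
sizes = collections.defaultdict(int)
for key, value in db.iterator():
    prefix = key.split(b'\x00', 1)[0]  # assumed "<prefix>\0<key>" layout
    counts[prefix] += 1
    sizes[prefix] += len(value)

for prefix in sorted(counts):
    print('%-20s %8d keys %14d bytes' % (prefix.decode('latin1'),
                                         counts[prefix], sizes[prefix]))
db.close()
```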



On Fri, Aug 12, 2016 at 1:09 PM, Sean Sullivan <seapasu...@uchicago.edu>
wrote:

> A coworker patched leveldb and we were able to export quite a bit of data
> from kh08's leveldb database. At this point I think I need to reconstruct a
> new leveldb with whatever values I can. Is it the same leveldb database
> across all 3 monitors? I.e., will keys exported from one work in the other?
> All should have the same keys/values, although constructed differently,
> right? I can't blindly copy /var/lib/ceph/mon/ceph-$(hostname)/store.db/
> from one host to another, right? But can I copy the keys/values from one to
> another?
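>
> Something like this is what I have in mind (an untested plyvel sketch; the
> paths are placeholders, and I am not sure a raw key/value copy between
> monitors is even valid since each monitor's local paxos state differs):
>
> ```
> import plyvel
>
> # Copy every readable key/value pair from one monitor store into a fresh
> # leveldb. Values are ceph's binary encodings, so nothing is re-encoded.
> src = plyvel.DB('/path/to/recovered/store.db', paranoid_checks=False)
> dst = plyvel.DB('/path/to/new/store.db', create_if_missing=True)
>
> with dst.write_batch() as batch:
>     for key, value in src.iterator():
>         batch.put(key, value)
>
> src.close()
> dst.close()
> ```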
>
> On Fri, Aug 12, 2016 at 12:45 PM, Sean Sullivan <seapasu...@uchicago.edu>
> wrote:
>
>> ceph-monstore-tool? Is that the same as monmaptool? Oops, never mind; I
>> found it in the ceph-test package::
>>
>> I can't seem to get it working :-( Neither dump monmap nor any of the
>> other commands work; they all bomb out with the same message:
>>
>> root@kh10-8:/var/lib/ceph/mon/ceph-kh10-8/store.db# ceph-monstore-tool
>> /var/lib/ceph/mon/ceph-kh10-8 dump-trace -- /tmp/test.trace
>> Corruption: 1 missing files; e.g.: /var/lib/ceph/mon/ceph-kh10-8/
>> store.db/10882319.ldb
>> root@kh10-8:/var/lib/ceph/mon/ceph-kh10-8/store.db# ceph-monstore-tool
>> /var/lib/ceph/mon/ceph-kh10-8 dump-keys
>> Corruption: 1 missing files; e.g.: /var/lib/ceph/mon/ceph-kh10-8/
>> store.db/10882319.ldb
>>
>>
>> I need to clarify: I originally had 2 clusters with this issue, and now I
>> have 1 with all 3 monitors dead and 1 that I was able to repair
>> successfully. I am about to recap everything I know about the issue at
>> hand. Should I start a new email thread about this instead?
>>
>> The cluster that is currently having issues is on hammer (0.94.7), and the
>> monitor specs are all the same::
>> root@kh08-8:~# cat /proc/cpuinfo | grep -iE "model name" | uniq -c
>>      24 model name : Intel(R) Xeon(R) CPU E5-2630 v2 @ 2.60GHz
>> ext4 volume comprised of 4x 300GB 10k drives in RAID 10.
>> Ubuntu 14.04
>>
>> root@kh08-8:~# uname -a
>> Linux kh08-8 3.13.0-76-generic #120-Ubuntu SMP Mon Jan 18 15:59:10 UTC
>> 2016 x86_64 x86_64 x86_64 GNU/Linux
>> root@kh08-8:~# ceph --version
>> ceph version 0.94.7 (d56bdf93ced6b80b07397d57e3fa68fe68304432)
>>
>>
>> From here, these are the errors I am getting when starting each of the
>> monitors::
>>
>>
>> ---------------
>> root@kh08-8:~# /usr/bin/ceph-mon --cluster=ceph -i kh08-8 -d
>> 2016-08-11 22:15:23.731550 7fe5ad3e98c0  0 ceph version 0.94.7
>> (d56bdf93ced6b80b07397d57e3fa68fe68304432), process ceph-mon, pid 317309
>> Corruption: error in middle of record
>> 2016-08-11 22:15:28.274340 7fe5ad3e98c0 -1 error opening mon data
>> directory at '/var/lib/ceph/mon/ceph-kh08-8': (22) Invalid argument
>> --
>> root@kh09-8:~# /usr/bin/ceph-mon --cluster=ceph -i kh09-8 -d
>> 2016-08-11 22:14:28.252370 7f7eaab908c0  0 ceph version 0.94.7
>> (d56bdf93ced6b80b07397d57e3fa68fe68304432), process ceph-mon, pid 308888
>> Corruption: 14 missing files; e.g.: /var/lib/ceph/mon/ceph-kh09-8/
>> store.db/10845998.ldb
>> 2016-08-11 22:14:35.094237 7f7eaab908c0 -1 error opening mon data
>> directory at '/var/lib/ceph/mon/ceph-kh09-8': (22) Invalid argument
>> --
>> root@kh10-8:/var/lib/ceph/mon/ceph-kh10-8/store.db# /usr/bin/ceph-mon
>> --cluster=ceph -i kh10-8 -d
>> 2016-08-11 22:17:54.632762 7f80bf34d8c0  0 ceph version 0.94.7
>> (d56bdf93ced6b80b07397d57e3fa68fe68304432), process ceph-mon, pid 292620
>> Corruption: 1 missing files; e.g.: /var/lib/ceph/mon/ceph-kh10-8/
>> store.db/10882319.ldb
>> 2016-08-11 22:18:01.207749 7f80bf34d8c0 -1 error opening mon data
>> directory at '/var/lib/ceph/mon/ceph-kh10-8': (22) Invalid argument
>> ---------------
>>
>>
>> For kh08, a coworker patched leveldb to print and skip on the first error,
>> and that one is also missing a bunch of files. As such I think kh10-8 is my
>> most likely candidate to recover, but either way recovery is probably not
>> an option. I see leveldb has a repair.cc
>> (https://github.com/google/leveldb/blob/master/db/repair.cc) but I do not
>> see repair mentioned anywhere in the monitor code with respect to the db
>> store. I tried using the leveldb python module (plyvel) to attempt a repair
>> but my REPL just ends up dying.
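>>
>> The repair attempt was basically just leveldb's built-in RepairDB driven
>> from plyvel, roughly the following (run against a copy of the store, since
>> RepairDB rewrites the directory in place and drops anything unreadable):
>>
>> ```
>> import plyvel
>>
>> store = '/var/lib/ceph/mon/ceph-kh10-8/store.db'  # a copy, not the original
>>
>> # Rebuild the MANIFEST from whatever .ldb/.log files are still readable.
>> plyvel.repair_db(store)
>>
>> # If the repair finishes, the store should at least open and iterate again.
>> db = plyvel.DB(store)
>> print('%d keys readable' % sum(1 for _ in db.iterator()))
>> db.close()
>> ```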
>>
>> I understand two things::
>> 1.) Without rebuilding the monitor backend leveldb store (the cluster map,
>> as I understand it), all of the data in the cluster is essentially lost
>> (right?)
>> 2.) It is possible to rebuild this database via some form of magic or
>> (source)ry, as all of this data is essentially held throughout the cluster
>> as well.
>>
>> We only use radosgw / S3 for this cluster. If there is a way to recover my
>> data that is easier / more likely to succeed than rebuilding the leveldb of
>> a monitor and starting up a single-monitor cluster, I would like to switch
>> gears and focus on that.
>>
>> Looking at the dev docs
>> (http://docs.ceph.com/docs/hammer/architecture/#cluster-map), the cluster
>> map has 5 main parts::
>>
>> ```
>> The Monitor Map: Contains the cluster fsid, the position, name address
>> and port of each monitor. It also indicates the current epoch, when the map
>> was created, and the last time it changed. To view a monitor map, execute
>> ceph mon dump.
>> The OSD Map: Contains the cluster fsid, when the map was created and last
>> modified, a list of pools, replica sizes, PG numbers, a list of OSDs and
>> their status (e.g., up, in). To view an OSD map, execute ceph osd dump.
>> The PG Map: Contains the PG version, its time stamp, the last OSD map
>> epoch, the full ratios, and details on each placement group such as the PG
>> ID, the Up Set, the Acting Set, the state of the PG (e.g., active + clean),
>> and data usage statistics for each pool.
>> The CRUSH Map: Contains a list of storage devices, the failure domain
>> hierarchy (e.g., device, host, rack, row, room, etc.), and rules for
>> traversing the hierarchy when storing data. To view a CRUSH map, execute
>> ceph osd getcrushmap -o {filename}; then, decompile it by executing
>> crushtool -d {comp-crushmap-filename} -o {decomp-crushmap-filename}. You
>> can view the decompiled map in a text editor or with cat.
>> The MDS Map: Contains the current MDS map epoch, when the map was
>> created, and the last time it changed. It also contains the pool for
>> storing metadata, a list of metadata servers, and which metadata servers
>> are up and in. To view an MDS map, execute ceph mds dump.
>> ```
>>
>> As we don't use CephFS, the MDS map can essentially be blank (right?), so I
>> am left with 4 valid maps needed to get a working cluster again. I don't
>> see auth mentioned in there, but I will need that too. Then I just need to
>> rebuild the leveldb database somehow with the right information and I
>> should be good. So a long, long journey ahead.
>>
>> I don't think the data is stored as strings or JSON, right? Am I going down
>> the wrong path here? Is there a shorter/simpler path to retrieve the data
>> from a cluster that lost all 3 monitors in a power failure? If I am going
>> down the right path, is there any advice on how I can assemble/repair the
>> database?
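>>
>> (For what it's worth, hex-dumping a few values with plyvel should answer
>> the strings-or-JSON question quickly; a rough sketch:)
>>
>> ```
>> import binascii
>> import plyvel
>>
>> db = plyvel.DB('/var/lib/ceph/mon/ceph-kh10-8/store.db',
>>                paranoid_checks=False)
>>
>> # Show the first 32 bytes of the first ten values; expect ceph's binary
>> # encoding rather than strings or JSON.
>> for i, (key, value) in enumerate(db.iterator()):
>>     print('%r -> %s' % (key, binascii.hexlify(value[:32])))
>>     if i == 9:
>>         break
>> db.close()
>> ```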
>>
>> I see that there is a tool to recover RBD images from a dead cluster. Is it
>> possible to do the same with S3 objects?
>>
>> On Thu, Aug 11, 2016 at 11:15 AM, Wido den Hollander <w...@42on.com>
>> wrote:
>>
>>>
>>> > On 11 August 2016 at 15:17, Sean Sullivan <seapasu...@uchicago.edu>
>>> > wrote:
>>> >
>>> >
>>> > Hello Wido,
>>> >
>>> > Thanks for the advice. While the data center has A/B circuits, redundant
>>> > power, etc., if a ground fault happens it apparently travels outside and
>>> > takes the whole building down.
>>> >
>>> > The monitors are each the same with
>>> > 2x e5 cpus
>>> > 64gb of ram
>>> > 4x 300gb 10k SAS drives in raid 10 (write through mode).
>>> > Ubuntu 14.04 with the latest updates prior to power failure
>>> > (2016/Aug/10 - 3am CST)
>>> > Ceph hammer LTS 0.94.7
>>> >
>>> > (we are still working on our jewel test cluster so it is planned but
>>> > not in place yet)
>>> >
>>> > The only thing that seems to be corrupt is the monitors' leveldb store.
>>> > I see multiple issues on the Google leveldb GitHub from March 2016 about
>>> > fsync and power failure, so I assume this is an issue with leveldb.
>>> >
>>> > I have backed up /var/lib/ceph/mon on all of my monitors before trying
>>> > to proceed with any form of recovery.
>>> >
>>> > Is there any way to reconstruct the leveldb or replace the monitors and
>>> > recover the data?
>>> >
>>> I don't know. I have never done it. Other people might know this better
>>> than me.
>>>
>>> Maybe 'ceph-monstore-tool' can help you?
>>>
>>> Wido
>>>
>>> > I found the following post in which Sage says it is tedious but possible
>>> > (http://www.spinics.net/lists/ceph-devel/msg06662.html). Tedious is fine
>>> > if I have any chance of doing it. I have the fsid and the mon key map,
>>> > and all of the OSDs look to be fine, so all of the previous osd maps are
>>> > there.
>>> >
>>> > I just don't understand what keys/values I need inside.
>>> >
>>> > On Aug 11, 2016 1:33 AM, "Wido den Hollander" <w...@42on.com> wrote:
>>> >
>>> > >
>>> > > > On 11 August 2016 at 0:10, Sean Sullivan <seapasu...@uchicago.edu>
>>> > > > wrote:
>>> > > >
>>> > > >
>>> > > > I think it just got worse::
>>> > > >
>>> > > > All three monitors on my other cluster say that ceph-mon can't open
>>> > > > /var/lib/ceph/mon/$(hostname). Is there any way to recover if you
>>> > > > lose all 3 monitors? I saw a post by Sage saying that the data can be
>>> > > > recovered, as all of the data is held on other servers. Is this
>>> > > > possible? If so, has anyone had any experience doing so?
>>> > >
>>> > > I have never done so, so I couldn't tell you.
>>> > >
>>> > > However, it is weird that it got corrupted on all three. What hardware
>>> > > are you using? Was it properly protected against power failure?
>>> > >
>>> > > If your mon store is corrupted I'm not sure what might happen.
>>> > >
>>> > > However, make a backup of ALL monitors right now before doing anything.
>>> > >
>>> > > Wido
>>> > >
>>> > > > _______________________________________________
>>> > > > ceph-users mailing list
>>> > > > ceph-users@lists.ceph.com
>>> > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>> > >
>>>
>>
>>
>>
>> --
>> - Sean:  I wrote this. -
>>
>
>
>
> --
> - Sean:  I wrote this. -
>



-- 
- Sean:  I wrote this. -