Should I infer from the silence that there is no way to recover from the
"FAILED assert(last_e.version.version < e.version.version)" errors?
Thanks,
Jeff
----- Forwarded message from Jeff <[email protected]> -----
Date: Tue, 17 Feb 2015 09:16:33 -0500
From: Jeff <[email protected]>
To: [email protected]
Subject: Re: [ceph-users] Power failure recovery woes
Some additional information/questions:
Here is the output of "ceph osd tree"
Some of the "down" OSDs are actually running processes, but the cluster still
marks them "down". For example, osd.1:
root 30158 8.6 12.7 1542860 781288 ? Ssl 07:47 4:40
/usr/bin/ceph-osd --cluster=ceph -i 0 -f
Is there any way to get the cluster to recognize them as being up? osd.1 has
the "FAILED assert(last_e.version.version < e.version.version)" errors.
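In case it helps anyone reproduce what I'm seeing, this is roughly how I've been checking whether a running-but-"down" OSD is actually responsive (default admin-socket path assumed; whether a restart helps depends on why the heartbeats are failing):

```shell
# Ask the osd.1 daemon directly over its admin socket
# (default socket path assumed)
ceph --admin-daemon /var/run/ceph/ceph-osd.1.asok status

# Compare with what the monitors think
ceph osd stat
ceph osd tree

# If the daemon is wedged, a restart sometimes lets it rejoin
# (sysvinit syntax, matching ceph 0.87)
service ceph restart osd.1
```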
Thanks,
Jeff
# id  weight  type name      up/down  reweight
-1    10.22   root default
-2    2.72      host ceph1
0     0.91        osd.0      up       1
1     0.91        osd.1      down     0
2     0.9         osd.2      down     0
-3    1.82      host ceph2
3     0.91        osd.3      down     0
4     0.91        osd.4      down     0
-4    2.04      host ceph3
5     0.68        osd.5      up       1
6     0.68        osd.6      up       1
7     0.68        osd.7      up       1
8     0.68        osd.8      down     0
-5    1.82      host ceph4
9     0.91        osd.9      up       1
10    0.91        osd.10     down     0
-6    1.82      host ceph5
11    0.91        osd.11     up       1
12    0.91        osd.12     up       1
On 2/17/2015 8:28 AM, Jeff wrote:
>
>
> -------- Original Message --------
> Subject: Re: [ceph-users] Power failure recovery woes
> Date: 2015-02-17 04:23
> From: Udo Lembke <[email protected]>
> To: Jeff <[email protected]>, [email protected]
>
> Hi Jeff,
> is the OSD /var/lib/ceph/osd/ceph-2 mounted?
>
> If not, does it help if you mount the OSD and then start it with
> service ceph start osd.2
> ?
>
> Udo
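>
> Spelled out, the check above would look something like this (the device
> name is only an example; find the real data partition with blkid or
> ceph-disk list):
>
> ```shell
> # Is the OSD data directory mounted?
> mountpoint /var/lib/ceph/osd/ceph-2
>
> # If not, mount the OSD's data partition
> # (/dev/sdb1 is a placeholder -- substitute the actual device)
> mount /dev/sdb1 /var/lib/ceph/osd/ceph-2
>
> # Then start the daemon (sysvinit syntax used by ceph 0.87)
> service ceph start osd.2
> ```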
>
> On 17.02.2015 at 09:54, Jeff wrote:
>> Hi,
>>
>> We had a nasty power failure yesterday, and even with UPSes our small (5
>> node, 12 OSD) cluster is having problems recovering.
>>
>> We are running ceph 0.87
>>
>> Three of our OSDs are consistently down (the others stop and are
>> restartable, but our cluster is so slow that almost everything we do
>> times out).
>>
>> We are seeing errors like this on the OSDs that never run:
>>
>> ERROR: error converting store /var/lib/ceph/osd/ceph-2: (1)
>> Operation not permitted
>>
>> We are seeing errors like these on the OSDs that run some of the time:
>>
>> osd/PGLog.cc: 844: FAILED assert(last_e.version.version <
>> e.version.version)
>> common/HeartbeatMap.cc: 79: FAILED assert(0 == "hit suicide
>> timeout")
>>
>> Does anyone have any suggestions on how to recover our cluster?
>>
>> Thanks!
>> Jeff
>>
>>
>> _______________________________________________
>> ceph-users mailing list
>> [email protected]
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
----- End forwarded message -----