Should I infer from the silence that there is no way to recover from the
"FAILED assert(last_e.version.version < e.version.version)" errors?
Thanks,
Jeff
----- Forwarded message from Jeff <[email protected]> -----
Date: Tue, 17 Feb 2015 09:16:33 -0500
From: Jeff <[email protected]>
To: [email protected]
Subject: Re: [ceph-users] Power failure recovery woes
Some additional information/questions:
Here is the output of "ceph osd tree"
Some of the "down" OSDs are actually running processes, but the cluster still
marks them "down". For example, osd.1:
root 30158 8.6 12.7 1542860 781288 ? Ssl 07:47 4:40
/usr/bin/ceph-osd --cluster=ceph -i 0 -f
Is there any way to get the cluster to recognize them as being up? osd.1 has
the "FAILED assert(last_e.version.version < e.version.version)" errors.
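In case it helps anyone reproduce what I'm seeing, this is roughly how I've been checking whether a running-but-"down" OSD is actually responsive (default admin-socket path assumed; whether a restart helps depends on why the heartbeats are failing):

```shell
# Ask the osd.1 daemon directly over its admin socket
# (default socket path assumed)
ceph --admin-daemon /var/run/ceph/ceph-osd.1.asok status

# Compare with what the monitors think
ceph osd stat
ceph osd tree

# If the daemon is wedged, a restart sometimes lets it rejoin
# (sysvinit syntax, matching ceph 0.87)
service ceph restart osd.1
```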
Thanks,
Jeff
# id  weight  type name      up/down  reweight
-1    10.22   root default
-2    2.72      host ceph1
0     0.91        osd.0      up       1
1     0.91        osd.1      down     0
2     0.9         osd.2      down     0
-3    1.82      host ceph2
3     0.91        osd.3      down     0
4     0.91        osd.4      down     0
-4    2.04      host ceph3
5     0.68        osd.5      up       1
6     0.68        osd.6      up       1
7     0.68        osd.7      up       1
8     0.68        osd.8      down     0
-5    1.82      host ceph4
9     0.91        osd.9      up       1
10    0.91        osd.10     down     0
-6    1.82      host ceph5
11    0.91        osd.11     up       1
12    0.91        osd.12     up       1
On 2/17/2015 8:28 AM, Jeff wrote:
>
>
> -------- Original Message --------
> Subject: Re: [ceph-users] Power failure recovery woes
> Date: 2015-02-17 04:23
> From: Udo Lembke <[email protected]>
> To: Jeff <[email protected]>, [email protected]
>
> Hi Jeff,
> is the OSD /var/lib/ceph/osd/ceph-2 mounted?
>
> If not, does it help if you mount the OSD and then start it with
> service ceph start osd.2
> ?
>
> Udo
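>
> Spelled out, the check above would look something like this (the device
> name is only an example; find the real data partition with blkid or
> ceph-disk list):
>
> ```shell
> # Is the OSD data directory mounted?
> mountpoint /var/lib/ceph/osd/ceph-2
>
> # If not, mount the OSD's data partition
> # (/dev/sdb1 is a placeholder -- substitute the actual device)
> mount /dev/sdb1 /var/lib/ceph/osd/ceph-2
>
> # Then start the daemon (sysvinit syntax used by ceph 0.87)
> service ceph start osd.2
> ```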
>
> On 17.02.2015 at 09:54, Jeff wrote:
>> Hi,
>>
>> We had a nasty power failure yesterday, and even with UPSes our small (5
>> node, 12 OSD) cluster is having problems recovering.
>>
>> We are running ceph 0.87
>>
>> Three of our OSDs are consistently down (the others stop and are
>> restartable, but our cluster is so slow that almost everything we do
>> times out).
>>
>> We are seeing errors like this on the OSDs that never run:
>>
>> ERROR: error converting store /var/lib/ceph/osd/ceph-2: (1)
>> Operation not permitted
>>
>> We are seeing errors like these on the OSDs that run some of the time:
>>
>> osd/PGLog.cc: 844: FAILED assert(last_e.version.version <
>> e.version.version)
>> common/HeartbeatMap.cc: 79: FAILED assert(0 == "hit suicide
>> timeout")
>>
>> Does anyone have any suggestions on how to recover our cluster?
>>
>> Thanks!
>> Jeff
>>
>>
>> _______________________________________________
>> ceph-users mailing list
>> [email protected]
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
----- End forwarded message -----