2016-01-14 11:25 GMT+02:00 Magnus Hagdorn <[email protected]>:

> On 13/01/16 13:32, Andy Allan wrote:
>
>> On 13 January 2016 at 12:26, Magnus Hagdorn <[email protected]>
>> wrote:
>>
>>> Hi there,
>>> we recently had a problem with two OSDs failing because of I/O errors on
>>> the underlying disks. We run a small ceph cluster with 3 nodes and 18
>>> OSDs in total. All 3 nodes are Dell PowerEdge R515 servers with PERC H700
>>> (MegaRAID SAS 2108) RAID controllers. All disks are configured as
>>> single-disk RAID 0 arrays. One disk on each of two separate nodes started
>>> showing I/O errors reported by SMART, with one of the disks reporting a
>>> pre-failure SMART error. The node with the failing disk also reported XFS
>>> I/O errors. In both cases the OSD daemons kept running, although ceph
>>> reported that they were slow to respond. When we started to look into
>>> this we first tried restarting the OSDs. They then failed straight away.
>>> We ended up with data loss. We are running ceph 0.80.5 on Scientific
>>> Linux 6.6 with a replication level of 2. We had hoped that losing disks
>>> due to hardware failure would be recoverable.
>>>
>>> Is this a known issue with the RAID controllers or with this version of
>>> ceph?
>>>
>> If you had only lost one disk then ceph would shuffle things around and
>> re-replicate the data from the surviving copy, so that (after recovery)
>> you would have two copies again. Ceph also makes sure that the copies
>> are on different nodes, in case you lose an entire node - but in this
>> case, you've lost two disks on separate nodes.
>>
>
> AFAICT, the two failures were a few days apart. The main issue is that
> ceph didn't detect the failures. It *only* warned that there were two
> slowly responding OSDs. This is precisely our worry: how come ceph
> didn't detect and mitigate the failure?
>
I think that's because the OSDs weren't down, just slow on reads and writes
(hence the slow-response warnings). I think ceph only takes action when an
OSD stops responding altogether, and only then marks it as down.
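
For what it's worth, you can check how ceph currently sees each OSD and, if
you are sure a disk is dying, take the OSD out yourself instead of waiting
for ceph to notice. A minimal sketch (the id 12 below is just a placeholder
for one of your failing OSDs):

    # show cluster health, including slow/blocked requests
    ceph health detail
    # list OSDs with their up/down and in/out state per host
    ceph osd tree
    # mark the suspect OSD down and out so recovery starts immediately
    ceph osd down 12
    ceph osd out 12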
In any case, with a replication level of 2 (pool size 3: one primary and
two copies of it) you should still have an intact copy of your data on the
host that is still up. Setting min_size to 1 on the pool should give you
access to the data on that host, although this is not recommended: if a
HDD fails on that node too, your data will be lost.
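
If you want to try that, the pool setting looks something like this
(assuming your pool is named "rbd"; list your pools with "ceph osd
lspools"):

    # allow client I/O with only one complete replica remaining
    ceph osd pool set rbd min_size 1
    # once recovery has finished, restore the safer setting
    ceph osd pool set rbd min_size 2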


> Cheers
> magnus
>
> --
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
