2016-01-14 11:25 GMT+02:00 Magnus Hagdorn <[email protected]>:

> On 13/01/16 13:32, Andy Allan wrote:
>
>> On 13 January 2016 at 12:26, Magnus Hagdorn <[email protected]> wrote:
>>
>>> Hi there,
>>> we recently had a problem with two OSDs failing because of I/O errors on the
>>> underlying disks. We run a small ceph cluster with 3 nodes and 18 OSDs in
>>> total. All 3 nodes are Dell PowerEdge R515 servers with PERC H700 (MegaRAID
>>> SAS 2108) RAID controllers. All disks are configured as single-disk RAID 0
>>> arrays. A disk on each of two separate nodes started showing I/O errors
>>> reported by SMART, with one of the disks reporting a pre-failure SMART
>>> error. The node with the failing disk also reported XFS I/O errors. In both
>>> cases the OSD daemons kept running, although ceph reported that they were
>>> slow to respond. When we started to look into this we first tried
>>> restarting the OSDs. They then failed straight away. We ended up with data
>>> loss. We are running ceph 0.80.5 on Scientific Linux 6.6 with a replication
>>> level of 2. We had hoped that losing disks due to hardware failure would be
>>> recoverable.
>>>
>>> Is this a known issue with the RAID controllers or this version of ceph?
>>>
>> If you only lost one disk (e.g. A) then ceph would shuffle things
>> around and duplicate the data from the backup copy, so that (after
>> recovery) you have two copies again. Ceph also makes sure that the
>> copies are on different nodes, in case you lose an entire node - but
>> in this case, you've lost two disks on separate nodes.
>>
>
> AFAICT, the two failures were a few days apart. The main issue is that
> ceph didn't detect the failures. It *only* warned that there were two
> slowly responding OSDs. This is precisely our worry. How come ceph didn't
> detect and mitigate the failure?
>

I think that's because the OSDs weren't down, just slow on read/write (hence the slow-response warnings).
I think ceph takes action only when an OSD stops responding entirely, and only then marks it as down. In any case, with a replication level of 2 (pool size 2: one primary and one replica on a different node), you should still have an intact copy of your data on the host that is still up. Setting min_size to 1 on the pool should give you access to that data again, although this is not recommended: if a disk fails on that remaining node too, your data will be lost.
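As a rough sketch, the min_size change described above would look something like this (the pool name "rbd" is just an example; substitute the affected pool):

```shell
# Inspect the pool's current replication settings
# (example pool name "rbd" -- use the pool that holds the affected PGs)
ceph osd pool get rbd size
ceph osd pool get rbd min_size

# Temporarily allow client I/O with only one surviving replica.
# Risky: a further disk failure on the remaining node means data loss.
ceph osd pool set rbd min_size 1

# Once recovery has restored full replication, raise it back.
ceph osd pool set rbd min_size 2
```

This only restores access while recovery runs; it does not replace fixing or replacing the failed OSDs.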
> Cheers
> magnus
>
> --
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
>
> _______________________________________________
> ceph-users mailing list
> [email protected]
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
