disk failure prediction

Sage Weil Wed, 18 Feb 2015 15:20:57 -0800

Interesting paper at FAST:

        
https://www.usenix.org/system/files/conference/fast15/fast15-paper-ma.pdf


Short version: reallocated sector counts correlate with impending disk 
failures (this sounds like what Sandon has been telling us for ages).  
Preemptively replacing disks with impending failures reduced EMC's rate 
of triple failures by 80%, and looking at the joint failure probability 
within each RAID set reduces the failure rate by 98%.  We wouldn't see 
quite the same results since our "RAID sets" are effectively entire pools, 
but this seems like a strong case for adding SMART monitoring to the OSDs 
or to Calamari now and doing some preemptive disk replacement.
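
As a rough illustration of the kind of check this would involve, here is a
minimal sketch that pulls the raw Reallocated_Sector_Ct value out of
`smartctl -A` style text and compares it to a cutoff.  The function names,
the sample row, and the threshold of 100 are all hypothetical; they are not
taken from the paper or from any existing Ceph or Calamari code, and a real
cutoff would have to come from fleet data.

```python
# Hypothetical cutoff; not a value from the paper or from Ceph.
REALLOC_THRESHOLD = 100

# Illustrative attribute row in the layout printed by `smartctl -A`:
# ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
SAMPLE = (
    "ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE\n"
    "  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       248\n"
)


def reallocated_sectors(smartctl_output):
    """Return the raw reallocated sector count, or None if the row is absent."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # A full attribute row has at least ten whitespace-separated columns.
        if len(fields) >= 10 and fields[1] == "Reallocated_Sector_Ct":
            return int(fields[9])
    return None


def should_replace(smartctl_output, threshold=REALLOC_THRESHOLD):
    """Flag a disk whose reallocated sector count meets the (assumed) cutoff."""
    count = reallocated_sectors(smartctl_output)
    return count is not None and count >= threshold
```

With the sample row above, `should_replace(SAMPLE)` flags the disk because
248 is at or above the assumed cutoff.  An OSD or Calamari would presumably
run something like this periodically and report flagged disks rather than
act on them directly.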

sage
