Gus Wirth wrote:
You also have to be careful when evaluating reliability. There are a lot of things that go into determining reliability figures.
Speaking only of the reliability of the hard drives themselves, the Google and CMU papers on hard drive reliability that came out this year were very interesting.
Hard drive failures are only a small portion of the overall reliability figure for a computer. In particular, even if your drives are mirrored, you are only protected against the subset of single-drive failures whose failure mode is non-catastrophic to the rest of the system. If a drive fails by shorting the +12 volt rail to the +5 volt supply (highly unlikely), it could take just about everything else with it, rendering the mirror useless.
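To put a rough number on that, here's a back-of-the-envelope sketch in Python. The failure rate and the catastrophic fraction are made-up illustrative values, not anything from the papers:

    # Assumed numbers for illustration only.
    annual_failure_rate = 0.03    # assumed: ~3% AFR per drive
    catastrophic_fraction = 0.02  # assumed: 2% of failures take out other parts

    # Per drive-year, failures the mirror actually protects against:
    covered = annual_failure_rate * (1 - catastrophic_fraction)

    # Per drive-year, failures that defeat the mirror anyway:
    uncovered = annual_failure_rate * catastrophic_fraction

    print(f"failures the mirror covers: {covered:.4%} per drive-year")
    print(f"failures it cannot cover:   {uncovered:.4%} per drive-year")

Even with those assumptions the mirror covers the vast majority of drive failures; the point is just that it can't cover all of them.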
I have had many drives die but never had a short across the bus like that. As you said, highly unlikely. And that sort of thing is what I have backup copies for. Bacula to the rescue!
One of the biggest problems in determining the reliability of anything is getting a history of the item. As I've mentioned before, all the MTBF (Mean Time Between Failures) figures you see on computer hardware are just guesses, because the hardware product cycle is somewhere near nine months, far shorter than the quoted MTBF. So there isn't enough field data collected to verify the estimates, and there is no economic incentive to do anything about it.
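For perspective, here's a small Python sketch of what a spec-sheet MTBF implies and how many drives you'd need on test just to expect one failure within a single product cycle. The 1,000,000-hour figure is a typical spec-sheet number, assumed for illustration:

    import math

    mtbf_hours = 1_000_000         # assumed spec-sheet MTBF
    product_cycle_hours = 9 * 730  # ~9 months at ~730 hours/month

    # Under an exponential failure model, the implied annualized failure rate:
    afr = 1 - math.exp(-8760 / mtbf_hours)
    print(f"implied AFR: {afr:.3%}")  # roughly 0.9% per year

    # One expected failure takes mtbf_hours of total device-hours, so within
    # one product cycle you'd need this many drives running continuously:
    drives_needed = mtbf_hours / product_cycle_hours
    print(f"drives on test for one expected failure: {drives_needed:.0f}")

That works out to around 150 drives running nonstop for the entire product cycle just to see a single expected failure, which is why nobody bothers validating the number before the next model ships.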
The Google and CMU papers are the best research done on this so far. A brief summary of the two: SCSI drives aren't any more reliable than IDE; temperature doesn't affect drives as much as we thought; and the chances of a multiple-drive failure in a RAID 5 are higher than we thought. They wouldn't tell us which vendors were more reliable than others.
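On the RAID 5 point, even a naive independence model gives a non-trivial chance of a second failure during a rebuild; here's a rough Python sketch where the array size, failure rate, and rebuild window are all assumed numbers. The papers' finding is that real failures are correlated, so the true risk is higher than this:

    # Assumed numbers for illustration only.
    n_drives = 8        # assumed array size
    afr = 0.03          # assumed 3% annual failure rate per drive
    rebuild_hours = 24  # assumed rebuild window after the first failure

    hourly_rate = afr / 8760  # crude constant-rate approximation

    # Probability that at least one of the n-1 surviving drives fails
    # during the rebuild, assuming failures are independent:
    p_second = 1 - (1 - hourly_rate) ** ((n_drives - 1) * rebuild_hours)
    print(f"independent-model risk during rebuild: {p_second:.4%}")
    # Correlated failures (same batch, same heat, same vibration) can
    # push the real number well above this.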
