erik quanstrom wrote:
Various studies seem to indicate failure rates are highly
correlated with drive model, vintage and manufacturer.
Assuming a RAID is built from similar disks, when one fails
the others are more likely to fail.

while it is true that some disk vintages are better than others, when
one drive fails, the probability of the other drives failing has not
changed.  this is the same as flipping a fair coin: if you flip it ten
times and get ten heads, the probability that the next flip of the
same coin comes up heads is still 1/2.
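
a minimal sketch of the independence point in python, assuming fair,
independent flips (the trial count is illustrative): conditioning on
ten heads doesn't move the eleventh flip off 1/2.

    import random

    trials = 1_000_000
    ten_heads = 0    # runs whose first ten flips were all heads
    then_heads = 0   # of those, runs whose eleventh flip was heads

    for _ in range(trials):
        flips = [random.random() < 0.5 for _ in range(11)]
        if all(flips[:10]):
            ten_heads += 1
            then_heads += flips[10]

    print(then_heads / ten_heads)   # hovers around 0.5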

i think this correlation gives people the false impression that drives
fail en masse, but that's really wrong.  the latent errors probably
happened months ago.
Yes, but if there are many latent errors and/or the error rate
is going up, it is time to replace the drive.

maybe.  the google paper you cited didn't find a strong correlation
between smart errors (including block relocation) and failure.

This is a good idea.  We did this in 1983, back when disks
were simpler beasts.  No RAID then, of course.

an even better idea back then.  disks didn't have 1/4 million
lines of firmware relocating blocks and doing other things to^w
i mean for you.

- erik



And - lest we forget - a RAID array actually has a higher statistical chance of *some* drive failing, and a *lower* mean time to first drive failure, than a single drive. Simple math.
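
To make the "simple math" concrete: if the array holds n independent
drives with (idealized) exponential lifetimes and a common MTBF, the
time to the array's first drive failure is the minimum of n lifetimes,
so its expectation drops to MTBF/n.  A minimal sketch in Python, with
made-up numbers:

    import random

    mtbf_single = 100_000   # hours per drive, illustrative only
    n = 8                   # drives in the array
    trials = 100_000

    total = 0.0
    for _ in range(trials):
        # time to first failure = min of n exponential lifetimes
        total += min(random.expovariate(1 / mtbf_single)
                     for _ in range(n))

    print(total / trials)   # approaches mtbf_single / n = 12,500 hours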

What we gain is a reduced risk of *unrecoverable* damage, not fewer failures, per se.

Bill


