Re: [9fans] Recovering a venti from disk failure

erik quanstrom Thu, 19 Apr 2007 14:31:41 -0700

> Various studies seem to indicate failure rates are highly
> correlated with drive model, vintage and manufacturer.
> Assuming a RAID is built from similar disks, when one fails
> the others are more likely to fail.


while it is true that some disk vintages are better than others, when
one drive fails, the probability of the other drives failing has not
changed.  this is the same as if you flip a coin ten times and get ten
heads: the probability of flipping the same coin again and getting
heads is still 1/2.
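
a minimal sketch of the coin argument, assuming a fair coin and
independent flips (that assumption is the whole point; the code says
nothing about whether real drives behave that way).  the function name
p_heads_after_streak is made up for illustration.  it enumerates every
equally likely sequence of streak+1 flips and counts exactly, rather
than sampling:

```python
from fractions import Fraction
from itertools import product


def p_heads_after_streak(streak: int) -> Fraction:
    """Exact P(next flip is heads | the previous `streak` flips were all heads).

    Assumes a fair coin with independent flips, so every sequence of
    streak + 1 flips is equally likely and counting sequences is enough.
    """
    matching = 0    # sequences whose first `streak` flips are all heads
    heads_next = 0  # of those, sequences whose final flip is also heads
    for flips in product("HT", repeat=streak + 1):
        if all(f == "H" for f in flips[:streak]):
            matching += 1
            if flips[streak] == "H":
                heads_next += 1
    return Fraction(heads_next, matching)


print(p_heads_after_streak(10))  # 1/2
```

the answer is 1/2 for any streak length, since conditioning on the
earlier flips only narrows down which sequences are counted and leaves
the last flip evenly split.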

>> i think this correlation gives people the false impression that they do
>> fail en masse, but that's really wrong.  the latent errors probably
>> happened months ago.
> 
> Yes but if there are many latent errors and/or the error rate
> is going up it is time to replace it.

maybe.  the google paper you cited didn't find a strong correlation
between smart errors (including block relocation) and failure.

> This is a good idea.  We did this in 1983, back when disks
> were simpler beasts.  No RAID then of course.

an even better idea back then.  disks didn't have a quarter million
lines of firmware relocating blocks and doing other things to^w
i mean for you.

- erik
