> On Mar 28, 2018, at 2:32 PM, Fred Cisin via cctalk <cctalk@classiccmp.org> 
> wrote:
> 
>>> How many drives would you need, to be able to set up a RAID, or hot 
>>> swappable RAUD (Redundant Array of Unreliable Drives), that could give 
>>> decent reliability with such drives?
>>> How many to be able to not have data loss if a second one dies before the 
>>> first casualty is replaced?
>>> How many to be able to avoid data loss if a third one dies before the first 
>>> two are replaced?
> 
> On Wed, 28 Mar 2018, Paul Koning wrote:
> ...
>> The basic assumption is that failures are "fail stop", i.e., a drive refuses 
>> to deliver data.  (In particular, it doesn't lie -- deliver wrong data.  You 
>> can build systems that deal with lying drives but RAID is not such a 
>> system.)  The failure may be the whole drive ("it's a door-stop") or 
>> individual blocks (hard read errors).
> 
> So, in addition to the "RAID" configuration, you would also need additional 
> redundancy to compare multiple reads for error detection.
> At the simplest level, if the reads don't match, then there is an error.
> If a retry produces different data, then that drive has an error.
> If two drives agree against a third, then there is a high probability that 
> the variant drive is in error.

If you don't trust drives to deliver correct data often enough, you need your 
own error detection.  Comparing redundant copies is possible; various EDC or 
ECC codes are more efficient.  Some file systems use hashes such as SHA-1 to 
detect data corruption with extremely high probability.
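
A minimal sketch of the per-block hash idea (the block size, digest store, and 
replica handling below are my own illustration, not any particular file 
system's design):

    import hashlib

    BLOCK_SIZE = 4096  # hypothetical block size

    def read_verified(block_no, expected_sha1, replicas):
        """Return the first copy of the block whose SHA-1 matches the stored
        digest.  `replicas` is a list of open file objects holding identical
        copies (e.g. the two halves of a RAID-1 mirror).  A drive that "lies"
        returns data whose digest won't match, so we fall through to the next
        copy instead of trusting it."""
        for f in replicas:
            f.seek(block_no * BLOCK_SIZE)
            data = f.read(BLOCK_SIZE)
            if hashlib.sha1(data).hexdigest() == expected_sha1:
                return data
        raise IOError("no copy of block %d passed its checksum" % block_no)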

> ...
>> So one way to look at it: given the MTBF, calculate the probability of two 
>> drives failing within N hours (where N is the time required to replace the 
>> failed drive and then rebuild the data onto the new drive). But that is not 
>> the whole story.
> 
> 'course not.  Besides MTBF for calculating the probability of a second drive 
> failing within N hours, must also consider other factors, such as external 
> influences causing more than one drive to go, and the essentially non-linear 
> aspect of a failure rate curve.

Yes, RAID has an underlying assumption that drive failures are independent 
random events.  If that isn't valid, you have a big problem.  This 
occasionally happens; there have been drive enclosures with inadequate 
mechanical design, resulting in excessive vibration that caused rapid and 
correlated drive failures.  The answer to that is "test it properly and don't 
ship stuff like that".
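
For what it's worth, here is a back-of-the-envelope sketch of that calculation 
under the independence assumption (the MTBF, rebuild window, and drive count 
below are made-up numbers, and the exponential failure model is the usual 
simplification):

    import math

    def p_second_failure(mtbf_hours, window_hours, surviving_drives):
        """P(at least one surviving drive fails within the repair/rebuild
        window), assuming independent, exponentially distributed failures."""
        lam = 1.0 / mtbf_hours                  # per-drive failure rate
        return 1.0 - math.exp(-lam * window_hours * surviving_drives)

    # e.g. 7 surviving drives, 100,000 hour MTBF, 24 hour replace-and-rebuild
    # window:
    print(p_second_failure(100000, 24, 7))      # roughly 0.0017, about 0.17%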

>> The other part of the story is that drives have a non-zero probability of a 
>> hard read error.  So during rebuild, you may encounter a sector on one of 
>> the remaining drives that can't be read.  If so, that sector is lost.
> 
> If we consider that to be a "drive failure", then we are back to designing 
> around multiple failures.

Correct, and that is why RAID-6 is prevalent now that drives are large enough 
that there is a nontrivial probability of getting a sector read error if you 
read a whole drive (as a RAID-1 rebuild does) and especially if you read 
multiple whole drives (as in a RAID-5 rebuild).
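
As a rough illustration (the one-error-per-1e14-bits figure is a commonly 
quoted spec for consumer drives, not a measurement of any particular drive), 
the chance of hitting at least one hard read error grows quickly with the 
amount of data a rebuild has to read:

    import math

    def p_read_error(bytes_read, bits_per_error=1e14):
        """P(at least one hard read error), assuming independent per-bit
        errors at the quoted rate."""
        return 1.0 - math.exp(-(bytes_read * 8) / bits_per_error)

    one_tb = 1e12
    print(p_read_error(one_tb))       # one 1 TB drive: roughly 8%
    print(p_read_error(5 * one_tb))   # five 1 TB drives (RAID-5 rebuild): ~33%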

> 
>> The probability of hard read error varies with drive technology.  And of 
>> course, the larger the drive, the greater the probability (all else being 
>> equal) of having SOME sector be unreadable.  For drives small enough to have 
>> PATA interfaces, the probability of hard read error is probably low enough 
>> that you can *usually* read the whole drive without error.  That translates 
>> to: RAID-1 and RAID-5 are generally adequate for PATA disks.
> 
> "generally".
> The original thought behind this silly suggestion was whether it would be 
> possible to make use of MANY very unreliable drives.

Definitely.  You'd have to analyze the model just as I described.  If things 
are bad enough, you may find that RAID-6 is inadequate and you instead need an 
N-fault-tolerant redundant code with N > 2.  Such things are mathematically 
straightforward but compute-intensive.  I've seen this done in the "Self-star" 
distributed storage research system at Carnegie-Mellon about a decade ago.  
Partly the reason was to deal with cheap unreliable devices, and partly it was 
an intellectual exercise, "because we can".
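
To give a feel for the arithmetic (my own illustrative numbers, not the 
Self-star design): with independent failures and a per-drive probability p of 
dying during one repair window, an array tolerating N faults loses data only 
when more than N drives fail together, so the loss probability drops fast as 
N grows even when the individual drives are terrible:

    from math import comb

    def p_data_loss(total_drives, faults_tolerated, p):
        """P(more than `faults_tolerated` of `total_drives` fail in the same
        repair window), assuming independent failures."""
        return sum(comb(total_drives, k) * p**k * (1 - p)**(total_drives - k)
                   for k in range(faults_tolerated + 1, total_drives + 1))

    # 20 drives, each with a (terrible) 5% chance of dying in the window:
    for n in range(1, 6):
        print(n, p_data_loss(20, n, 0.05))
    # tolerate 1 fault -> ~26% loss, 2 -> ~7.5%, 3 -> ~1.6%, 4 -> ~0.26%, ...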

        paul
