Fred,
I appreciate the explanation. So without 1,000, 10,000, or even 100,000 drives, there is no way to know how long the drives in my RAID will last. All I know for sure is that I can lose any one drive and the RAID can be rebuilt.
GOD Bless and Thanks,
rich!

On 3/28/2018 4:43 PM, Fred Cisin via cctalk wrote:
On Wed, 28 Mar 2018, Richard Pope via cctalk wrote:
I have been kind of following this thread. I have a question about MTBF. I have four HGST UltraStar Enterprise 2TB drives set up in a hardware RAID 10 configuration. If the MTBF is 100,000 hours for each drive, does this mean that the total MTBF is 25,000 hours?

<pedantic sadistics>
Probably NOT.
It depends extremely heavily on the shape of the curve of failure times.
MEAN Time Before Failure, of course, means the average lifetime over a large enough sample. If the curve of failure times is symmetrical, half the drives fail before 100,000 hours, and half after. Thus, at 100,000 hours, half are dead.

But, how evenly distributed are the failures?
Besides the MTBF, it would help to know the variance or standard deviation. It is unlikely that the failures follow a "normal distribution" (or "Laplace-Gauss") bell curve. And, other distributions are certainly not ABnormal :-)

If the curve is symmetrical, then the mean, median, and mode will all be the same. If it is not symmetrical, then they won't be. Hence the use of MEDIAN - at that point half are dead, half are still alive. In toxicology, there is a concept of an LD-50 dosage - the dosage that will kill half. Antibiotic-resistant bacteria, for example, might require an incredibly large dosage to get that last one, but LD-50 provides a convenient way to get a single number.
100,000 hours is the LD-50 of those drives.
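
Mean and median coincide only when the curve is symmetrical. As a rough illustration, assume a constant failure rate (exponential lifetimes). That is an assumption made here for the arithmetic, not something known about these drives. Under it, the half-dead point falls well before the mean. The function names are made up for this sketch.

```python
import math

def exponential_median(mean_hours):
    """Median lifetime when lifetimes are exponential with the given mean."""
    return mean_hours * math.log(2)

def fraction_dead_at(t_hours, mean_hours):
    """Fraction of drives failed by t_hours under the same exponential assumption."""
    return 1.0 - math.exp(-t_hours / mean_hours)

MTBF = 100_000  # hours, the figure quoted in this thread

print(f"median: {exponential_median(MTBF):,.0f} hours")          # about 69,315
print(f"dead at the mean: {fraction_dead_at(MTBF, MTBF):.1%}")   # about 63.2%
```

So under that assumption the "LD-50" would be nearer 69,000 hours, and about 63% of the drives would be dead by 100,000.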


If it turns out that the drives last 100,000 hours, plus or minus 10%, then you have a curve with a very steep slope. It is still half dead at 100,000, but maybe hardly any dead until 90,000, hardly any left alive at 110,000.

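OTOH, if the failures were evenly distributed throughout a life of 0 to 200,000 hours, with the same number going every day, then that also would have an MTBF of 100,000. In THAT case, the mean time to the first failure among four drives would be about 40,000 hours (200,000/5). Getting all the way down to 25,000 (100,000/4) takes a constant failure rate, where a drive is as likely to die in its first hour as in any later one.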
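
A quick simulation makes the dependence on the shape of the curve concrete. This is only a sketch: it assumes the four drives fail independently, and the three lifetime models are made-up stand-ins that share a mean of 100,000 hours, not measured data for any real drive. It also looks only at the first drive failure, not at loss of the RAID 10 array.

```python
import random

def mean_first_failure(draw_lifetime, drives=4, trials=20_000, seed=1):
    """Average time until the first of `drives` independent drives fails."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += min(draw_lifetime(rng) for _ in range(drives))
    return total / trials

MTBF = 100_000  # hours; every model below has this mean

# Hypothetical lifetime models, chosen only to illustrate the point.
models = {
    "narrow (uniform 90k-110k)": lambda rng: rng.uniform(90_000, 110_000),
    "flat (uniform 0-200k)": lambda rng: rng.uniform(0, 200_000),
    "constant failure rate (exponential)": lambda rng: rng.expovariate(1 / MTBF),
}

for name, draw in models.items():
    print(f"{name}: {mean_first_failure(draw):,.0f} hours")
```

The theoretical values are 94,000 hours for the narrow case, 40,000 for the flat case, and 25,000 for the constant-rate case; the simulated figures should land close to those.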


They rarely work that way. Often our devices will have what is sometimes called a "bathtub curve". There are a few failures IMMEDIATELY ("infant mortality"), falling off rapidly, and then few failures for quite a while, and then, as random parts start to wear out, the failures rise. In fact, with the same MTBF of 100,000, it could be that, once the early-demise ones are discarded, the MTBF of the REMAINDER is 200,000.

IFF you are willing to deal with the DOA and infant mortality cases, then by discarding or ignoring those outlying numbers, you might get a more realistic evaluation of what to expect.
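
The effect of discarding the early deaths can be shown with a toy sample. The lifetimes below are invented purely for illustration (half infant mortality, half long-lived), and the 2,000-hour burn-in cutoff is likewise an arbitrary assumption.

```python
def mean_after_burn_in(lifetimes_hours, burn_in_hours):
    """Mean lifetime of the drives that survived at least burn_in_hours."""
    survivors = [t for t in lifetimes_hours if t >= burn_in_hours]
    if not survivors:
        raise ValueError("no drives survived the burn-in period")
    return sum(survivors) / len(survivors)

# Hypothetical sample, in hours: five early deaths and five long-lived drives.
lifetimes = [10, 50, 200, 500, 1_000,
             180_000, 190_000, 200_000, 210_000, 220_000]

overall = sum(lifetimes) / len(lifetimes)
remainder = mean_after_burn_in(lifetimes, burn_in_hours=2_000)

print(f"MTBF of all drives:    {overall:,.0f} hours")    # 100,176
print(f"MTBF of the remainder: {remainder:,.0f} hours")  # 200,000
```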
</pedantic sadistics>

--
Grumpy Ol' Fred             ci...@xenosoft.com

