MTTF is a difficult number to pin down. Two of the most-cited papers on the topic are: http://db.usenix.org/events/fast07/tech/schroeder/schroeder_html/index.html and http://labs.google.com/papers/disk_failures.pdf
Ted is assuming an MTTF of 25k hours; I think that's overly pessimistic, although both papers indicate that MTTF is a crappy way to model disk lifetime. I think a lot has to do with the quality of the batch of hard drives you get and the operating conditions.

Brian

On Aug 10, 2011, at 2:19 PM, Luke Lu wrote:

> On Wed, Aug 10, 2011 at 10:40 AM, Ted Dunning <[email protected]> wrote:
>> To be specific, taking a 100 node x 10 disk x 2 TB configuration with drive
>> MTBF of 1000 days, we should be seeing drive failures on average once per
>> day....
>> For a 10,000 node cluster, however, we should expect the average rate of
>> disk failure rate of one failure every 2.5 hours.
>
> Do you have real data to back the analysis? You assume a uniform disk
> failure distribution, which is absolutely not true. I can only say
> that our ops data across 40000+ nodes shows that the above analysis is
> not even close. (This is assuming that the ops know what they are
> doing though :)
>
> __Luke
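
[Editor's note: for reference, the sketch below reproduces the back-of-envelope arithmetic behind Ted's figures, under the same simplification Luke is objecting to: independent drives with a constant (exponential) failure rate, so a fleet of N drives sees a failure roughly every MTTF/N hours. The function name and the reading of the second figure as roughly 10,000 drives at ~25k hours MTTF are assumptions, not data from the thread.]

    # Back-of-envelope disk failure rate, assuming independent drives with a
    # constant failure rate. Illustrative only -- not measured ops data.
    HOURS_PER_DAY = 24

    def mean_hours_between_failures(drive_mttf_hours: float, num_drives: int) -> float:
        """With N independent drives of the given MTTF, the fleet sees
        a failure on average every MTTF / N hours."""
        return drive_mttf_hours / num_drives

    # Ted's first example: 100 nodes x 10 disks, drive MTBF ~1000 days (~24k hours).
    drive_mttf = 1000 * HOURS_PER_DAY                                # 24,000 hours
    print(mean_hours_between_failures(drive_mttf, 100 * 10))         # ~24 h, i.e. about one failure per day

    # The "every 2.5 hours" figure matches ~10,000 drives at ~25k hours MTTF (an assumption).
    print(mean_hours_between_failures(25_000, 10_000))               # 2.5 h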
