On Wed, Aug 10, 2011 at 10:40 AM, Ted Dunning <[email protected]> wrote:
> To be specific, taking a 100 node x 10 disk x 2 TB configuration with drive
> MTBF of 1000 days, we should be seeing drive failures on average once per
> day....
> For a 10,000 node cluster, however, we should expect an average disk
> failure rate of one failure every 2.5 hours.
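
The quoted calculation can be sketched as follows. This is just the naive arithmetic being debated, under the assumption the quote makes (independent drives with a constant failure rate, i.e. exponential lifetimes); the inputs are the figures from the message, not measured data, and the helper name is mine.

```python
# Naive fleet-wide failure interval under a constant-failure-rate
# (exponential) assumption, as in the quoted back-of-envelope math.

def mean_hours_between_failures(n_disks: int, drive_mtbf_days: float) -> float:
    """Expected hours between failures across the whole fleet,
    assuming independent drives with exponential lifetimes."""
    failures_per_day = n_disks / drive_mtbf_days
    return 24.0 / failures_per_day

# 100 nodes x 10 disks x 1000-day MTBF -> one failure per day on average
print(mean_hours_between_failures(100 * 10, 1000))  # 24.0
```

Note that the same formula applied literally to 10,000 nodes x 10 disks gives a much shorter interval than 2.5 hours, so the quoted figure presumably assumes a different per-node disk count.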

Do you have real data to back up the analysis? You assume a uniform disk
failure distribution, which is absolutely not true. I can only say
that our ops data across 40,000+ nodes shows that the above analysis is
not even close. (This is assuming that the ops know what they are
doing, though :)
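
To illustrate the objection: drive populations are commonly modeled with a Weibull lifetime distribution, whose hazard rate is constant only in the special case shape = 1. With shape < 1 the fleet shows infant mortality (failures cluster early), and with shape > 1 it shows wear-out (failures cluster late), so observed failure arrivals need not match the constant-rate estimate above. The shape and scale values here are illustrative, not fitted to any real fleet.

```python
# Weibull hazard rate h(t) = (k / s) * (t / s)^(k - 1), showing how the
# instantaneous failure rate varies over a drive's life unless k == 1.

def weibull_hazard(t_days: float, shape: float, scale_days: float) -> float:
    """Instantaneous failure rate at age t for a Weibull(shape, scale) lifetime."""
    return (shape / scale_days) * (t_days / scale_days) ** (shape - 1)

for shape, regime in [(0.7, "infant mortality"), (1.0, "constant rate"),
                      (3.0, "wear-out")]:
    young = weibull_hazard(30, shape, 1000)    # 30-day-old drive
    old = weibull_hazard(2000, shape, 1000)    # 2000-day-old drive
    print(f"{regime}: h(30d)={young:.2e}/day, h(2000d)={old:.2e}/day")
```

Only the shape = 1 case reproduces the "one failure per MTBF-worth of drive-days" arithmetic; the other two regimes are why a uniform-failure estimate can be far off in practice.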

__Luke
