On Wed, Aug 10, 2011 at 10:40 AM, Ted Dunning <[email protected]> wrote:
> To be specific, taking a 100 node x 10 disk x 2 TB configuration with drive
> MTBF of 1000 days, we should be seeing drive failures on average once per
> day....
> For a 10,000 node cluster, however, we should expect an average disk
> failure rate of one failure every 2.5 hours.
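
For reference, the back-of-envelope arithmetic behind the quoted figures just divides the drive population by the MTBF. A minimal Python sketch; the drives-per-node value used for the 10,000-node case is an assumption inferred from the quoted rate, not stated in the original mail:

    # Expected failures per day under a constant failure rate:
    #   failures/day = number_of_drives / MTBF_in_days
    MTBF_DAYS = 1000  # quoted drive MTBF

    def failures_per_day(nodes, disks_per_node, mtbf_days=MTBF_DAYS):
        return nodes * disks_per_node / mtbf_days

    # 100 nodes x 10 disks -> 1000 drives -> 1.0 failure/day, as quoted.
    print(failures_per_day(100, 10))           # 1.0

    # "One failure every 2.5 hours" is ~9.6 failures/day, which matches
    # roughly 10,000 drives (e.g. one per node); at 10 disks per node it
    # would be ~100 failures/day instead.
    print(24 / failures_per_day(10_000, 1))    # ~2.4 hours between failures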
Do you have real data to back up that analysis? It assumes a uniform disk failure distribution, which is absolutely not true in practice. I can only say that our ops data across 40,000+ nodes shows that the above analysis is not even close. (This assumes the ops know what they are doing, though :)

__Luke
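
Luke's point is that a constant, age-independent failure rate (the exponential model implicit in dividing drive count by MTBF) ignores the age-dependent behavior real drive fleets show. A hypothetical sketch contrasting a constant hazard with Weibull hazards; the shape parameters are illustrative, not fitted to any ops data:

    # Hazard (instantaneous failure) rates per day at different drive ages.
    MTBF_DAYS = 1000.0

    def exponential_hazard(t_days):
        # Constant hazard: the same failure rate at every age.
        return 1.0 / MTBF_DAYS

    def weibull_hazard(t_days, shape, scale=MTBF_DAYS):
        # h(t) = (k/lam) * (t/lam)^(k-1); shape < 1 models infant
        # mortality, shape > 1 models wear-out.
        return (shape / scale) * (t_days / scale) ** (shape - 1)

    for age in (30, 365, 1825):  # one month, one year, five years
        print(f"age {age:>4}d  constant={exponential_hazard(age):.5f}  "
              f"infant={weibull_hazard(age, 0.7):.5f}  "
              f"wearout={weibull_hazard(age, 1.5):.5f}")

Under either Weibull curve the fleet-wide failure rate depends on the age mix of the drives, so two clusters with the same drive count can see very different rates, which is consistent with the objection above.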
