On Thu, 05 Jan 2012 13:43:49 EST erik quanstrom <[email protected]>  wrote:
> On Thu Jan  5 13:26:16 EST 2012, [email protected] wrote:
> > On Thu, 05 Jan 2012 13:01:52 EST erik quanstrom <[email protected]>  wrote:
> > > > if you read 1TB, you have 8% chance of a silent bad read
> > > > sector.  More important to worry about that in today's world
> > > > than optimizing disk space use.
> > > 
> > > do you have a citation for this?  i know if you work out the
> > > numbers from the BER, this is about what you get, but in
> > > practice i do not see this 8%.  we do pattern writes all the
> > > time, and i can't recall the last time i saw a "silent" read error.
> > 
> > Silent == unseen! Do you log RAID errors? Only way to catch them.
> > 
> > That number is derived purely from a bit error rate (I think
> > vendors base that on the Reed-Solomon code used). No idea how
> > "uniformly random" the data (or medium) is in practice. I
> > thought the "practice" was worse!
> 
> i thought your definition of silent was not caught by the on-drive
> ecc.  i think this is not very likely, and we're explicitly checking for

Hmm.... You are right!  I meant *uncorrectable* read errors
(URE), which are not necessarily *undetectable* errors (where
a data pattern switches to another pattern mapping to the same
syndrome bits).  Clearly my memory by now has had much more
massive bit-errors! Still, a consumer disk URE rate of 10^-14 per
bit read, coupled with large disk sizes, does mean RAID is essential.
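
For reference, here is a rough sketch of the arithmetic behind the 8%
figure. It assumes bit errors are independent and uniformly likely
(which, as noted earlier in the thread, may well not hold in practice),
that 1TB means 10^12 bytes, and that the spec'd rate is per bit read.
The function name is made up for illustration.

```python
import math

def p_at_least_one_ure(bytes_read, ber=1e-14):
    """Probability of at least one unrecoverable read error.

    Assumes independent bit errors at a fixed rate `ber` per bit read,
    so P = 1 - (1 - ber)^bits.
    """
    bits = bytes_read * 8
    # log1p/expm1 avoid losing precision when ber is tiny.
    return -math.expm1(bits * math.log1p(-ber))

TB = 10**12  # assumption: decimal terabyte

for n_tb in (1, 2, 4, 12):
    p = p_at_least_one_ure(n_tb * TB)
    print(f"{n_tb:3d} TB read: P(at least one URE) = {p:.3f}")
```

Under these assumptions 1TB comes out near 0.077, slightly below the
naive 8e12 * 1e-14 = 0.08, and the number climbs quickly as the amount
read grows, which is the worry for a rebuild that has to read every
surviving disk in full.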

> this by running massive numbers of disks through pattern writes with
> verification, and don't see it.

Are these new disks?  The rate goes up with age.  Do SMART
stats show any new errors?  It is also possible vendors are
*conservatively* specifying 10^-14 (though I no longer know
how they arrive at the URE number!).  Can you share what you
did discover? [offline, if you don't want to broadcast]

You've probably read
http://research.cs.wisc.edu/adsl/Publications/latent-sigmetrics07.ps
