On Thu, 05 Jan 2012 13:43:49 EST erik quanstrom <[email protected]> wrote:
> On Thu Jan 5 13:26:16 EST 2012, [email protected] wrote:
> > On Thu, 05 Jan 2012 13:01:52 EST erik quanstrom <[email protected]> wrote:
> > > > if you read 1TB, you have 8% chance of a silent bad read
> > > > sector. More important to worry about that in today's world
> > > > than optimizing disk space use.
> > >
> > > do you have a citation for this? i know if you work out the
> > > numbers from the BER, this is about what you get, but in
> > > practice i do not see this 8%. we do pattern writes all the
> > > time, and i can't recall the last time i saw a "silent" read error.
> >
> > Silent == unseen! Do you log RAID errors? Only way to catch them.
> >
> > That number is derived purely on an bit error rate (I think
> > vendors base that on the Reed-Solomon code used). No idea how
> > "uniformly random" the data (or medium) is in practice. I
> > thought the "practice" was worse!
>
> i thought your definition of silent was not caught by the on-drive
> ecc. i think this is not very likely, and we're explicitly checking for

Hmm.... You are right! I meant *uncorrectable* read errors (URE),
which are not necessarily *undetectable* errors (where a data
pattern switches to another pattern mapping to the same syndrome
bits). Clearly my memory has by now suffered far more massive
bit errors! Still, a consumer-disk URE rate of 10^-14 per bit read,
coupled with large disk sizes, does mean RAID is essential.

> this by running massive numbers of disks through pattern writes with
> verification, and don't see it.

Are these new disks? The rate goes up with age. Do SMART stats
show any new errors? It is also possible that vendors are
*conservatively* specifying 10^-14 (though I no longer know how
they arrive at the URE number!).

Can you share what you did discover? [offline, if you don't want
to broadcast]

You've probably read
http://research.cs.wisc.edu/adsl/Publications/latent-sigmetrics07.ps
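
For reference, the 8% figure for a 1TB read can be reproduced from the
quoted 10^-14 rate alone. This is only a minimal sketch: it assumes bit
errors are independent and uniformly distributed (exactly the assumption
questioned above), takes 1TB as 10^12 bytes, and the function name
p_at_least_one_ure is made up for illustration.

```python
import math

def p_at_least_one_ure(bytes_read, ure_per_bit=1e-14):
    """Probability of at least one URE while reading bytes_read bytes.

    Assumes independent bit errors at a uniform rate, which is an
    idealization and may not match real drives.
    """
    bits = 8 * bytes_read
    # 1 - (1 - p)^bits, computed with log1p/expm1 to avoid rounding loss
    return -math.expm1(bits * math.log1p(-ure_per_bit))

one_tb = 10**12  # decimal terabyte, as disk vendors count (assumption)
print(f"expected UREs per 1TB read: {8 * one_tb * 1e-14:.2f}")
print(f"P(at least one URE):        {p_at_least_one_ure(one_tb):.1%}")
```

Under these assumptions the expected count is 0.08 errors per 1TB read
and the probability of at least one is about 7.7%, i.e. the "8%" above.
If errors cluster on particular sectors or drives, the real figure
could differ in either direction.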
