This message is from the T13 list server.
On 3/18/02, Jim McGrath and Hale Landis both sent long emails explaining the theory of how ECC works on hard disk drives. The emails were very well written and explain the "theory" quite well. However, what was not detailed is that "theory" is not always correctly implemented. That is why we need testing.

The "theory" of ECC correction depends upon the bit error rate of the "Head/Media/Channel". The predictions about the probability of mis-correction depend very much upon the raw error rate of the data being corrected. Drive suppliers have been VERY reluctant to allow the raw error rate of a drive to be measured by their customers. The "theory" is that the customers don't "need to know".

However, the manufacturing divisions of the drive companies are under constant pressure to improve yield, reduce test time and reduce cost. To achieve these ends, the temptation is to push testing back to the component suppliers and to allow looser tolerances on components. Sometimes the design groups do not realize when the actual raw error rate has been increased.

The other problem is that there can be environmental contributions to the error rate. For example, if the drive is not well shielded, then EMI can increase the error rate. If the servo is not "stiff" enough, vibration can affect the error rate. Thus, even if the manufacturer "knows" the raw error rate, they may not realize when that rate increases the probability of mis-correction to an unacceptable level.

My R/W Long tests do not attempt to measure the raw error rate. They do not attempt to verify the ECC algorithms scientifically. They only attempt to find out whether there is any margin in the probability of mis-correction. These tests were not developed because of "theory". They were developed because of REAL problems which were undetected by the drive manufacturers and resulted in REAL data corruption. As media capacity increases, more and more demands are placed upon Error Correcting Codes.
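
To illustrate how strongly the margin depends on the raw error rate, here is a minimal sketch in Python. It is not any drive's actual ECC. It assumes independent symbol errors and a code that can correct up to t symbol errors per sector, and the sector size (540 symbols) and t (10) are hypothetical numbers chosen only for illustration. It computes the probability that a sector contains more errors than the code can correct, which is the situation in which a decoder must either report an uncorrectable error or risk a mis-correction.

```python
from math import comb

def prob_exceeds_correction(n_symbols, t_correctable, p_symbol_error):
    """Probability that more than t of n symbols are in error.

    Assumes independent symbol errors (a simplification; real
    head/media/channel errors are often bursty and correlated).
    The tail is summed directly to avoid the precision loss of
    computing 1 - P(at most t errors).
    """
    return sum(
        comb(n_symbols, k)
        * p_symbol_error ** k
        * (1.0 - p_symbol_error) ** (n_symbols - k)
        for k in range(t_correctable + 1, n_symbols + 1)
    )

if __name__ == "__main__":
    # Hypothetical sector: 540 symbols, code corrects up to 10 of them.
    for raw_rate in (1e-5, 1e-4, 1e-3, 1e-2):
        p = prob_exceeds_correction(540, 10, raw_rate)
        print(f"raw symbol error rate {raw_rate:g}: P(> t errors) = {p:.3e}")
```

Under these assumptions, each tenfold increase in the raw rate raises the overload probability by many orders of magnitude, so a modest, unnoticed rise in the raw error rate can consume the margin that the "theory" assumed.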
We should ALL be working to improve quality, since NO ONE wants data corruption. Please do not take away what limited testing is available at the end user level. We should actually be looking for ways to IMPROVE the "in system" testability.

...Harlan
