This message is from the T13 list server.

On 3/18/02, Jim McGrath and Hale Landis both sent long eMails explaining 
theory about how ECC works on hard disk drives.

The eMails were very well written and explain the "theory" quite well.
However, what was not detailed is that the "theory" is not always
correctly implemented.  That is why we need testing.

The "theory" of ECC correction depends upon the bit error rate of the 
"Head/Media/Channel".  The predictions about the probability of 
Mis-Correction depend very much upon the raw error rate of the data being 
corrected.
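
To illustrate how sharply the outcome depends on the raw error rate, here
is a minimal sketch. It ASSUMES independent symbol errors and a code that
corrects up to t symbols per sector; the function name and the numbers
(512 symbols, t = 4) are hypothetical, not taken from any real drive. It
computes only the chance that a sector has more errors than the code can
correct, which is the precondition for a mis-correction, not the
mis-correction probability itself.

```python
import math

def prob_more_than_t_errors(n_symbols, t, p_symbol):
    """P(more than t of n_symbols are in error).

    Assumes each symbol fails independently with probability p_symbol.
    Real channels have burst errors, so this is only a rough model.
    """
    if not 0.0 < p_symbol < 1.0:
        raise ValueError("p_symbol must be strictly between 0 and 1")
    total = 0.0
    # Sum the binomial tail directly (k = t+1 .. n) to avoid the
    # cancellation that "1 - sum of the head" would suffer for tiny p.
    for k in range(t + 1, n_symbols + 1):
        log_term = (math.lgamma(n_symbols + 1)
                    - math.lgamma(k + 1)
                    - math.lgamma(n_symbols - k + 1)
                    + k * math.log(p_symbol)
                    + (n_symbols - k) * math.log1p(-p_symbol))
        total += math.exp(log_term)
    return total

if __name__ == "__main__":
    # Hypothetical sector: 512 symbols, code corrects up to 4 of them.
    for p in (1e-6, 1e-5, 1e-4, 1e-3):
        print(p, prob_more_than_t_errors(512, 4, p))
```

Under these assumptions, running the script shows how quickly the tail
probability changes as the raw symbol error rate moves by a factor of ten.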

Drive suppliers have been VERY reluctant to allow the raw error rate of a
drive to be measured by their customers.  The "theory" is that the
customers don't "need to know".  However, the manufacturing divisions of
the drive companies are under constant pressure to improve yield, reduce
test time and reduce cost.  To achieve these ends, the temptation is to
push testing back to the component suppliers and to allow looser
tolerances on components.  Sometimes the design groups do not realize
when the actual raw error rate has increased as a result.

The other problem is that there can be environmental contributions to the
error rate.  For example, if the drive is not well shielded then EMI can
increase the error rate.  If the servo is not "stiff" enough, vibration can
affect the error rate.  Thus, even if the manufacturer "knows" the raw error
rate, they may not realize when that rate increases the probability of
mis-correction to an unacceptable level.

My R/W Long tests do not attempt to measure the raw error rate.  They do
not attempt to verify scientifically the ECC algorithms.  They only
attempt to find out if there is any margin in the probability of
mis-correction.

These tests were not developed because of "theory".   They were developed 
because of REAL problems which were undetected by the drive manufacturers 
and resulted in REAL data corruption.   

As the media capacity increases, more and more demands are placed upon 
Error Correcting Codes.   We should ALL be working to improve quality 
since NO ONE wants data corruption.  Please do not take away what limited 
testing is available at the end user level.  We should actually be 
looking for ways to IMPROVE the "in system" testability.

...Harlan



