On Tue, 27 Jun 2006, M.Hirsch wrote:
>> If you're using hardware w/o ECC, it just can't tell whether an error is present
>> or absent. So ECC _is_ the way to detect (not mask) broken hardware.

> Ok, thanks. I think I understand the meaning of ECC now.
> So, contrary to what my supplier claims, ECC is not supposed to protect against hardware failures.
> But it is the way to detect them, right?

ECC stands for Error Checking and Correction. It's a hardware feature,
and its primary task is Checking (that is, detection) of errors. It just happens that the number of additional bits which carry the checking code is sufficient to correct _any_ _single-bit_ data error (not mask it, but really correct it), and to detect any double-bit error and most multi-bit errors (w/o correction).
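
To make "correct any single-bit, detect any double-bit" concrete, here is a toy sketch of a SECDED (single-error-correcting, double-error-detecting) code, an extended Hamming(8,4) code over 4 data bits. This is an illustration of the general technique only. Real ECC memory uses a wider code (commonly 8 check bits per 64-bit word) computed by the memory controller in hardware. The function names encode/decode are hypothetical.

```python
# Toy SECDED code: extended Hamming(8,4). Illustration only; real ECC memory
# uses a wider code (e.g. 72 stored bits per 64 data bits) done in hardware.

def encode(data):
    """Encode 4 data bits (list of 0/1) into an 8-bit codeword.

    Positions 1..7 form a Hamming(7,4) code with parity bits at 1, 2 and 4;
    position 0 holds the overall parity of positions 1..7.
    """
    d1, d2, d3, d4 = data
    code = [0] * 8
    code[3], code[5], code[6], code[7] = d1, d2, d3, d4
    code[1] = code[3] ^ code[5] ^ code[7]  # covers positions with bit 0 set
    code[2] = code[3] ^ code[6] ^ code[7]  # covers positions with bit 1 set
    code[4] = code[5] ^ code[6] ^ code[7]  # covers positions with bit 2 set
    code[0] = sum(code[1:]) % 2            # overall parity bit
    return code

def decode(code):
    """Return (status, data); status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos                # XOR of positions holding a 1
    overall = sum(code) % 2                # 0 while overall parity holds
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd number of flips: assume exactly one and really fix it.
        # syndrome == 0 means the overall parity bit itself was hit.
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Syndrome nonzero but parity holds: two bits flipped, detect only.
        return "uncorrectable", None
    return status, [code[3], code[5], code[6], code[7]]

word = encode([1, 0, 1, 1])
one_flip = list(word); one_flip[6] ^= 1
two_flips = list(word); two_flips[2] ^= 1; two_flips[5] ^= 1
print(decode(word))       # ('ok', [1, 0, 1, 1])
print(decode(one_flip))   # ('corrected', [1, 0, 1, 1])
print(decode(two_flips))  # ('uncorrectable', None)
```

After a single-bit correction the decoder returns exactly the original data. It also reports that a correction happened, which is what lets a system log the event rather than treat the data as bad.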

>> Intel's ECC-capable chipset allows it. But if we're speaking about
>> production environment, such behaviour (abnormal termination on _corrected_
>> error) is unacceptable.

> "abnormal termination" is not only acceptable for me, it is what I am looking for. Make the node crash completely, so one of the others can take over its task(s).

Again, when a single-bit correction has happened, it's not fake: the result is actually correct. Why panic the machine immediately if all the data is OK?

Sincerely, Dmitry
--
Atlantis ISP, System Administrator
e-mail:  [EMAIL PROTECTED]
nic-hdl: LYNX-RIPE
_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"
