On Tue, 27 Mar 2007, Eirik Øverby wrote:
On 27. mar. 2007, at 15.33, Gavin Atkinson wrote:
On Tue, 2007-03-27 at 15:00 +0200, Eirik Øverby wrote:
Hi all,

running 6.1-RELEASE on several HP DL385 servers (identically
configured), one of them has recently spat the following out in the
/var/log/messages file:

..........
Mar 10 03:51:24 apphost02 ntpd[445]: kernel time sync enabled 2001
Mar 10 05:02:01 apphost02 kernel: NMI ISA 30, EISA ff
..........

I suspect you'll find your (ECC) memory has problems.

You are absolutely correct. Further investigation using the ProLiant management tools for FreeBSD revealed serious RAM trouble. Two banks were degraded, so we have now had the modules replaced on-site.

Glad to be of help!

Thanks for the tip!
Do you happen to know if there are any "generic" tools/daemons available to decipher such NMIs? Perhaps something that could send SNMP traps?

I don't, to be honest. There is some code in /usr/src/sys/i386/isa/nmi.c that tries to detect the cause of an NMI, although I don't remember ever seeing its messages when a parity error was detected. I guess it's possible that (to some chipset vendor at least) 0x20 and 0x30 indicate a parity error, but neither our code nor Linux's (see http://fxr.watson.org/fxr/source/arch/i386/kernel/traps.c?v=linux-2.6#L743 ) knows those codes to mean a parity error.
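
As a rough illustration of the kind of decoding discussed above, here is a minimal sketch that pulls the two hex values out of a logged "NMI ISA xx, EISA yy" line and tests the ISA byte against two flag bits. The masks are an assumption: they follow the conventional PC/AT layout of port 0x61 (bit 7 for RAM parity, bit 6 for I/O channel check), which is believed to match what nmi.c checks, but verify against your own source tree. The function name decode_nmi and the constant names are hypothetical, chosen for this example only.

```python
import re

# Assumed bit layout of the ISA NMI status byte (conventional PC/AT
# port 0x61). Check these against nmi.c before relying on them.
NMI_PARITY = 0x80  # bit 7: RAM parity error
NMI_IOCHAN = 0x40  # bit 6: I/O channel check

# Matches the kernel message format seen in /var/log/messages.
LINE_RE = re.compile(r"NMI ISA ([0-9a-fA-F]+), EISA ([0-9a-fA-F]+)")


def decode_nmi(line):
    """Return the ISA/EISA values and any recognised causes, or None."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    isa = int(m.group(1), 16)
    eisa = int(m.group(2), 16)
    causes = []
    if isa & NMI_PARITY:
        causes.append("RAM parity error")
    if isa & NMI_IOCHAN:
        causes.append("I/O channel check")
    return {"isa": isa, "eisa": eisa, "causes": causes}


if __name__ == "__main__":
    print(decode_nmi("Mar 10 05:02:01 apphost02 kernel: NMI ISA 30, EISA ff"))
```

Under these assumed masks, the value 0x30 from the log sets neither flag bit, so the sketch reports no recognised cause for it. That is consistent with the point that the existing code does not treat 0x20 or 0x30 as a parity error. A daemon built on this idea would still need a separate, vendor-specific source of truth to flag such values.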

Gavin
_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable