Hi,

A couple of weeks ago I installed FreeBSD 8.2-RC1 on a new machine (8.1 was 
having issues with the RAID card, and since 8.2 is nearly final I figured, why 
not). The machine has been running smoothly since, even while load-testing 
the hard drives and network for more than 24 hours.

Since everything was running smoothly, I decided to move one of the production 
PostgreSQL databases to this machine. However, after a couple of hours I got 
the following error from the iDRAC:
PCIE Fatal Err: Critical Event sensor, bus fatal error (Slot 3) was asserted

Followed by a lot of garbled text in the console (see the full log in the 
attachment) and immediately this message:
Jan 20 21:09:25 sh4 kernel: NMI ISA 30, EISA ff
Jan 20 21:09:25 sh4 kernel: NMI ... going to debugger
Jan 20 21:09:25 sh4 kernel: NMI ISA 30, EISA ff
Jan 20 21:09:25 sh4 kernel: NMI ... going to debugger
Jan 20 21:09:25 sh4 kernel: NMI ISA N2M0I, I ESIASNA  Mff2

Followed by this:
Jan 20 21:09:38 sh4 kernel: igb0: Watchdog timeout -- resetting
Jan 20 21:09:38 sh4 kernel: igb0: Queue(0) tdh = 944, hw tdt = 945
Jan 20 21:09:38 sh4 kernel: igb0: TX(0) desc avail = 1023,Next TX to Clean = 944
Jan 20 21:09:38 sh4 kernel: igb0: link state changed to DOWN
Jan 20 21:09:41 sh4 kernel: igb0: link state changed to UP

After that, the lagg0 interface (which uses igb0 and igb1 as an LACP 
trunk) marked the igb0 interface as down. After a while the second interface 
hit the same issue, which left the lagg0 interface non-functional 
and the server unreachable.
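
For anyone who wants to check their own logs for the same pattern, here is a 
minimal sketch that counts watchdog resets per interface and records the last 
reported link state. It is not part of my setup: the function name summarize 
and the regular expressions are assumptions based only on the log lines quoted 
above, so other driver versions may word these messages differently.

```python
import re
from collections import Counter

# Patterns are assumptions derived from lines such as:
#   Jan 20 21:09:38 sh4 kernel: igb0: Watchdog timeout -- resetting
#   Jan 20 21:09:38 sh4 kernel: igb0: link state changed to DOWN
WATCHDOG = re.compile(r"kernel: (\w+): Watchdog timeout")
LINK = re.compile(r"kernel: (\w+): link state changed to (UP|DOWN)")


def summarize(lines):
    """Return (watchdog resets per interface, last link state per interface)."""
    resets = Counter()
    last_state = {}
    for line in lines:
        match = WATCHDOG.search(line)
        if match:
            resets[match.group(1)] += 1
            continue
        match = LINK.search(line)
        if match:
            # Later lines overwrite earlier ones, so this ends up
            # holding the most recent state seen for each interface.
            last_state[match.group(1)] = match.group(2)
    return resets, last_state
```

It can be fed any iterable of log lines, for example an open file object for 
the system log (on a default install that would be /var/log/messages, which is 
again an assumption about the syslog configuration).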


This error looks quite a bit like the one discussed in this thread: 
http://docs.freebsd.org/cgi/getmsg.cgi?fetch=81462+0+/usr/local/www/db/text/2010/freebsd-net/20100801.freebsd-net
But the given solution there (disabling polling) won't help because I don't 
even have device polling enabled in the kernel.


For the record, the machine regularly shows small amounts of garbled text even 
outside of these network interface crashes, as can be seen in the "garbled.log" 
file. The real crash starts at 21:09:24 according to the iDRAC log.

My kernel config is mostly stock, with some modules disabled.
DEVICE_POLLING is not enabled.
The garbled text should be caused by the print buffer since I do have 
PRINTF_BUFR_SIZE=128 in the config.


Thanks in advance for any help.

~rick
