Folks, I've been getting some strange error messages in my home server / router that I've been having trouble debugging. I'm decently proficient in Linux, but I fear I'm in over my head with this one.
The hardware is an HP N40L Microserver. The hardware details are here:
http://n40l.wikia.com/wiki/Base_Hardware

I am running Debian Squeeze 6.0:

pengc99@gaia:/$ sudo uname -a
Linux gaia 2.6.32-5-amd64 #1 SMP Sun May 6 04:00:17 UTC 2012 x86_64 GNU/Linux

I also subscribe to Ksplice's Uptrack system, but since I have the newest kernel installed (as released by Debian) there have been no hot-patches yet.

This is the message I've been getting in /var/log/kern.log:

Jul 11 08:55:38 gaia kernel: [402056.009687] e1000e 0000:02:00.0: eth1: Detected Hardware Unit Hang:
Jul 11 08:55:38 gaia kernel: [402056.009690] TDH <fc>
Jul 11 08:55:38 gaia kernel: [402056.009692] TDT <fd>
Jul 11 08:55:38 gaia kernel: [402056.009693] next_to_use <fd>
Jul 11 08:55:38 gaia kernel: [402056.009694] next_to_clean <fc>
Jul 11 08:55:38 gaia kernel: [402056.009695] buffer_info[next_to_clean]:
Jul 11 08:55:38 gaia kernel: [402056.009697] time_stamp <105fc92b2>
Jul 11 08:55:38 gaia kernel: [402056.009698] next_to_watch <fc>
Jul 11 08:55:38 gaia kernel: [402056.009699] jiffies <105fc93da>
Jul 11 08:55:38 gaia kernel: [402056.009700] next_to_watch.status <0>
Jul 11 08:55:38 gaia kernel: [402056.009701] MAC Status <80383>
Jul 11 08:55:38 gaia kernel: [402056.009702] PHY Status <792d>
Jul 11 08:55:38 gaia kernel: [402056.009703] PHY 1000BASE-T Status <3800>
Jul 11 08:55:38 gaia kernel: [402056.009705] PHY Extended Status <3000>
Jul 11 08:55:38 gaia kernel: [402056.009706] PCI Status <10>

Complete output of lspci:

pengc99@gaia:/$ lspci
00:00.0 Host bridge: Advanced Micro Devices [AMD] RS880 Host Bridge
00:01.0 PCI bridge: Hewlett-Packard Company Device 9602
00:02.0 PCI bridge: Advanced Micro Devices [AMD] RS780 PCI to PCI bridge (ext gfx port 0)
00:06.0 PCI bridge: Advanced Micro Devices [AMD] RS780 PCI to PCI bridge (PCIE port 2)
00:11.0 SATA controller: ATI Technologies Inc SB700/SB800 SATA Controller [AHCI mode] (rev 40)
00:12.0 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI0 Controller
00:12.2 USB Controller: ATI Technologies Inc SB700/SB800 USB EHCI Controller
00:13.0 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI0 Controller
00:13.2 USB Controller: ATI Technologies Inc SB700/SB800 USB EHCI Controller
00:14.0 SMBus: ATI Technologies Inc SBx00 SMBus Controller (rev 42)
00:14.3 ISA bridge: ATI Technologies Inc SB700/SB800 LPC host controller (rev 40)
00:14.4 PCI bridge: ATI Technologies Inc SBx00 PCI to PCI Bridge (rev 40)
00:16.0 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI0 Controller
00:16.2 USB Controller: ATI Technologies Inc SB700/SB800 USB EHCI Controller
00:18.0 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor HyperTransport Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Miscellaneous Control
00:18.4 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Link Control
01:05.0 VGA compatible controller: ATI Technologies Inc M880G [Mobility Radeon HD 4200]
02:00.0 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)
02:00.1 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)
03:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5723 Gigabit Ethernet PCIe (rev 10)

Output of lspci -vvv (as root, network adapter section):

02:00.0 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)
    Subsystem: Hewlett-Packard Company NC360T PCI Express Dual Port Gigabit Server Adapter
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0, Cache Line Size: 64 bytes
    Interrupt: pin A routed to IRQ 26
    Region 0: Memory at fe8e0000 (32-bit, non-prefetchable) [size=128K]
    Region 1: Memory at fe8c0000 (32-bit, non-prefetchable) [size=128K]
    Region 2: I/O ports at e800 [size=32]
    Expansion ROM at fe8a0000 [disabled] [size=128K]
    Capabilities: [c8] Power Management version 2
        Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
        Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
    Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Address: 00000000fee0300c Data: 4191
    Capabilities: [e0] Express (v1) Endpoint, MSI 00
        DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset-
        DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+ RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+ MaxPayload 128 bytes, MaxReadReq 512 bytes
        DevSta: CorrErr+ UncorrErr+ FatalErr- UnsuppReq+ AuxPwr+ TransPend-
        LnkCap: Port #0, Speed 2.5GT/s, Width x4, ASPM L0s, Latency L0 <4us, L1 <64us ClockPM- Surprise- LLActRep- BwNot-
        LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+ ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta: Speed 2.5GT/s, Width x4, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
    Capabilities: [100 v1] Advanced Error Reporting
        UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
        UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
        CESta: RxErr+ BadTLP+ BadDLLP- Rollover- Timeout- NonFatalErr-
        CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
        AERCap: First Error Pointer: 14, GenCap- CGenEn- ChkCap- ChkEn-
    Capabilities: [140 v1] Device Serial Number 00-1f-29-ff-ff-5b-38-56
    Kernel driver in use: e1000e

The only references I could find online about this problem are about a PCI-E power management EEPROM bug:
http://serverfault.com/questions/193114/linux-e1000e-intel-networking-driver-problems-galore-where-do-i-start
http://downloadmirror.intel.com/9180/eng/README.txt

Along with the associated fix script:
http://sourceforge.net/projects/e1000/files/e1000e%20stable/eeprom_fix_82574_or_82583/

However, this appears to apply only to 82574 or 82583 chipsets, and this is an 82571EB. I also checked the EEPROM output and it doesn't look like the fix applies:

pengc99@gaia:/var$ sudo ethtool -e eth1 | head
[sudo] password for pengc99:
Offset Values
------ ------
0x0000 00 1f 29 5b 38 56 30 15 ff ff b2 50 ff ff ff ff
0x0010 19 d5 04 30 2f a4 44 70 3c 10 5e 10 86 80 65 b1

I'd appreciate any help I can get, and thanks for all the hard work!

--Andrew Peng

_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel® Ethernet, visit http://communities.intel.com/community/wired
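
For anyone reading the hang report from kern.log quoted in this message, here is a minimal Python sketch that turns its hex fields into plain numbers. It is only an illustration, not part of the driver or of any Intel tool. The field values are copied from the log; the names HANG_REPORT, ASSUMED_HZ, ASSUMED_RING_SIZE and decode_hang are made up for this example, and the tick rate (250) and TX ring size (256) are assumptions that should be checked against the running kernel config and `ethtool -g eth1`.

```python
# Values copied from the "Detected Hardware Unit Hang" report in kern.log.
HANG_REPORT = {
    "TDH": "fc",
    "TDT": "fd",
    "next_to_use": "fd",
    "next_to_clean": "fc",
    "time_stamp": "105fc92b2",
    "jiffies": "105fc93da",
}

ASSUMED_HZ = 250         # assumption: CONFIG_HZ of the running kernel
ASSUMED_RING_SIZE = 256  # assumption: TX ring size, see `ethtool -g eth1`


def decode_hang(report, hz=ASSUMED_HZ, ring_size=ASSUMED_RING_SIZE):
    """Convert the hex strings of a hang report into a few derived numbers."""
    values = {name: int(text, 16) for name, text in report.items()}
    elapsed_jiffies = values["jiffies"] - values["time_stamp"]
    return {
        "tdh": values["TDH"],
        "tdt": values["TDT"],
        # Descriptors between head and tail, wrapped around the ring.
        "pending_descriptors": (values["TDT"] - values["TDH"]) % ring_size,
        "elapsed_jiffies": elapsed_jiffies,
        # Only meaningful if the assumed tick rate matches the kernel.
        "elapsed_seconds": elapsed_jiffies / hz,
    }


if __name__ == "__main__":
    for key, value in decode_hang(HANG_REPORT).items():
        print(f"{key}: {value}")
```

With the values from the log, the head and tail indices differ by one descriptor, and the buffer's time_stamp is 0x128 = 296 jiffies older than the jiffies value at the time of the report. That is roughly 1.18 seconds if the kernel really ticks at 250 Hz.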