Folks, I've been getting some strange error messages in my home server
/ router that I've been having trouble debugging. I'm decently
proficient in Linux, but I fear I'm in over my head with this one.

The hardware is an HP N40L Microserver. The hardware details are here:
http://n40l.wikia.com/wiki/Base_Hardware

I am running Debian Squeeze 6.0:
pengc99@gaia:/$ sudo uname -a
Linux gaia 2.6.32-5-amd64 #1 SMP Sun May 6 04:00:17 UTC 2012 x86_64 GNU/Linux

I also subscribe to Ksplice's Uptrack system but since I have the
newest kernel installed (as released by Debian) there have been no
hot-patches yet.

This is the message I've been getting in /var/log/kern.log:
Jul 11 08:55:38 gaia kernel: [402056.009687] e1000e 0000:02:00.0: eth1: Detected Hardware Unit Hang:
Jul 11 08:55:38 gaia kernel: [402056.009690]   TDH                  <fc>
Jul 11 08:55:38 gaia kernel: [402056.009692]   TDT                  <fd>
Jul 11 08:55:38 gaia kernel: [402056.009693]   next_to_use          <fd>
Jul 11 08:55:38 gaia kernel: [402056.009694]   next_to_clean        <fc>
Jul 11 08:55:38 gaia kernel: [402056.009695] buffer_info[next_to_clean]:
Jul 11 08:55:38 gaia kernel: [402056.009697]   time_stamp           <105fc92b2>
Jul 11 08:55:38 gaia kernel: [402056.009698]   next_to_watch        <fc>
Jul 11 08:55:38 gaia kernel: [402056.009699]   jiffies              <105fc93da>
Jul 11 08:55:38 gaia kernel: [402056.009700]   next_to_watch.status <0>
Jul 11 08:55:38 gaia kernel: [402056.009701] MAC Status             <80383>
Jul 11 08:55:38 gaia kernel: [402056.009702] PHY Status             <792d>
Jul 11 08:55:38 gaia kernel: [402056.009703] PHY 1000BASE-T Status  <3800>
Jul 11 08:55:38 gaia kernel: [402056.009705] PHY Extended Status    <3000>
Jul 11 08:55:38 gaia kernel: [402056.009706] PCI Status             <10>
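
As a rough aid for reading the dump above, the sketch below works out how many descriptors sit between TDH and TDT, how long the oldest one had been waiting when the hang was reported, and what the MAC Status word says about the link. The ring size of 256 and HZ=250 are assumptions (the usual e1000e default and, presumably, the Debian amd64 kernel setting), not values taken from the log. The bit masks follow the STATUS register definitions in the e1000e driver headers.

```python
# Values copied from the kern.log dump (all hexadecimal in the log).
TDH = 0xFC
TDT = 0xFD
TIME_STAMP = 0x105FC92B2
JIFFIES = 0x105FC93DA
MAC_STATUS = 0x80383

# Assumptions, not taken from the log: a 256-entry TX ring and a
# kernel built with CONFIG_HZ=250.
RING_SIZE = 256
HZ = 250

# STATUS register bits as named in the e1000e driver headers.
STATUS_FD = 0x00000001          # full duplex
STATUS_LU = 0x00000002          # link up
STATUS_TXOFF = 0x00000010       # transmission paused
STATUS_SPEED_100 = 0x00000040
STATUS_SPEED_1000 = 0x00000080


def pending_descriptors(head, tail, ring_size):
    """Descriptors between head and tail, i.e. still owned by the NIC."""
    return (tail - head) % ring_size


def stalled_seconds(time_stamp, jiffies, hz):
    """Time the oldest buffer had been waiting when the dump was taken."""
    return (jiffies - time_stamp) / hz


def link_speed_mbps(status):
    """Mirror the driver's order of checks: 1000 first, then 100, else 10."""
    if status & STATUS_SPEED_1000:
        return 1000
    if status & STATUS_SPEED_100:
        return 100
    return 10


def decode_mac_status(status):
    """Pick out the link-related bits of the MAC Status word."""
    return {
        "full_duplex": bool(status & STATUS_FD),
        "link_up": bool(status & STATUS_LU),
        "tx_paused": bool(status & STATUS_TXOFF),
        "speed_mbps": link_speed_mbps(status),
    }


if __name__ == "__main__":
    print("pending descriptors:", pending_descriptors(TDH, TDT, RING_SIZE))
    print("stalled jiffies:", JIFFIES - TIME_STAMP)
    print("stalled seconds:", stalled_seconds(TIME_STAMP, JIFFIES, HZ))
    print("MAC status:", decode_mac_status(MAC_STATUS))
```

Under those assumptions, the dump shows one descriptor outstanding for 296 jiffies (roughly 1.2 seconds) while the link reports up at 1000 Mb/s, full duplex, with transmission not paused.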

Complete output of lspci:
pengc99@gaia:/$ lspci
00:00.0 Host bridge: Advanced Micro Devices [AMD] RS880 Host Bridge
00:01.0 PCI bridge: Hewlett-Packard Company Device 9602
00:02.0 PCI bridge: Advanced Micro Devices [AMD] RS780 PCI to PCI bridge (ext gfx port 0)
00:06.0 PCI bridge: Advanced Micro Devices [AMD] RS780 PCI to PCI bridge (PCIE port 2)
00:11.0 SATA controller: ATI Technologies Inc SB700/SB800 SATA Controller [AHCI mode] (rev 40)
00:12.0 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI0 Controller
00:12.2 USB Controller: ATI Technologies Inc SB700/SB800 USB EHCI Controller
00:13.0 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI0 Controller
00:13.2 USB Controller: ATI Technologies Inc SB700/SB800 USB EHCI Controller
00:14.0 SMBus: ATI Technologies Inc SBx00 SMBus Controller (rev 42)
00:14.3 ISA bridge: ATI Technologies Inc SB700/SB800 LPC host controller (rev 40)
00:14.4 PCI bridge: ATI Technologies Inc SBx00 PCI to PCI Bridge (rev 40)
00:16.0 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI0 Controller
00:16.2 USB Controller: ATI Technologies Inc SB700/SB800 USB EHCI Controller
00:18.0 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor HyperTransport Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Miscellaneous Control
00:18.4 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Link Control
01:05.0 VGA compatible controller: ATI Technologies Inc M880G [Mobility Radeon HD 4200]
02:00.0 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)
02:00.1 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)
03:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5723 Gigabit Ethernet PCIe (rev 10)

Output of lspci -vvv (as root, network adapter section):
02:00.0 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)
        Subsystem: Hewlett-Packard Company NC360T PCI Express Dual Port Gigabit Server Adapter
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 26
        Region 0: Memory at fe8e0000 (32-bit, non-prefetchable) [size=128K]
        Region 1: Memory at fe8c0000 (32-bit, non-prefetchable) [size=128K]
        Region 2: I/O ports at e800 [size=32]
        Expansion ROM at fe8a0000 [disabled] [size=128K]
        Capabilities: [c8] Power Management version 2
                Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
        Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
                Address: 00000000fee0300c  Data: 4191
        Capabilities: [e0] Express (v1) Endpoint, MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset-
                DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr+ UncorrErr+ FatalErr- UnsuppReq+ AuxPwr+ TransPend-
                LnkCap: Port #0, Speed 2.5GT/s, Width x4, ASPM L0s, Latency L0 <4us, L1 <64us
                        ClockPM- Surprise- LLActRep- BwNot-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x4, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        Capabilities: [100 v1] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr+ BadTLP+ BadDLLP- Rollover- Timeout- NonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
                AERCap: First Error Pointer: 14, GenCap- CGenEn- ChkCap- ChkEn-
        Capabilities: [140 v1] Device Serial Number 00-1f-29-ff-ff-5b-38-56
        Kernel driver in use: e1000e
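
The PCIe error state is easy to lose among all the +/- flags in the lspci -vvv output above. The sketch below is only a convenience for reading it: it lists the flags marked "+" in the DevSta, UESta and CESta fields. The field strings are copied from the output with any wrapped lines joined. The helper name set_flags is made up for this example.

```python
import re

# Status fields copied from the lspci -vvv output (wrapped lines joined).
LSPCI_STATUS = {
    "DevSta": "CorrErr+ UncorrErr+ FatalErr- UnsuppReq+ AuxPwr+ TransPend-",
    "UESta": ("DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- "
              "MalfTLP- ECRC- UnsupReq+ ACSViol-"),
    "CESta": "RxErr+ BadTLP+ BadDLLP- Rollover- Timeout- NonFatalErr-",
}

# lspci prints each flag as NAME+ (set) or NAME- (clear).
FLAG = re.compile(r"(\w+)([+-])")


def set_flags(field):
    """Return the names of the flags that lspci marks with '+'."""
    return [name for name, sign in FLAG.findall(field) if sign == "+"]


if __name__ == "__main__":
    for register, field in LSPCI_STATUS.items():
        print(register, set_flags(field))
```

Read this way, the only uncorrectable error recorded is UnsupReq, while the correctable ones recorded are RxErr and BadTLP.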

The only references I could find online about this problem concern a
PCI-E power management EEPROM bug:
http://serverfault.com/questions/193114/linux-e1000e-intel-networking-driver-problems-galore-where-do-i-start
http://downloadmirror.intel.com/9180/eng/README.txt

Along with the associated fix script:
http://sourceforge.net/projects/e1000/files/e1000e%20stable/eeprom_fix_82574_or_82583/

However, this appears to apply only to the 82574 and 82583 chipsets,
and this is an 82571EB. I also checked the EEPROM output, and it
doesn't look like the fix applies:

pengc99@gaia:/var$ sudo ethtool -e eth1 | head
[sudo] password for pengc99:
Offset          Values
------          ------
0x0000          00 1f 29 5b 38 56 30 15 ff ff b2 50 ff ff ff ff
0x0010          19 d5 04 30 2f a4 44 70 3c 10 5e 10 86 80 65 b1
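
To make the two EEPROM rows above easier to read, the sketch below parses the ethtool -e dump into bytes, prints the first six bytes as a MAC address, and reads 16-bit little-endian words by index. Treating words 0x0D and 0x0E as the device ID and vendor ID is an assumption based on the usual Intel EEPROM layout, not something the dump itself states. The helper names are made up for this example, and the parser assumes the rows are contiguous starting at offset 0.

```python
# The two data rows copied from the `ethtool -e eth1 | head` output.
EEPROM_DUMP = """\
0x0000          00 1f 29 5b 38 56 30 15 ff ff b2 50 ff ff ff ff
0x0010          19 d5 04 30 2f a4 44 70 3c 10 5e 10 86 80 65 b1
"""


def parse_dump(text):
    """Collect the hex bytes from every row that starts with an offset."""
    data = bytearray()
    for line in text.splitlines():
        fields = line.split()
        if not fields or not fields[0].startswith("0x"):
            continue  # skip the header, separator and blank lines
        data.extend(int(value, 16) for value in fields[1:])
    return bytes(data)


def word(data, index):
    """Read the 16-bit little-endian EEPROM word at the given word index."""
    return int.from_bytes(data[2 * index:2 * index + 2], "little")


def mac_address(data):
    """Format the first six bytes as a colon-separated MAC address."""
    return ":".join(f"{byte:02x}" for byte in data[:6])


if __name__ == "__main__":
    eeprom = parse_dump(EEPROM_DUMP)
    print("MAC address:", mac_address(eeprom))
    # Assumed layout: word 0x0D = device ID, word 0x0E = vendor ID.
    print("word 0x0D:", hex(word(eeprom, 0x0D)))
    print("word 0x0E:", hex(word(eeprom, 0x0E)))
```

The first six bytes give 00:1f:29:5b:38:56, which lines up with the Device Serial Number that lspci reports (the same bytes with ff-ff inserted in the middle). The two ID words read 0x105e and 0x8086, the latter being Intel's PCI vendor ID.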


I'd appreciate any help I can get, and thanks for all the hard work!

--Andrew Peng
