We can look for the reason now.
Please re-enable TSO.
Then run "ethtool -s ethx msglvl 0x2c01". This enables debug code that logs
HW ring data (into the dmesg log) when a Tx hang occurs. The next time the
issue occurs, please send me the full dmesg log.

-Tushar
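
The 0x2c01 value passed to msglvl is a bitmask of network driver message classes. As a rough sketch, the following decodes it using the bit values of the NETIF_MSG_* enum from the kernel's include/linux/netdevice.h. The short names and the helper function are illustrative, and which of these bits the e1000e driver consults for its Tx hang dump is not stated in this thread.

```python
# Bit values mirror the NETIF_MSG_* enum in include/linux/netdevice.h.
# The short lowercase names are illustrative (NETIF_MSG_ prefix dropped).
NETIF_MSG_BITS = {
    0x0001: "drv",
    0x0002: "probe",
    0x0004: "link",
    0x0008: "timer",
    0x0010: "ifdown",
    0x0020: "ifup",
    0x0040: "rx_err",
    0x0080: "tx_err",
    0x0100: "tx_queued",
    0x0200: "intr",
    0x0400: "tx_done",
    0x0800: "rx_status",
    0x1000: "pktdata",
    0x2000: "hw",
    0x4000: "wol",
}


def decode_msglvl(mask):
    """Return the message-class names set in a msglvl bitmask, lowest bit first."""
    return [name for bit, name in sorted(NETIF_MSG_BITS.items()) if mask & bit]


if __name__ == "__main__":
    # Hypothetical helper applied to the value from the email above.
    print(decode_msglvl(0x2C01))
```

Under these definitions, 0x2c01 sets the drv, tx_done, rx_status and hw classes.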

>-----Original Message-----
>From: Andrew Peng [mailto:peng...@gmail.com]
>Sent: Wednesday, July 18, 2012 6:24 AM
>To: e1000-devel@lists.sourceforge.net
>Subject: Re: [E1000-devel] 82571EB - Detected Hardware Unit Hang
>
>Thus far disabling TSO via ethtool has seemed to work - can anyone explain
>the technical reason why this appears to have fixed the issue?
>
>--Andrew
>
>On Mon, Jul 16, 2012 at 3:47 PM, Andrew Peng <peng...@gmail.com> wrote:
>> Sorry folks, but I just realized that I hadn't been replying to the
>> list properly and was mistakenly emailing Dave directly instead.
>>
>> I'm consolidating and re-sending the information to the list.
>>
>> BIOS on the HP N40L does not specify any options for AER or PCIe error
>> management, or packet size (referenced in another thread)
>>
>> I have also tried disabling PCIe power management, with no success.
>>
>> I did see one option in the BIOS relating to ACPI functionality. A
>> document that Dave sent me says the AER kernel driver may not be
>> loaded if certain ACPI modules are loaded, so I will disable this
>> option and check for errors. I don't have convenient physical access
>> to the server, so this will take a few days.
>>
>> I am attaching the dmesg and lspci -vvv (as root) output to this message.
>>
>> Thanks for all the help folks.
>>
>> --Andrew
>>
>> On Wed, Jul 11, 2012 at 8:37 PM, Dave, Tushar N <tushar.n.d...@intel.com> wrote:
>>>>-----Original Message-----
>>>>From: Andrew Peng [mailto:peng...@gmail.com]
>>>>Sent: Wednesday, July 11, 2012 8:50 AM
>>>>To: e1000-devel@lists.sourceforge.net
>>>>Subject: [E1000-devel] 82571EB - Detected Hardware Unit Hang
>>>>
>>>>Folks, I've been getting some strange error messages in my home
>>>>server / router that I've been having trouble debugging. I'm decently
>>>>proficient in Linux, but I fear I'm in over my head with this one.
>>>>
>>>>The hardware is an HP N40L Microserver - here are the hardware details
>>>>- http://n40l.wikia.com/wiki/Base_Hardware
>>>>
>>>>I am running Debian Squeeze 6.0:
>>>>pengc99@gaia:/$ sudo uname -a
>>>>Linux gaia 2.6.32-5-amd64 #1 SMP Sun May 6 04:00:17 UTC 2012 x86_64 GNU/Linux
>>>>
>>>>I also subscribe to Ksplice's Uptrack system but since I have the
>>>>newest kernel installed (as released by Debian) there have been no
>>>>hot-patches yet.
>>>>
>>>>This is the message I've been getting in /var/log/kern.log:
>>>>Jul 11 08:55:38 gaia kernel: [402056.009687] e1000e 0000:02:00.0: eth1: Detected Hardware Unit Hang:
>>>>Jul 11 08:55:38 gaia kernel: [402056.009690]   TDH                  <fc>
>>>>Jul 11 08:55:38 gaia kernel: [402056.009692]   TDT                  <fd>
>>>>Jul 11 08:55:38 gaia kernel: [402056.009693]   next_to_use          <fd>
>>>>Jul 11 08:55:38 gaia kernel: [402056.009694]   next_to_clean        <fc>
>>>>Jul 11 08:55:38 gaia kernel: [402056.009695] buffer_info[next_to_clean]:
>>>>Jul 11 08:55:38 gaia kernel: [402056.009697]   time_stamp           <105fc92b2>
>>>>Jul 11 08:55:38 gaia kernel: [402056.009698]   next_to_watch        <fc>
>>>>Jul 11 08:55:38 gaia kernel: [402056.009699]   jiffies              <105fc93da>
>>>>Jul 11 08:55:38 gaia kernel: [402056.009700]   next_to_watch.status <0>
>>>>Jul 11 08:55:38 gaia kernel: [402056.009701] MAC Status             <80383>
>>>>Jul 11 08:55:38 gaia kernel: [402056.009702] PHY Status             <792d>
>>>>Jul 11 08:55:38 gaia kernel: [402056.009703] PHY 1000BASE-T Status  <3800>
>>>>Jul 11 08:55:38 gaia kernel: [402056.009705] PHY Extended Status    <3000>
>>>>Jul 11 08:55:38 gaia kernel: [402056.009706] PCI Status             <10>
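
The ring values in this report can be read with a little arithmetic. The sketch below copies the hexadecimal values from the log and works out how many Tx descriptors sit between head (TDH) and tail (TDT), and how many jiffies passed between the buffer's time_stamp and the report. RING_SIZE = 256 and HZ = 250 are assumptions (neither appears in the log), and the function names are illustrative.

```python
# Values copied from the "Detected Hardware Unit Hang" report (hexadecimal).
HANG_REPORT = {
    "TDH": 0xFC,
    "TDT": 0xFD,
    "next_to_use": 0xFD,
    "next_to_clean": 0xFC,
    "time_stamp": 0x105FC92B2,
    "jiffies": 0x105FC93DA,
}

RING_SIZE = 256  # assumption: Tx ring size is not shown in the log
HZ = 250         # assumption: kernel tick rate; check CONFIG_HZ on the real system


def pending_descriptors(head, tail, ring_size=RING_SIZE):
    """Descriptors handed to the hardware but not yet fetched (handles wrap-around)."""
    return (tail - head) % ring_size


def stalled_jiffies(report):
    """Jiffies elapsed between the buffer's time stamp and the hang report."""
    return report["jiffies"] - report["time_stamp"]


if __name__ == "__main__":
    pending = pending_descriptors(HANG_REPORT["TDH"], HANG_REPORT["TDT"])
    stalled = stalled_jiffies(HANG_REPORT)
    print(f"pending descriptors: {pending}")
    print(f"stalled for {stalled} jiffies (~{stalled / HZ:.3f} s at HZ={HZ})")
```

With these numbers, one descriptor is outstanding and the buffer at next_to_clean has been waiting 0x128 = 296 jiffies.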
>>>>
>>>>Complete output of lspci:
>>>>pengc99@gaia:/$ lspci
>>>>00:00.0 Host bridge: Advanced Micro Devices [AMD] RS880 Host Bridge
>>>>00:01.0 PCI bridge: Hewlett-Packard Company Device 9602
>>>>00:02.0 PCI bridge: Advanced Micro Devices [AMD] RS780 PCI to PCI bridge (ext gfx port 0)
>>>>00:06.0 PCI bridge: Advanced Micro Devices [AMD] RS780 PCI to PCI bridge (PCIE port 2)
>>>>00:11.0 SATA controller: ATI Technologies Inc SB700/SB800 SATA Controller [AHCI mode] (rev 40)
>>>>00:12.0 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI0 Controller
>>>>00:12.2 USB Controller: ATI Technologies Inc SB700/SB800 USB EHCI Controller
>>>>00:13.0 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI0 Controller
>>>>00:13.2 USB Controller: ATI Technologies Inc SB700/SB800 USB EHCI Controller
>>>>00:14.0 SMBus: ATI Technologies Inc SBx00 SMBus Controller (rev 42)
>>>>00:14.3 ISA bridge: ATI Technologies Inc SB700/SB800 LPC host controller (rev 40)
>>>>00:14.4 PCI bridge: ATI Technologies Inc SBx00 PCI to PCI Bridge (rev 40)
>>>>00:16.0 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI0 Controller
>>>>00:16.2 USB Controller: ATI Technologies Inc SB700/SB800 USB EHCI Controller
>>>>00:18.0 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor HyperTransport Configuration
>>>>00:18.1 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Address Map
>>>>00:18.2 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor DRAM Controller
>>>>00:18.3 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Miscellaneous Control
>>>>00:18.4 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Link Control
>>>>01:05.0 VGA compatible controller: ATI Technologies Inc M880G [Mobility Radeon HD 4200]
>>>>02:00.0 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)
>>>>02:00.1 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)
>>>>03:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5723 Gigabit Ethernet PCIe (rev 10)
>>>>
>>>>Output of lspci -vvv (as root, network adapter section):
>>>>02:00.0 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)
>>>>        Subsystem: Hewlett-Packard Company NC360T PCI Express Dual Port Gigabit Server Adapter
>>>>        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
>>>>        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>>>>        Latency: 0, Cache Line Size: 64 bytes
>>>>        Interrupt: pin A routed to IRQ 26
>>>>        Region 0: Memory at fe8e0000 (32-bit, non-prefetchable) [size=128K]
>>>>        Region 1: Memory at fe8c0000 (32-bit, non-prefetchable) [size=128K]
>>>>        Region 2: I/O ports at e800 [size=32]
>>>>        Expansion ROM at fe8a0000 [disabled] [size=128K]
>>>>        Capabilities: [c8] Power Management version 2
>>>>                Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
>>>>                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
>>>>        Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
>>>>                Address: 00000000fee0300c  Data: 4191
>>>>        Capabilities: [e0] Express (v1) Endpoint, MSI 00
>>>>                DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
>>>>                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset-
>>>>                DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
>>>>                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+
>>>>                        MaxPayload 128 bytes, MaxReadReq 512 bytes
>>>>                DevSta: CorrErr+ UncorrErr+ FatalErr- UnsuppReq+ AuxPwr+ TransPend-
>>>>                LnkCap: Port #0, Speed 2.5GT/s, Width x4, ASPM L0s, Latency L0 <4us, L1 <64us
>>>>                        ClockPM- Surprise- LLActRep- BwNot-
>>>>                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
>>>>                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
>>>>                LnkSta: Speed 2.5GT/s, Width x4, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
>>>>        Capabilities: [100 v1] Advanced Error Reporting
>>>>                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
>>>>                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>>>>                UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
>>>>                CESta:  RxErr+ BadTLP+ BadDLLP- Rollover- Timeout- NonFatalErr-
>>>>                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
>>>>                AERCap: First Error Pointer: 14, GenCap- CGenEn- ChkCap- ChkEn-
>>>>        Capabilities: [140 v1] Device Serial Number 00-1f-29-ff-ff-5b-38-56
>>>>        Kernel driver in use: e1000e
>>>>
>>>>The only references I could find online about this problem are about
>>>>a PCI-E power management EEPROM bug:
>>>>http://serverfault.com/questions/193114/linux-e1000e-intel-networking-driver-problems-galore-where-do-i-start
>>>>http://downloadmirror.intel.com/9180/eng/README.txt
>>>>
>>>>Along with the associated fix script:
>>>>http://sourceforge.net/projects/e1000/files/e1000e%20stable/eeprom_fix_82574_or_82583/
>>>>
>>>>However, this appears to apply only to 82574 or 82583 chipsets. This
>>>>is an 82571EB. I also checked the EEPROM output and it doesn't look
>>>>like the fix applies:
>>>>
>>>>pengc99@gaia:/var$ sudo ethtool -e eth1 | head
>>>>[sudo] password for pengc99:
>>>>Offset          Values
>>>>------          ------
>>>>0x0000          00 1f 29 5b 38 56 30 15 ff ff b2 50 ff ff ff ff
>>>>0x0010          19 d5 04 30 2f a4 44 70 3c 10 5e 10 86 80 65 b1
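
For anyone reading a dump like this by hand, here is a minimal sketch that turns the two rows into bytes and pulls out individual fields. It assumes the rows are contiguous from offset 0 and that 16-bit words are stored little-endian. The function names are illustrative, and no claim is made here about which word the 82574/82583 fix script inspects.

```python
# The two data rows from the "ethtool -e eth1 | head" output.
DUMP = """\
0x0000          00 1f 29 5b 38 56 30 15 ff ff b2 50 ff ff ff ff
0x0010          19 d5 04 30 2f a4 44 70 3c 10 5e 10 86 80 65 b1
"""


def parse_dump(text):
    """Collect the hex bytes from each '0x....' row, assuming contiguous rows from 0."""
    data = bytearray()
    for line in text.splitlines():
        fields = line.split()
        if not fields or not fields[0].startswith("0x"):
            continue  # skip headers, separators and blank lines
        data.extend(int(value, 16) for value in fields[1:])
    return bytes(data)


def word_at(data, byte_offset):
    """Read a 16-bit word at a byte offset (little-endian is an assumption)."""
    return int.from_bytes(data[byte_offset:byte_offset + 2], "little")


if __name__ == "__main__":
    eeprom = parse_dump(DUMP)
    print("first six bytes:", ":".join(f"{b:02x}" for b in eeprom[:6]))
    for offset in (0x18, 0x1A, 0x1C):
        print(f"word at byte 0x{offset:02x}: 0x{word_at(eeprom, offset):04x}")
```

The first six bytes (00:1f:29:5b:38:56) line up with the Device Serial Number reported by lspci once its ff-ff filler is dropped, and the word at byte 0x1c decodes to 0x8086, Intel's PCI vendor ID, which suggests the little-endian reading is plausible.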
>>>>
>>>>
>>>>I'd appreciate any help I can get, and thanks for all the hard work!
>>>>
>>>>--Andrew Peng
>>>
>>> Looks like there are PCIe errors detected:
>>> DevSta: CorrErr+ **UncorrErr+ FatalErr- UnsuppReq+ AuxPwr+ TransPend-
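
lspci marks each status bit with a trailing + (set) or - (clear), so a line like the DevSta one can be checked mechanically. The sketch below is a hypothetical helper, not part of lspci or pciutils, that splits such a line into a register name and a flag map.

```python
def parse_flags(line):
    """Split an lspci flag line into (register name, {flag: is_set})."""
    name, _, rest = line.partition(":")
    flags = {}
    for token in rest.split():
        if token[-1] in "+-":  # only tokens that end in a set/clear marker
            flags[token[:-1]] = token.endswith("+")
    return name.strip(), flags


if __name__ == "__main__":
    line = "DevSta: CorrErr+ UncorrErr+ FatalErr- UnsuppReq+ AuxPwr+ TransPend-"
    register, flags = parse_flags(line)
    raised = [flag for flag, is_set in flags.items() if is_set]
    print(register, "flags set:", ", ".join(raised))
```

Applied to the DevSta line, this reports CorrErr, UncorrErr, UnsuppReq and AuxPwr as set. AuxPwr indicates auxiliary power rather than an error, so the three error bits are the ones of interest here.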
>>>
>>>
>>> Would you please attach the full dmesg log and full lspci -vvv output
>>> (run as root) after the issue occurs?
>>> Please also attach your kernel .config.
>>> Did this issue start after you upgraded the kernel?
>>>
>>> A few things to try:
>>> Please load the AER module and see if it logs any errors.
>>> Does the BIOS log report any machine check errors?
>>> Try disabling TSO.
>>>
>>> -tushar
>>>
>>>
>>>
>
>------------------------------------------------------------------------------
>Live Security Virtual Conference
>Exclusive live event will cover all the ways today's security and threat
>landscape has changed and how IT managers can respond. Discussions will
>include endpoint security, mobile security and the latest in malware
>threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
>_______________________________________________
>E1000-devel mailing list
>E1000-devel@lists.sourceforge.net
>https://lists.sourceforge.net/lists/listinfo/e1000-devel
>To learn more about Intel® Ethernet, visit
>http://communities.intel.com/community/wired
