Hello all,

As I reported exactly one year ago, I am still experiencing Tx Unit hangs
on two 82573 NICs on a TYAN Toledo i3210W S5211 (4GB RAM) running 64-bit
Gentoo.

The box has been running since May 2008 and has shown Tx Unit hangs
continuously since then. The vanilla kernel has been updated quarterly, from
2.6.29.x at install to 2.6.36.1 currently.

I also replaced the unmanaged switch with a managed one, without any
improvement.

The box was freshly rebooted.
Sometimes eth0 hangs, sometimes eth1 does.


dmesg excerpt:

e1000e 0000:0f:00.0: eth0: Detected Hardware Unit Hang:
  TDH                  <1>
  TDT                  <6f>
  next_to_use          <6f>
  next_to_clean        <ff>
buffer_info[next_to_clean]:
  time_stamp           <10001ae38>
  next_to_watch        <1>
  jiffies              <10001b747>
  next_to_watch.status <0>
MAC Status             <80080793>
PHY Status             <796d>
PHY 1000BASE-T Status  <3c00>
PHY Extended Status    <3000>
PCI Status             <10>
e1000e 0000:0f:00.0: eth0: Detected Hardware Unit Hang:
  TDH                  <1>
  TDT                  <6f>
  next_to_use          <6f>
  next_to_clean        <ff>
buffer_info[next_to_clean]:
  time_stamp           <10001ae38>
  next_to_watch        <1>
  jiffies              <10001bb2f>
  next_to_watch.status <0>
MAC Status             <80080793>
PHY Status             <796d>
PHY 1000BASE-T Status  <3c00>
PHY Extended Status    <3000>
PCI Status             <10>
------------[ cut here ]------------
WARNING: at net/sched/sch_generic.c:258 dev_watchdog+0x222/0x230()
Hardware name: empty
NETDEV WATCHDOG: eth0 (e1000e): transmit queue 0 timed out
Modules linked in: md5 xts gf128mul tun nfsd lockd nfs_acl auth_rpcgss sunrpc 
af_packet bonding ipv6 coretemp lm85 hwmon_vid hwmon ext4 mbcache jbd2 crc16 
ansi_cprng krng eseqiv rng aes_x86_64 aes_generic cbc cryptomgr aead dm_crypt 
cryptd crypto_hash crypto_wq crypto_blkcipher crypto_algapi fan 
cpufreq_ondemand acpi_cpufreq freq_table mperf msr cpuid usbhid hid 8250_pnp 
ehci_hcd uhci_hcd e1000e usbcore 8250 rtc_cmos serial_core rtc_core i2c_i801 
nls_base rtc_lib i2c_core psmouse evdev pcspkr button sg container thermal 
processor unix [last unloaded: microcode]
Pid: 0, comm: swapper Not tainted 2.6.36.1 #2
Call Trace:
 <IRQ>  [<ffffffff8103b1ba>] warn_slowpath_common+0x7a/0xb0
 [<ffffffff8103b291>] warn_slowpath_fmt+0x41/0x50
 [<ffffffff812f4052>] dev_watchdog+0x222/0x230
 [<ffffffff81046b64>] run_timer_softirq+0x124/0x230
 [<ffffffff812f3e30>] ? dev_watchdog+0x0/0x230
 [<ffffffff81060f59>] ? clockevents_program_event+0x59/0xa0
 [<ffffffff81040dd7>] __do_softirq+0xa7/0x130
 [<ffffffff810587e3>] ? hrtimer_interrupt+0x133/0x240
 [<ffffffff8100314c>] call_softirq+0x1c/0x30
 [<ffffffff81004f4d>] do_softirq+0x4d/0x80
 [<ffffffff81040aed>] irq_exit+0x7d/0x90
 [<ffffffff8101c33b>] smp_apic_timer_interrupt+0x6b/0xa0
 [<ffffffff81002c13>] apic_timer_interrupt+0x13/0x20
 <EOI>  [<ffffffff8100a3f2>] ? mwait_idle+0x72/0x90
 [<ffffffff810013b0>] ? enter_idle+0x20/0x30
 [<ffffffff81001419>] cpu_idle+0x59/0xb0
 [<ffffffff813437d8>] rest_init+0x68/0x70
 [<ffffffff814aecff>] start_kernel+0x34b/0x356
 [<ffffffff814ae31c>] x86_64_start_reservations+0x12c/0x130
 [<ffffffff814ae407>] x86_64_start_kernel+0xe7/0xee
---[ end trace 629a53efa268a4ba ]---
e1000e 0000:0f:00.0: eth0: Reset adapter
bonding: bond0: link status definitely down for interface eth0, disabling it
e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
bonding: bond0: link status definitely up for interface eth0.
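
The two hang reports above can be compared numerically. This is a minimal
sketch, assuming the bracketed values are hexadecimal and that time_stamp and
jiffies are counted in the same unit; the function name stalled_jiffies is
made up for illustration and is not part of any driver or tool.

```python
import re

# The time_stamp/jiffies pairs from the two hang reports in the dmesg excerpt.
DUMP = """\
  time_stamp           <10001ae38>
  jiffies              <10001b747>
  time_stamp           <10001ae38>
  jiffies              <10001bb2f>
"""


def stalled_jiffies(dump):
    """Return, per report, jiffies elapsed since the oldest Tx buffer was queued."""
    stamps = [int(v, 16) for v in re.findall(r"time_stamp\s+<([0-9a-f]+)>", dump)]
    now = [int(v, 16) for v in re.findall(r"jiffies\s+<([0-9a-f]+)>", dump)]
    return [j - t for t, j in zip(stamps, now)]


deltas = stalled_jiffies(DUMP)
print(deltas)  # [2319, 3319]
```

The time_stamp is identical in both reports, so the same buffer is still
pending and only the elapsed count grows (by 1000 jiffies between the two
reports). Converting to seconds would need the kernel's HZ setting, which is
not listed here.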

continuity ~ # cat /proc/interrupts 
           CPU0       CPU1       
  0:         45          1   IO-APIC-edge      timer
  1:          1          1   IO-APIC-edge      i8042
  4:       3269         33   IO-APIC-edge      serial
  8:         61         64   IO-APIC-edge      rtc0
  9:          0          0   IO-APIC-fasteoi   acpi
 12:          2          3   IO-APIC-edge      i8042
 16:      85111     324026   IO-APIC-fasteoi   pata_pdc2027x, uhci_hcd:usb1, uhci_hcd:usb3, ehci_hcd:usb8, eth1
 17:     482156       3274   IO-APIC-fasteoi   ahci, uhci_hcd:usb2, uhci_hcd:usb4, eth0
 18:        183        120   IO-APIC-fasteoi   uhci_hcd:usb5, ehci_hcd:usb7
 19:          0          0   IO-APIC-fasteoi   uhci_hcd:usb6
NMI:          0          0   Non-maskable interrupts
LOC:     200194     152902   Local timer interrupts
SPU:          0          0   Spurious interrupts
PMI:          0          0   Performance monitoring interrupts
PND:          0          0   Performance pending work
RES:      32585      37266   Rescheduling interrupts
CAL:         57         75   Function call interrupts
TLB:      18358      18379   TLB shootdowns
TRM:          0          0   Thermal event interrupts
THR:          0          0   Threshold APIC interrupts
MCE:          0          0   Machine check exceptions
MCP:          6          6   Machine check polls
ERR:          0
MIS:          0


continuity ~ # modinfo e1000|grep ^version
version:        7.3.21-k6-NAPI


continuity ~ # ethtool eth0
Settings for eth0:
        Supported ports: [ TP ]
        Supported link modes:   10baseT/Half 10baseT/Full 
                                100baseT/Half 100baseT/Full 
                                1000baseT/Full 
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full 
                                100baseT/Half 100baseT/Full 
                                1000baseT/Full 
        Advertised auto-negotiation: Yes
        Speed: 1000Mb/s
        Duplex: Full
        Port: Twisted Pair
        PHYAD: 1
        Transceiver: internal
        Auto-negotiation: on
        Supports Wake-on: pumbag
        Wake-on: g
        Current message level: 0x00000001 (1)
        Link detected: yes
continuity ~ # ethtool eth1
Settings for eth1:
        Supported ports: [ TP ]
        Supported link modes:   10baseT/Half 10baseT/Full 
                                100baseT/Half 100baseT/Full 
                                1000baseT/Full 
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full 
                                100baseT/Half 100baseT/Full 
                                1000baseT/Full 
        Advertised auto-negotiation: Yes
        Speed: 1000Mb/s
        Duplex: Full
        Port: Twisted Pair
        PHYAD: 1
        Transceiver: internal
        Auto-negotiation: on
        Supports Wake-on: pumbag
        Wake-on: g
        Current message level: 0x00000001 (1)
        Link detected: yes


continuity ~ # ethtool -i eth0
driver: e1000e
version: 1.2.7-k2
firmware-version: 1.0-2
bus-info: 0000:0f:00.0
continuity ~ # ethtool -i eth1
driver: e1000e
version: 1.2.7-k2
firmware-version: 1.0-2
bus-info: 0000:0d:00.0



continuity ~ # ethtool -k eth0
Offload parameters for eth0:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: on
udp fragmentation offload: off
generic segmentation offload: on
continuity ~ # ethtool -k eth1
Offload parameters for eth1:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: on
udp fragmentation offload: off
generic segmentation offload: on



lspci -vv

0d:00.0 Ethernet controller: Intel Corporation 82573V Gigabit Ethernet Controller (Copper) (rev 03)
        Subsystem: Tyan Computer Device 5211
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 16
        Region 0: Memory at f4080000 (32-bit, non-prefetchable) [size=128K]
        Region 1: Memory at f4000000 (32-bit, non-prefetchable) [size=512K]
        Region 2: I/O ports at 3000 [size=32]
        Capabilities: [c8] Power Management version 2
                Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
        Capabilities: [d0] MSI: Enable- Count=1/1 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [e0] Express (v1) Endpoint, MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset-
                DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- UncorrErr+ FatalErr- UnsuppReq+ AuxPwr+ TransPend-
                LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM unknown, Latency L0 <128ns, L1 <64us
                        ClockPM- Surprise- LLActRep- BwNot-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        Capabilities: [100] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr+ BadTLP- BadDLLP+ Rollover- Timeout- NonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
                AERCap: First Error Pointer: 14, GenCap- CGenEn- ChkCap- ChkEn-
        Capabilities: [140] Device Serial Number 00-e0-81-ff-ff-b1-83-2e
        Kernel driver in use: e1000e
        Kernel modules: e1000e


0f:00.0 Ethernet controller: Intel Corporation 82573L Gigabit Ethernet Controller
        Subsystem: Tyan Computer Device 5211
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 17
        Region 0: Memory at f4280000 (32-bit, non-prefetchable) [size=128K]
        Region 1: Memory at f4200000 (32-bit, non-prefetchable) [size=512K]
        Region 2: I/O ports at 4000 [size=32]
        Capabilities: [c8] Power Management version 2
                Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
        Capabilities: [d0] MSI: Enable- Count=1/1 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [e0] Express (v1) Endpoint, MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset-
                DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- UncorrErr+ FatalErr- UnsuppReq+ AuxPwr+ TransPend-
                LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM unknown, Latency L0 <128ns, L1 <64us
                        ClockPM+ Surprise- LLActRep- BwNot-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        Capabilities: [100] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr+ BadTLP- BadDLLP+ Rollover- Timeout- NonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
                AERCap: First Error Pointer: 14, GenCap- CGenEn- ChkCap- ChkEn-
        Capabilities: [140] Device Serial Number 00-e0-81-ff-ff-b1-83-2f
        Kernel driver in use: e1000e
        Kernel modules: e1000e



continuity ~ # ifconfig
bond0     Link encap:Ethernet  HWaddr 00:e0:81:b1:83:2f  
          inet addr:192.168.0.2  Bcast:192.168.0.255  Mask:255.255.255.0
          inet6 addr: fe80::2e0:81ff:feb1:832f/64 Scope:Link
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:730984 errors:0 dropped:8386 overruns:0 frame:0
          TX packets:1134308 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:172475094 (164.4 MiB)  TX bytes:1466576683 (1.3 GiB)

eth0      Link encap:Ethernet  HWaddr 00:e0:81:b1:83:2f  
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:672162 errors:0 dropped:8386 overruns:0 frame:0
          TX packets:4102 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:152695089 (145.6 MiB)  TX bytes:1622744 (1.5 MiB)
          Interrupt:17 Memory:f4280000-f42a0000 

eth1      Link encap:Ethernet  HWaddr 00:e0:81:b1:83:2f  
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:58822 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1130206 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:19780005 (18.8 MiB)  TX bytes:1464953939 (1.3 GiB)
          Interrupt:16 Memory:f4080000-f40a0000 




Kernel config and more information are available upon request.

Bonding is properly configured and trunking is enabled on the switch.
The hangs also happen without bonding.


Any help appreciated.


Cheers,
Vasco

_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel® Ethernet, visit
http://communities.intel.com/community/wired
