Hello all,
as I reported exactly one year ago, I am still experiencing Tx Unit
hangs on two 82573 NICs on a TYAN Toledo i3210W S5211 (4GB RAM) running
64-bit Gentoo.
The box has been running since May 2008 and has shown continuous Tx Unit hangs
ever since. The vanilla kernel has been updated quarterly, from 2.6.29.x at
install up to 2.6.36.1 currently.
I also replaced the unmanaged switch with a managed one, but without any
improvement.
The box was freshly rebooted.
Sometimes eth0 hangs, sometimes eth1 does.
dmesg excerpt:
e1000e 0000:0f:00.0: eth0: Detected Hardware Unit Hang:
TDH <1>
TDT <6f>
next_to_use <6f>
next_to_clean <ff>
buffer_info[next_to_clean]:
time_stamp <10001ae38>
next_to_watch <1>
jiffies <10001b747>
next_to_watch.status <0>
MAC Status <80080793>
PHY Status <796d>
PHY 1000BASE-T Status <3c00>
PHY Extended Status <3000>
PCI Status <10>
e1000e 0000:0f:00.0: eth0: Detected Hardware Unit Hang:
TDH <1>
TDT <6f>
next_to_use <6f>
next_to_clean <ff>
buffer_info[next_to_clean]:
time_stamp <10001ae38>
next_to_watch <1>
jiffies <10001bb2f>
next_to_watch.status <0>
MAC Status <80080793>
PHY Status <796d>
PHY 1000BASE-T Status <3c00>
PHY Extended Status <3000>
PCI Status <10>
------------[ cut here ]------------
WARNING: at net/sched/sch_generic.c:258 dev_watchdog+0x222/0x230()
Hardware name: empty
NETDEV WATCHDOG: eth0 (e1000e): transmit queue 0 timed out
Modules linked in: md5 xts gf128mul tun nfsd lockd nfs_acl auth_rpcgss sunrpc
af_packet bonding ipv6 coretemp lm85 hwmon_vid hwmon ext4 mbcache jbd2 crc16
ansi_cprng krng eseqiv rng aes_x86_64 aes_generic cbc cryptomgr aead dm_crypt
cryptd crypto_hash crypto_wq crypto_blkcipher crypto_algapi fan
cpufreq_ondemand acpi_cpufreq freq_table mperf msr cpuid usbhid hid 8250_pnp
ehci_hcd uhci_hcd e1000e usbcore 8250 rtc_cmos serial_core rtc_core i2c_i801
nls_base rtc_lib i2c_core psmouse evdev pcspkr button sg container thermal
processor unix [last unloaded: microcode]
Pid: 0, comm: swapper Not tainted 2.6.36.1 #2
Call Trace:
<IRQ> [<ffffffff8103b1ba>] warn_slowpath_common+0x7a/0xb0
[<ffffffff8103b291>] warn_slowpath_fmt+0x41/0x50
[<ffffffff812f4052>] dev_watchdog+0x222/0x230
[<ffffffff81046b64>] run_timer_softirq+0x124/0x230
[<ffffffff812f3e30>] ? dev_watchdog+0x0/0x230
[<ffffffff81060f59>] ? clockevents_program_event+0x59/0xa0
[<ffffffff81040dd7>] __do_softirq+0xa7/0x130
[<ffffffff810587e3>] ? hrtimer_interrupt+0x133/0x240
[<ffffffff8100314c>] call_softirq+0x1c/0x30
[<ffffffff81004f4d>] do_softirq+0x4d/0x80
[<ffffffff81040aed>] irq_exit+0x7d/0x90
[<ffffffff8101c33b>] smp_apic_timer_interrupt+0x6b/0xa0
[<ffffffff81002c13>] apic_timer_interrupt+0x13/0x20
<EOI> [<ffffffff8100a3f2>] ? mwait_idle+0x72/0x90
[<ffffffff810013b0>] ? enter_idle+0x20/0x30
[<ffffffff81001419>] cpu_idle+0x59/0xb0
[<ffffffff813437d8>] rest_init+0x68/0x70
[<ffffffff814aecff>] start_kernel+0x34b/0x356
[<ffffffff814ae31c>] x86_64_start_reservations+0x12c/0x130
[<ffffffff814ae407>] x86_64_start_kernel+0xe7/0xee
---[ end trace 629a53efa268a4ba ]---
e1000e 0000:0f:00.0: eth0: Reset adapter
bonding: bond0: link status definitely down for interface eth0, disabling it
e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
bonding: bond0: link status definitely up for interface eth0.
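To make the numbers in the first hang report easier to read, here is a minimal Python sketch that converts the hex fields and derives the ring occupancy and the age of the oldest uncleaned buffer. The ring size of 256 descriptors is an assumption (it is not shown in the output above and can be checked with `ethtool -g eth0`). The names `parse_hang_report` and `summarize` are made up for this illustration. The age is given in jiffies only, because the kernel's HZ setting is not part of this report.

```python
import re

# Field values copied from the first "Detected Hardware Unit Hang" report above.
HANG_REPORT = """\
TDH <1>
TDT <6f>
next_to_use <6f>
next_to_clean <ff>
time_stamp <10001ae38>
next_to_watch <1>
jiffies <10001b747>
"""

# ASSUMPTION: Tx ring size is not shown in the report; 256 is assumed here.
RING_SIZE = 256


def parse_hang_report(text):
    """Return {field_name: int} for every 'name <hex>' line in the report."""
    pattern = r"^\s*([\w.\[\]]+)\s+<([0-9a-fA-F]+)>"
    return {name: int(value, 16) for name, value in re.findall(pattern, text, re.M)}


def summarize(fields, ring_size=RING_SIZE):
    """Derive ring occupancy and the age of the oldest uncleaned buffer."""
    return {
        # Descriptors queued by the driver but not yet cleaned (wraps at ring_size).
        "uncleaned": (fields["next_to_use"] - fields["next_to_clean"]) % ring_size,
        # Distance from the hardware head (TDH) to the tail (TDT).
        "head_to_tail": (fields["TDT"] - fields["TDH"]) % ring_size,
        # Jiffies between the buffer's time stamp and the moment of the report.
        "age_jiffies": fields["jiffies"] - fields["time_stamp"],
    }


if __name__ == "__main__":
    for key, value in summarize(parse_hang_report(HANG_REPORT)).items():
        print(f"{key}: {value}")
```

With the values above and the assumed ring size, this gives 112 uncleaned descriptors, a head-to-tail distance of 110, and an age of 2319 jiffies. The second report differs only in `jiffies`, which gives 3319.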
continuity ~ # cat /proc/interrupts
CPU0 CPU1
0: 45 1 IO-APIC-edge timer
1: 1 1 IO-APIC-edge i8042
4: 3269 33 IO-APIC-edge serial
8: 61 64 IO-APIC-edge rtc0
9: 0 0 IO-APIC-fasteoi acpi
12: 2 3 IO-APIC-edge i8042
16: 85111 324026 IO-APIC-fasteoi pata_pdc2027x, uhci_hcd:usb1,
uhci_hcd:usb3, ehci_hcd:usb8, eth1
17: 482156 3274 IO-APIC-fasteoi ahci, uhci_hcd:usb2,
uhci_hcd:usb4, eth0
18: 183 120 IO-APIC-fasteoi uhci_hcd:usb5, ehci_hcd:usb7
19: 0 0 IO-APIC-fasteoi uhci_hcd:usb6
NMI: 0 0 Non-maskable interrupts
LOC: 200194 152902 Local timer interrupts
SPU: 0 0 Spurious interrupts
PMI: 0 0 Performance monitoring interrupts
PND: 0 0 Performance pending work
RES: 32585 37266 Rescheduling interrupts
CAL: 57 75 Function call interrupts
TLB: 18358 18379 TLB shootdowns
TRM: 0 0 Thermal event interrupts
THR: 0 0 Threshold APIC interrupts
MCE: 0 0 Machine check exceptions
MCP: 6 6 Machine check polls
ERR: 0
MIS: 0
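Both NICs sit on shared legacy interrupt lines (the lspci output further down shows "MSI: Enable-" for both). As a small illustration, the sketch below lists which other devices share an IRQ with each ethX interface. The sample text is copied from the output above with the wrapped device lists rejoined. The function name `irq_sharing` is hypothetical, and the parsing assumes the column layout of this kernel's /proc/interrupts (per-CPU counters, one controller-type token, then a comma-separated device list).

```python
# Three lines copied from the /proc/interrupts output above, with the wrapped
# device lists rejoined onto one line each.
INTERRUPTS = """\
 16:      85111     324026   IO-APIC-fasteoi   pata_pdc2027x, uhci_hcd:usb1, uhci_hcd:usb3, ehci_hcd:usb8, eth1
 17:     482156       3274   IO-APIC-fasteoi   ahci, uhci_hcd:usb2, uhci_hcd:usb4, eth0
 18:        183        120   IO-APIC-fasteoi   uhci_hcd:usb5, ehci_hcd:usb7
"""


def irq_sharing(text, prefix="eth"):
    """Map each device starting with `prefix` to (irq, other devices on that line)."""
    sharing = {}
    for line in text.splitlines():
        irq, sep, rest = line.partition(":")
        if not sep or not irq.strip().isdigit():
            continue  # skip the CPU header and named rows such as NMI or LOC
        tokens = rest.split()
        # Drop the per-CPU counters; the next token is the controller type.
        while tokens and tokens[0].isdigit():
            tokens.pop(0)
        devices = [d.strip() for d in " ".join(tokens[1:]).split(",") if d.strip()]
        for dev in devices:
            if dev.startswith(prefix):
                sharing[dev] = (int(irq), [d for d in devices if d != dev])
    return sharing


if __name__ == "__main__":
    for dev, (irq, others) in sorted(irq_sharing(INTERRUPTS).items()):
        print(f"{dev}: IRQ {irq}, shared with {', '.join(others) or 'nothing'}")
```

On this box it reports eth0 on IRQ 17 together with ahci and two uhci_hcd controllers, and eth1 on IRQ 16 together with pata_pdc2027x and three USB controllers.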
continuity ~ # modinfo e1000|grep ^version
version: 7.3.21-k6-NAPI
continuity ~ # ethtool eth0
Settings for eth0:
Supported ports: [ TP ]
Supported link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Full
Supports auto-negotiation: Yes
Advertised link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Full
Advertised auto-negotiation: Yes
Speed: 1000Mb/s
Duplex: Full
Port: Twisted Pair
PHYAD: 1
Transceiver: internal
Auto-negotiation: on
Supports Wake-on: pumbag
Wake-on: g
Current message level: 0x00000001 (1)
Link detected: yes
continuity ~ # ethtool eth1
Settings for eth1:
Supported ports: [ TP ]
Supported link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Full
Supports auto-negotiation: Yes
Advertised link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Full
Advertised auto-negotiation: Yes
Speed: 1000Mb/s
Duplex: Full
Port: Twisted Pair
PHYAD: 1
Transceiver: internal
Auto-negotiation: on
Supports Wake-on: pumbag
Wake-on: g
Current message level: 0x00000001 (1)
Link detected: yes
continuity ~ # ethtool -i eth0
driver: e1000e
version: 1.2.7-k2
firmware-version: 1.0-2
bus-info: 0000:0f:00.0
continuity ~ # ethtool -i eth1
driver: e1000e
version: 1.2.7-k2
firmware-version: 1.0-2
bus-info: 0000:0d:00.0
continuity ~ # ethtool -k eth0
Offload parameters for eth0:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: on
udp fragmentation offload: off
generic segmentation offload: on
continuity ~ # ethtool -k eth1
Offload parameters for eth1:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: on
udp fragmentation offload: off
generic segmentation offload: on
continuity ~ # lspci -vv
0d:00.0 Ethernet controller: Intel Corporation 82573V Gigabit Ethernet
Controller (Copper) (rev 03)
Subsystem: Tyan Computer Device 5211
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR+ FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 16
Region 0: Memory at f4080000 (32-bit, non-prefetchable) [size=128K]
Region 1: Memory at f4000000 (32-bit, non-prefetchable) [size=512K]
Region 2: I/O ports at 3000 [size=32]
Capabilities: [c8] Power Management version 2
Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA
PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
Capabilities: [d0] MSI: Enable- Count=1/1 Maskable- 64bit+
Address: 0000000000000000 Data: 0000
Capabilities: [e0] Express (v1) Endpoint, MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <512ns,
L1 <64us
ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset-
DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+
Unsupported+
RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
MaxPayload 128 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- UncorrErr+ FatalErr- UnsuppReq+ AuxPwr+
TransPend-
LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM unknown,
Latency L0 <128ns, L1 <64us
ClockPM- Surprise- LLActRep- BwNot-
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain-
CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+
DLActive- BWMgmt- ABWMgmt-
Capabilities: [100] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt-
RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr+ BadTLP- BadDLLP+ Rollover- Timeout-
NonFatalErr-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout-
NonFatalErr-
AERCap: First Error Pointer: 14, GenCap- CGenEn- ChkCap-
ChkEn-
Capabilities: [140] Device Serial Number 00-e0-81-ff-ff-b1-83-2e
Kernel driver in use: e1000e
Kernel modules: e1000e
0f:00.0 Ethernet controller: Intel Corporation 82573L Gigabit Ethernet
Controller
Subsystem: Tyan Computer Device 5211
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR+ FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 17
Region 0: Memory at f4280000 (32-bit, non-prefetchable) [size=128K]
Region 1: Memory at f4200000 (32-bit, non-prefetchable) [size=512K]
Region 2: I/O ports at 4000 [size=32]
Capabilities: [c8] Power Management version 2
Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA
PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
Capabilities: [d0] MSI: Enable- Count=1/1 Maskable- 64bit+
Address: 0000000000000000 Data: 0000
Capabilities: [e0] Express (v1) Endpoint, MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <512ns,
L1 <64us
ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset-
DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+
Unsupported+
RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
MaxPayload 128 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- UncorrErr+ FatalErr- UnsuppReq+ AuxPwr+
TransPend-
LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM unknown,
Latency L0 <128ns, L1 <64us
ClockPM+ Surprise- LLActRep- BwNot-
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain-
CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+
DLActive- BWMgmt- ABWMgmt-
Capabilities: [100] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt-
RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr+ BadTLP- BadDLLP+ Rollover- Timeout-
NonFatalErr-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout-
NonFatalErr-
AERCap: First Error Pointer: 14, GenCap- CGenEn- ChkCap-
ChkEn-
Capabilities: [140] Device Serial Number 00-e0-81-ff-ff-b1-83-2f
Kernel driver in use: e1000e
Kernel modules: e1000e
continuity ~ # ifconfig
bond0 Link encap:Ethernet HWaddr 00:e0:81:b1:83:2f
inet addr:192.168.0.2 Bcast:192.168.0.255 Mask:255.255.255.0
inet6 addr: fe80::2e0:81ff:feb1:832f/64 Scope:Link
UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1
RX packets:730984 errors:0 dropped:8386 overruns:0 frame:0
TX packets:1134308 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:172475094 (164.4 MiB) TX bytes:1466576683 (1.3 GiB)
eth0 Link encap:Ethernet HWaddr 00:e0:81:b1:83:2f
UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1
RX packets:672162 errors:0 dropped:8386 overruns:0 frame:0
TX packets:4102 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:152695089 (145.6 MiB) TX bytes:1622744 (1.5 MiB)
Interrupt:17 Memory:f4280000-f42a0000
eth1 Link encap:Ethernet HWaddr 00:e0:81:b1:83:2f
UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1
RX packets:58822 errors:0 dropped:0 overruns:0 frame:0
TX packets:1130206 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:19780005 (18.8 MiB) TX bytes:1464953939 (1.3 GiB)
Interrupt:16 Memory:f4080000-f40a0000
Kernel config and more information are available upon request.
Bonding is properly configured and trunking is enabled on the switch.
The hangs also occur without bonding.
Any help is appreciated.
Cheers,
Vasco
_______________________________________________
E1000-devel mailing list
https://lists.sourceforge.net/lists/listinfo/e1000-devel