Hello all,

As I already reported exactly one year ago, I am still experiencing Tx Unit Hangs on two 82573 NICs on a TYAN Toledo i3210W S5211 (4 GB RAM) running 64-bit Gentoo.
The box has been running since May 2008 and has shown continuous Tx Unit Hangs since then. The vanilla kernel has been updated quarterly, from 2.6.29.x at install time up to 2.6.36.1 currently. I also replaced the unmanaged switch with a managed one, but without any improvement. The box was freshly rebooted. Sometimes eth0 hangs, sometimes eth1 does.

dmesg excerpt:

e1000e 0000:0f:00.0: eth0: Detected Hardware Unit Hang:
  TDH                  <1>
  TDT                  <6f>
  next_to_use          <6f>
  next_to_clean        <ff>
buffer_info[next_to_clean]:
  time_stamp           <10001ae38>
  next_to_watch        <1>
  jiffies              <10001b747>
  next_to_watch.status <0>
MAC Status             <80080793>
PHY Status             <796d>
PHY 1000BASE-T Status  <3c00>
PHY Extended Status    <3000>
PCI Status             <10>
e1000e 0000:0f:00.0: eth0: Detected Hardware Unit Hang:
  TDH                  <1>
  TDT                  <6f>
  next_to_use          <6f>
  next_to_clean        <ff>
buffer_info[next_to_clean]:
  time_stamp           <10001ae38>
  next_to_watch        <1>
  jiffies              <10001bb2f>
  next_to_watch.status <0>
MAC Status             <80080793>
PHY Status             <796d>
PHY 1000BASE-T Status  <3c00>
PHY Extended Status    <3000>
PCI Status             <10>
------------[ cut here ]------------
WARNING: at net/sched/sch_generic.c:258 dev_watchdog+0x222/0x230()
Hardware name: empty
NETDEV WATCHDOG: eth0 (e1000e): transmit queue 0 timed out
Modules linked in: md5 xts gf128mul tun nfsd lockd nfs_acl auth_rpcgss sunrpc af_packet bonding ipv6 coretemp lm85 hwmon_vid hwmon ext4 mbcache jbd2 crc16 ansi_cprng krng eseqiv rng aes_x86_64 aes_generic cbc cryptomgr aead dm_crypt cryptd crypto_hash crypto_wq crypto_blkcipher crypto_algapi fan cpufreq_ondemand acpi_cpufreq freq_table mperf msr cpuid usbhid hid 8250_pnp ehci_hcd uhci_hcd e1000e usbcore 8250 rtc_cmos serial_core rtc_core i2c_i801 nls_base rtc_lib i2c_core psmouse evdev pcspkr button sg container thermal processor unix [last unloaded: microcode]
Pid: 0, comm: swapper Not tainted 2.6.36.1 #2
Call Trace:
 <IRQ>  [<ffffffff8103b1ba>] warn_slowpath_common+0x7a/0xb0
 [<ffffffff8103b291>] warn_slowpath_fmt+0x41/0x50
 [<ffffffff812f4052>] dev_watchdog+0x222/0x230
 [<ffffffff81046b64>] run_timer_softirq+0x124/0x230
 [<ffffffff812f3e30>] ? dev_watchdog+0x0/0x230
 [<ffffffff81060f59>] ? clockevents_program_event+0x59/0xa0
 [<ffffffff81040dd7>] __do_softirq+0xa7/0x130
 [<ffffffff810587e3>] ? hrtimer_interrupt+0x133/0x240
 [<ffffffff8100314c>] call_softirq+0x1c/0x30
 [<ffffffff81004f4d>] do_softirq+0x4d/0x80
 [<ffffffff81040aed>] irq_exit+0x7d/0x90
 [<ffffffff8101c33b>] smp_apic_timer_interrupt+0x6b/0xa0
 [<ffffffff81002c13>] apic_timer_interrupt+0x13/0x20
 <EOI>  [<ffffffff8100a3f2>] ? mwait_idle+0x72/0x90
 [<ffffffff810013b0>] ? enter_idle+0x20/0x30
 [<ffffffff81001419>] cpu_idle+0x59/0xb0
 [<ffffffff813437d8>] rest_init+0x68/0x70
 [<ffffffff814aecff>] start_kernel+0x34b/0x356
 [<ffffffff814ae31c>] x86_64_start_reservations+0x12c/0x130
 [<ffffffff814ae407>] x86_64_start_kernel+0xe7/0xee
---[ end trace 629a53efa268a4ba ]---
e1000e 0000:0f:00.0: eth0: Reset adapter
bonding: bond0: link status definitely down for interface eth0, disabling it
e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
bonding: bond0: link status definitely up for interface eth0.
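For anyone who wants to compare several of these hang reports, a minimal Python sketch like the one below could pull the numbers out of the message. It is only an illustration: the function names are hypothetical, the ring size of 256 descriptors is an assumption that is not stated anywhere in this report, and treating (TDT - TDH) modulo the ring size as the number of descriptors still queued to the hardware is the usual head/tail reading, not something confirmed by the driver authors here.

```python
import re

# Hex fields printed by the "Detected Hardware Unit Hang" message quoted above.
FIELD_RE = re.compile(
    r"\b(TDH|TDT|next_to_use|next_to_clean|time_stamp|jiffies)\s+<([0-9a-fA-F]+)>"
)


def parse_hang_report(text):
    """Return a dict mapping each recognised field name to its integer value."""
    return {name: int(value, 16) for name, value in FIELD_RE.findall(text)}


def summarize(fields, ring_size=256):
    """Derive two numbers from a parsed report.

    ring_size=256 is an ASSUMPTION (the Tx ring size is not given in the report).
    """
    # Descriptors between hardware head (TDH) and driver tail (TDT).
    pending = (fields["TDT"] - fields["TDH"]) % ring_size
    # How long the oldest uncleaned buffer has been waiting, in jiffies.
    age = fields["jiffies"] - fields["time_stamp"]
    return {"pending_descriptors": pending, "age_jiffies": age}


# First hang report from the dmesg excerpt, copied verbatim.
REPORT = (
    "TDH <1> TDT <6f> next_to_use <6f> next_to_clean <ff> "
    "buffer_info[next_to_clean]: time_stamp <10001ae38> next_to_watch <1> "
    "jiffies <10001b747> next_to_watch.status <0>"
)

if __name__ == "__main__":
    print(summarize(parse_hang_report(REPORT)))
```

Under those assumptions the first report gives 110 pending descriptors and a buffer age of 2319 jiffies. Converting jiffies to seconds needs the kernel's HZ setting, which is not included here.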
continuity ~ # cat /proc/interrupts
           CPU0       CPU1
  0:         45          1   IO-APIC-edge      timer
  1:          1          1   IO-APIC-edge      i8042
  4:       3269         33   IO-APIC-edge      serial
  8:         61         64   IO-APIC-edge      rtc0
  9:          0          0   IO-APIC-fasteoi   acpi
 12:          2          3   IO-APIC-edge      i8042
 16:      85111     324026   IO-APIC-fasteoi   pata_pdc2027x, uhci_hcd:usb1, uhci_hcd:usb3, ehci_hcd:usb8, eth1
 17:     482156       3274   IO-APIC-fasteoi   ahci, uhci_hcd:usb2, uhci_hcd:usb4, eth0
 18:        183        120   IO-APIC-fasteoi   uhci_hcd:usb5, ehci_hcd:usb7
 19:          0          0   IO-APIC-fasteoi   uhci_hcd:usb6
NMI:          0          0   Non-maskable interrupts
LOC:     200194     152902   Local timer interrupts
SPU:          0          0   Spurious interrupts
PMI:          0          0   Performance monitoring interrupts
PND:          0          0   Performance pending work
RES:      32585      37266   Rescheduling interrupts
CAL:         57         75   Function call interrupts
TLB:      18358      18379   TLB shootdowns
TRM:          0          0   Thermal event interrupts
THR:          0          0   Threshold APIC interrupts
MCE:          0          0   Machine check exceptions
MCP:          6          6   Machine check polls
ERR:          0
MIS:          0

continuity ~ # modinfo e1000|grep ^version
version:        7.3.21-k6-NAPI

continuity ~ # ethtool eth0
Settings for eth0:
        Supported ports: [ TP ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Advertised auto-negotiation: Yes
        Speed: 1000Mb/s
        Duplex: Full
        Port: Twisted Pair
        PHYAD: 1
        Transceiver: internal
        Auto-negotiation: on
        Supports Wake-on: pumbag
        Wake-on: g
        Current message level: 0x00000001 (1)
        Link detected: yes

continuity ~ # ethtool eth1
Settings for eth1:
        Supported ports: [ TP ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Advertised auto-negotiation: Yes
        Speed: 1000Mb/s
        Duplex: Full
        Port: Twisted Pair
        PHYAD: 1
        Transceiver: internal
        Auto-negotiation: on
        Supports Wake-on: pumbag
        Wake-on: g
        Current message level: 0x00000001 (1)
        Link detected: yes

continuity ~ # ethtool -i eth0
driver: e1000e
version: 1.2.7-k2
firmware-version: 1.0-2
bus-info: 0000:0f:00.0

continuity ~ # ethtool -i eth1
driver: e1000e
version: 1.2.7-k2
firmware-version: 1.0-2
bus-info: 0000:0d:00.0

continuity ~ # ethtool -k eth0
Offload parameters for eth0:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: on
udp fragmentation offload: off
generic segmentation offload: on

continuity ~ # ethtool -k eth1
Offload parameters for eth1:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: on
udp fragmentation offload: off
generic segmentation offload: on

lspci -vv

0d:00.0 Ethernet controller: Intel Corporation 82573V Gigabit Ethernet Controller (Copper) (rev 03)
        Subsystem: Tyan Computer Device 5211
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 16
        Region 0: Memory at f4080000 (32-bit, non-prefetchable) [size=128K]
        Region 1: Memory at f4000000 (32-bit, non-prefetchable) [size=512K]
        Region 2: I/O ports at 3000 [size=32]
        Capabilities: [c8] Power Management version 2
                Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
        Capabilities: [d0] MSI: Enable- Count=1/1 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [e0] Express (v1) Endpoint, MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset-
                DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- UncorrErr+ FatalErr- UnsuppReq+ AuxPwr+ TransPend-
                LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM unknown, Latency L0 <128ns, L1 <64us
                        ClockPM- Surprise- LLActRep- BwNot-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        Capabilities: [100] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr+ BadTLP- BadDLLP+ Rollover- Timeout- NonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
                AERCap: First Error Pointer: 14, GenCap- CGenEn- ChkCap- ChkEn-
        Capabilities: [140] Device Serial Number 00-e0-81-ff-ff-b1-83-2e
        Kernel driver in use: e1000e
        Kernel modules: e1000e

0f:00.0 Ethernet controller: Intel Corporation 82573L Gigabit Ethernet Controller
        Subsystem: Tyan Computer Device 5211
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 17
        Region 0: Memory at f4280000 (32-bit, non-prefetchable) [size=128K]
        Region 1: Memory at f4200000 (32-bit, non-prefetchable) [size=512K]
        Region 2: I/O ports at 4000 [size=32]
        Capabilities: [c8] Power Management version 2
                Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
        Capabilities: [d0] MSI: Enable- Count=1/1 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [e0] Express (v1) Endpoint, MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset-
                DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- UncorrErr+ FatalErr- UnsuppReq+ AuxPwr+ TransPend-
                LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM unknown, Latency L0 <128ns, L1 <64us
                        ClockPM+ Surprise- LLActRep- BwNot-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        Capabilities: [100] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr+ BadTLP- BadDLLP+ Rollover- Timeout- NonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
                AERCap: First Error Pointer: 14, GenCap- CGenEn- ChkCap- ChkEn-
        Capabilities: [140] Device Serial Number 00-e0-81-ff-ff-b1-83-2f
        Kernel driver in use: e1000e
        Kernel modules: e1000e

continuity ~ # ifconfig
bond0     Link encap:Ethernet  HWaddr 00:e0:81:b1:83:2f
          inet addr:192.168.0.2  Bcast:192.168.0.255  Mask:255.255.255.0
          inet6 addr: fe80::2e0:81ff:feb1:832f/64 Scope:Link
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:730984 errors:0 dropped:8386 overruns:0 frame:0
          TX packets:1134308 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:172475094 (164.4 MiB)  TX bytes:1466576683 (1.3 GiB)

eth0      Link encap:Ethernet  HWaddr 00:e0:81:b1:83:2f
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:672162 errors:0 dropped:8386 overruns:0 frame:0
          TX packets:4102 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:152695089 (145.6 MiB)  TX bytes:1622744 (1.5 MiB)
          Interrupt:17 Memory:f4280000-f42a0000

eth1      Link encap:Ethernet  HWaddr 00:e0:81:b1:83:2f
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:58822 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1130206 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:19780005 (18.8 MiB)  TX bytes:1464953939 (1.3 GiB)
          Interrupt:16 Memory:f4080000-f40a0000

The kernel config and more information are available upon request. Bonding is properly configured and trunking is enabled on the switch. The hangs also happen without bonding.

Any help appreciated.

Cheers,
Vasco

_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel® Ethernet, visit http://communities.intel.com/community/wired