On Mon, Mar 1, 2010 at 3:37 AM, Thrash Dude <thrash.d...@gmail.com> wrote:
> Seems to be a rather common issue with the e1000 module. I searched the
> archives back to 2005. Plenty of reports, no solutions.
There are some solutions, one of which is to try loading the driver with
TxDescriptorStep=4 TxDescriptors=1024

> The NIC does drop the link, PC does not hang. The link does become active
> again. Wouldn't be such an issue, although this PC is a file server for
> streaming audio and video files exported across nfs and cifs shares. Quite
> an annoying problem to get 55 minutes into a movie to have the link die.

For the recent occurrences, were you streaming over cifs or NFS? What
version of NFS? What client machine/OS did you test with? What streaming
software were you using to play the movie on the remote machine?

> NOTE: No, the link does not die with every movie. This seems to be
> completely random. I can flood the _server_ with 15 incoming connections
> continuously for 30 minutes and there's no problem. Or I can simply
> ping -c4 server and receive a Tx Unit Hang.

So maybe it's not actually related to traffic levels?

> Machine specs -
> Slackware x86_64 -current
> Pure Virgin Kernel 2.6.32.8 (have noticed issue with previous kernels)
> 7GB Ram
> AMD RS780
>
> Migrated same card to another machine to rule out the usual +4GB question,
> and another chipset to test.
> Intel P45, 2GB Ram - same issue

This is a promising development, because we may have something close to
that system here. Which slot did you plug the card into? What is the
barcode number on your adapter (format XXXXXX-XXX)?

The other (bad) option is that, since the problem follows the adapter, it
could be the adapter itself. Have you double-checked cooling of the NIC?
Do you have another identical NIC you can try? You can probably get
warranty support for the one you have, to get a replacement.

> VMware Player is currently installed. Issue presents itself when VMware is
> removed and/or VMware modules are not loaded.
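For reference, one way to apply the module options suggested above is a
modprobe.d fragment; this is a sketch (the file path is just a conventional
location, and the option names are taken from the suggestion above):

```
# /etc/modprobe.d/e1000.conf
# TxDescriptors sizes the Tx descriptor ring; TxDescriptorStep spaces out
# which descriptors the driver actually uses.
options e1000 TxDescriptors=1024 TxDescriptorStep=4
```

After saving, unload and reload the module (rmmod e1000; modprobe e1000)
for the options to take effect; modinfo -p e1000 lists the parameters the
loaded driver actually supports.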
> See below for modinfo, dmesg, IRQ's, lspci and some ethtool output
>
> Partial dmesg
> [43503.704198] e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
> [43503.704199]   Tx Queue             <0>
> [43503.704200]   TDH                  <c7>
> [43503.704201]   TDT                  <da>
> [43503.704201]   next_to_use          <da>
> [43503.704202]   next_to_clean        <c8>
> [43503.704202] buffer_info[next_to_clean]
> [43503.704203]   time_stamp           <1029335c6>
> [43503.704203]   next_to_watch        <c9>
> [43503.704204]   jiffies              <102933c78>
> [43503.704205]   next_to_watch.status <0>
> [43505.704209] e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
> [43505.704211]   Tx Queue             <0>
> [43505.704212]   TDH                  <c7>
> [43505.704212]   TDT                  <da>
> [43505.704213]   next_to_use          <da>
> [43505.704214]   next_to_clean        <c8>
> [43505.704214] buffer_info[next_to_clean]
> [43505.704215]   time_stamp           <1029335c6>
> [43505.704215]   next_to_watch        <c9>
> [43505.704216]   jiffies              <102934448>
> [43505.704216]   next_to_watch.status <0>
> [43507.704182] e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
> [43507.704183]   Tx Queue             <0>
> [43507.704184]   TDH                  <c7>
> [43507.704185]   TDT                  <da>
> [43507.704185]   next_to_use          <da>
> [43507.704186]   next_to_clean        <c8>
> [43507.704186] buffer_info[next_to_clean]
> [43507.704187]   time_stamp           <1029335c6>
> [43507.704187]   next_to_watch        <c9>
> [43507.704188]   jiffies              <102934c18>
> [43507.704189]   next_to_watch.status <0>

Wow, that's a mess; please fix your mail client next time. What I do see
in the above appears to be a legitimate Tx hang. We have some debug code
you can run that can help us diagnose; would you be able to run that?
> modinfo e1000 | grep ^version
> version:        7.3.21-k5-NAPI
>
> ethtool eth0
> Settings for eth0:
>         Supported ports: [ TP ]
>         Supported link modes:   10baseT/Half 10baseT/Full
>                                 100baseT/Half 100baseT/Full
>                                 1000baseT/Full
>         Supports auto-negotiation: Yes
>         Advertised link modes:  10baseT/Half 10baseT/Full
>                                 100baseT/Half 100baseT/Full
>                                 1000baseT/Full
>         Advertised auto-negotiation: Yes
>         Speed: 1000Mb/s
>         Duplex: Full
>         Port: Twisted Pair
>         PHYAD: 0
>         Transceiver: internal
>         Auto-negotiation: on
>         Supports Wake-on: umbg
>         Wake-on: g
>         Current message level: 0x00000007 (7)
>         Link detected: yes
>
> ethtool -i eth0
> driver: e1000
> version: 7.3.21-k5-NAPI
> firmware-version: N/A
> bus-info: 0000:02:06.0
>
> ethtool -g eth0
> Ring parameters for eth0:
> Pre-set maximums:
> RX:             4096
> RX Mini:        0
> RX Jumbo:       0
> TX:             4096
> Current hardware settings:
> RX:             256
> RX Mini:        0
> RX Jumbo:       0
> TX:             256
>
> ethtool -k eth0
> Offload parameters for eth0:
> rx-checksumming: on
> tx-checksumming: on
> scatter-gather: on
> tcp segmentation offload: on
> udp fragmentation offload: off
> generic segmentation offload: on
>
> lspci -vv
> 02:06.0 Ethernet controller: Intel Corporation 82541PI Gigabit Ethernet Controller (rev 05)
>         Subsystem: Intel Corporation PRO/1000 GT Desktop Adapter
>         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
>         Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR- INTx-

This MAbort+ means that a transaction that the 82541 initiated was aborted
by the chipset for some reason.
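As an aside, the ring sizes above (256 in use out of a 4096 maximum) can
also be raised at runtime with ethtool instead of module parameters; a
sketch, assuming the interface is named eth0 (requires root, and changing
ring sizes briefly resets the link):

```shell
ethtool -g eth0          # show preset maximums and current ring sizes
ethtool -G eth0 tx 1024  # grow the Tx descriptor ring toward the 4096 max
ethtool -g eth0          # confirm the new setting took effect
```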
It is unusual to see this.

>         Latency: 64 (63750ns min), Cache Line Size: 4 bytes
>         Interrupt: pin A routed to IRQ 20
>         Region 0: Memory at fdec0000 (32-bit, non-prefetchable) [size=128K]
>         Region 1: Memory at fdea0000 (32-bit, non-prefetchable) [size=128K]
>         Region 2: I/O ports at df00 [size=64]
>         [virtual] Expansion ROM at fdf00000 [disabled] [size=128K]
>         Capabilities: [dc] Power Management version 2
>                 Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
>                 Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
>         Capabilities: [e4] PCI-X non-bridge device
>                 Command: DPERE- ERO+ RBC=512 OST=1
>                 Status: Dev=00:00.0 64bit- 133MHz- SCD- USC- DC=simple DMMRBC=2048 DMOST=1 DMCRS=8 RSCEM- 266MHz- 533MHz-
>         Kernel driver in use: e1000
>         Kernel modules: e1000
>
> cat /proc/interrupts
>            CPU0       CPU1       CPU2       CPU3
>   0:        129          0        374          2   IO-APIC-edge      timer
>   1:          0          0          2          0   IO-APIC-edge      i8042
>   8:          0          0         17          0   IO-APIC-edge      rtc0
>   9:          0          0          0          0   IO-APIC-fasteoi   acpi
>  12:          0          0          4          0   IO-APIC-edge      i8042
>  14:          2         29    1067022       7488   IO-APIC-edge      pata_atiixp
>  15:          0          0          0          0   IO-APIC-edge      pata_atiixp
>  16:          2         80    5953112      20785   IO-APIC-fasteoi   ohci_hcd:usb3, ohci_hcd:usb4, HDA Intel
>  17:          0          0          4          0   IO-APIC-fasteoi   ehci_hcd:usb1
>  18:          0         80    2809310      15755   IO-APIC-fasteoi   ohci_hcd:usb5, ohci_hcd:usb6, ohci_hcd:usb7, HDA Intel, nvidia
>  19:          0          0          0          0   IO-APIC-fasteoi   ehci_hcd:usb2
>  20:         14        566   23527069      92044   IO-APIC-fasteoi   eth0
>  22:          7        215    2493511      18964   IO-APIC-fasteoi   ahci, firewire_ohci
> NMI:          0          0          0          0   Non-maskable interrupts
> LOC:   66407712   37087457   32777253   29849036   Local timer interrupts
> SPU:          0          0          0          0   Spurious interrupts
> PMI:          0          0          0          0   Performance monitoring interrupts
> PND:          0          0          0          0   Performance pending work
> RES:   22471181   27170295   14818184   14992262   Rescheduling interrupts
> CAL:     282347     218071     159212     205899   Function call interrupts
> TLB:     549650     571693     491287     469560   TLB shootdowns
> TRM:          0          0          0          0   Thermal event interrupts
> THR:          0          0          0          0   Threshold APIC
> interrupts
> MCE:          0          0          0          0   Machine check exceptions
> MCP:        376        376        376        376   Machine check polls
> ERR:          0
> MIS:          0
>
> ifconfig eth0
> eth0      Link encap:Ethernet  HWaddr 00:0e:0c:c2:82:04
>           inet addr:192.168.1.109  Bcast:192.168.1.255  Mask:255.255.255.0
>           inet6 addr: fe80::20e:cff:fec2:8204/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>           RX packets:21429857 errors:58 dropped:446 overruns:0 frame:29
>           TX packets:16592984

Something is definitely strange here; errors, dropped, and frame being
non-zero is not normal either.

>           errors:0 dropped:0 overruns:0 carrier:0 collisions:0
>           txqueuelen:1000
>           RX bytes:26437196186 (24.6 GiB)  TX bytes:10820406728 (10.0 GiB)
>
> If I upgrade to version e1000-8.0.19, the Tx Unit Hang appears immediately
> after the e1000 module is loaded.

But does the part work at that point, or is it completely dead?

> Using ethtool to turn off rx, tx, sg, tso, and gso, things appear to work
> better. But in that case, a $5 r8169 performs just as well.
>
> Full dmesg in next post.

Still waiting for the next post... Please also include the output of
ethtool -S eth0 after the next hang you get.

_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel® Ethernet, visit http://communities.intel.com/community/wired
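P.S. For anyone following along: the offload experiment and the statistics
request in the reply above look something like the following sketch
(assuming the interface is eth0 and root privileges; toggling the offloads
one at a time, rather than all at once, can help narrow down which one is
involved in the hang):

```shell
ethtool -K eth0 tso off   # TCP segmentation offload
ethtool -K eth0 gso off   # generic segmentation offload
ethtool -K eth0 sg  off   # scatter-gather
ethtool -K eth0 tx  off   # Tx checksum offload
ethtool -K eth0 rx  off   # Rx checksum offload

# After the next Tx Unit Hang, capture the NIC statistics to post:
ethtool -S eth0 | tee ethtool-S-after-hang.txt
```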