Hi,

> On Fri, 12 Feb 2010, Nishit Shah wrote:
>
>> Hi,
>>
>> I am getting Tx hangs with e1000e-1.0.15 driver. Attached logs below.
>
> Is there a chance you can try 1.1.2? do you have jumbo frames enabled?

I will give 1.1.2 a try. Jumbo frames are not enabled (the MTU of all eth*
interfaces is 1500). One thing to mention: the ipsec interfaces (ipsec0,
ipsec1) have an MTU of 16260.

>> Feb 10 06:05:11 1265762111 kernel: e1000: eth4: e1000_clean_tx_irq: Detected Tx Unit Hang
>> Feb 10 06:05:11 1265762111 kernel: Tx Queue <0>
>> Feb 10 06:05:11 1265762111 kernel: TDH <e1>
>> Feb 10 06:05:11 1265762111 kernel: TDT <cc>
>> Feb 10 06:05:11 1265762111 kernel: next_to_use <cc>
>> Feb 10 06:05:11 1265762111 kernel: next_to_clean <e0>
>> Feb 10 06:05:11 1265762111 kernel: buffer_info[next_to_clean]
>> Feb 10 06:05:11 1265762111 kernel: time_stamp <56300a18>
>> Feb 10 06:05:11 1265762111 kernel: next_to_watch <e4>
>> Feb 10 06:05:11 1265762111 kernel: jiffies <56300b51>
>> Feb 10 06:05:11 1265762111 kernel: next_to_watch.status <0>
>> Feb 10 06:05:13 1265762113 kernel: e1000: eth4: e1000_clean_tx_irq: Detected Tx Unit Hang
>
> looks like something is really hanging. If you turn off UDP checksum
> offload (and maybe scatter gather) with ethtool, does it start working?
>
> If this is reproducible, I would like to see the output of the e1000_dump
> routine at the time of the hang, but with 2048 descriptors it will be
> really huge (and probably overrun syslog). I would need to prepare a
> version (or patch) of 1.0.15 or 1.1.2 with the e1000_dump code enabled.
>
> is it always the same interface?

Yes, it is reproducible. Even with 256 descriptors I am able to reproduce
it, so I think I will be able to provide you the output of e1000_dump on
1.1.2. Meanwhile I will test with UDP checksum offload and scatter gather
turned off.

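For reference, the hex fields in the hang report quoted above can be decoded with a few lines of Python. This is only a sketch for reading the log, not driver code. The function name `decode_tx_hang` and its result keys are made up for illustration. The ring size of 2048 is taken from the `ethtool -g eth4` output in this thread, `HZ=250` is an assumption about the kernel build, and the counters are assumed to be 32-bit.

```python
def decode_tx_hang(tdh, tdt, next_to_clean, time_stamp, jiffies,
                   ring_size, hz):
    """Decode the fields of a "Detected Tx Unit Hang" report.

    Hypothetical helper for reading the log; not part of the e1000e driver.
    """
    # Descriptors between head (TDH) and tail (TDT) are still owned by the
    # hardware. The indices wrap around the ring, hence the modulo.
    hw_owned = (tdt - tdh) % ring_size
    # Descriptors the head has moved past that the driver has not cleaned.
    fetched_not_cleaned = (tdh - next_to_clean) % ring_size
    # Age of the oldest uncleaned buffer; assumes 32-bit jiffies values.
    age_jiffies = (jiffies - time_stamp) & 0xFFFFFFFF
    return {
        "hw_owned": hw_owned,
        "fetched_not_cleaned": fetched_not_cleaned,
        "age_jiffies": age_jiffies,
        "age_seconds": age_jiffies / hz,  # hz is an assumption, see above
    }


# Values copied from the log; ring_size and hz are assumptions.
report = decode_tx_hang(tdh=0xE1, tdt=0xCC, next_to_clean=0xE0,
                        time_stamp=0x56300A18, jiffies=0x56300B51,
                        ring_size=2048, hz=250)
print(report)
```

Under those assumptions the oldest pending buffer is 313 jiffies old (about 1.25 s at HZ=250) and the head is one descriptor past next_to_clean. If the log was captured with the 256-entry ring instead, pass `ring_size=256`.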
>> [r...@manage1 /root]# lspci_ether
>>
>> 05:00.0 Ethernet controller: Intel Corporation: Unknown device 105e (rev 06)
>> - (E1000_DEV_ID_82571EB_COPPER)
>>
>> 05:00.1 Ethernet controller: Intel Corporation: Unknown device 105e (rev 06)
>> - (E1000_DEV_ID_82571EB_COPPER)
>>
>> 06:00.0 Ethernet controller: Intel Corporation: Unknown device 105e (rev 06)
>> - (E1000_DEV_ID_82571EB_COPPER)
>>
>> 06:00.1 Ethernet controller: Intel Corporation: Unknown device 105e (rev 06)
>> - (E1000_DEV_ID_82571EB_COPPER)
>>
>> 07:00.0 Ethernet controller: Intel Corporation: Unknown device 105e (rev 06)
>> - (E1000_DEV_ID_82571EB_COPPER)
>>
>> 07:00.1 Ethernet controller: Intel Corporation: Unknown device 105e (rev 06)
>> - (E1000_DEV_ID_82571EB_COPPER)
>>
>> 08:00.0 Ethernet controller: Intel Corporation: Unknown device 105e (rev 06)
>> - (E1000_DEV_ID_82571EB_COPPER)
>>
>> 08:00.1 Ethernet controller: Intel Corporation: Unknown device 105e (rev 06)
>> - (E1000_DEV_ID_82571EB_COPPER)
>>
>> 0d:00.0 Ethernet controller: Intel Corporation: Unknown device 1096 (rev 01)
>> - (E1000_DEV_ID_80003ES2LAN_COPPER_DPT)
>>
>> 0d:00.1 Ethernet controller: Intel Corporation: Unknown device 1096 (rev 01)
>> - (E1000_DEV_ID_80003ES2LAN_COPPER_DPT)
>>
>> 0f:00.0 Ethernet controller: Intel Corporation: Unknown device 105f (rev 06)
>> - (E1000_DEV_ID_82571EB_FIBER)
>>
>> 0f:00.1 Ethernet controller: Intel Corporation: Unknown device 105f (rev 06)
>> - (E1000_DEV_ID_82571EB_FIBER)
>
> you have a lot of ports in this machine, but that should be fine.
>
>> ethtool -g eth4
>> Ring parameters for eth4:
>>
>> Pre-set maximums:
>> RX: 4096
>> RX Mini: 0
>> RX Jumbo: 0
>> TX: 4096
>> Current hardware settings:
>> RX: 2048
>> RX Mini: 0
>> RX Jumbo: 0
>> TX: 2048
>>
>>
>> ethtool -k eth4
>>
>> Offload parameters for eth4:
>> rx-checksumming: on
>
>> tx-checksumming: on
>> scatter-gather: on
>
> I know it will use more cpu but does the problem repro if you turn off the
> above two?
>
>> tcp segmentation offload: on
>> udp fragmentation offload: off
>> generic segmentation offload: off
>
>>
>> System Info:
>>
>> Running kernel - 2.6.16.-13-1
>> Openswan - 2.4.9 with klips
>>
>> cat /proc/interrupts
>>
>>          CPU0      CPU1      CPU2      CPU3       CPU4       CPU5      CPU6      CPU7
>>   0: 40087329       273       274       274        274        273       273       241  IO-APIC-edge   timer
>>   2:        0         0         0         0          0          0         0         0  XT-PIC         cascade
>>   4:       10         0         0         0          1          0         0         0  IO-APIC-edge   serial
>>   8:     3393         1         0         0          0          0         0         0  IO-APIC-edge   rtc
>>  66:       63         0         0     80096          0          0         0         0  PCI-MSI        eth0
>>  74:       63         0         0     80096          0          0         0         0  PCI-MSI        eth1
>>  82:    80158         0         0         0          0          0         0         0  PCI-MSI        eth2
>>  90:    80158         0         0         0          0          0         0         0  PCI-MSI        eth3
>>  98:      256         0   5594913         0  168731027          0         0         0  PCI-MSI        eth4
>> 106:      130         0   6517103         0          0  255948447         0         0  PCI-MSI        eth5
>> 114:       64         0    100789         0          0          0         0         0  PCI-MSI        eth6
>> 122:       68         0     87466         0          0          0         0         0  PCI-MSI        eth7
>> 130:      252         0         0    466626          0          0         0         0  PCI-MSI        eth8
>> 138:    30033         0         0   4989635          0          0         0         0  PCI-MSI        eth9
>> 146:       62         0         0     80096          0          0         0         0  PCI-MSI        eth10
>> 153:   557669         0         1         0          0          0         0         0  IO-APIC-level  libata
>> 154:       62         0         0     80096          0          0         0         0  PCI-MSI        eth11
>> NMI:        0         0         0         0          0          0         0         0
>> LOC: 40086777  40087580  40087468  40087495   40083411   40083410  40086663  40086021
>> ERR:        0
>> MIS:        0
>>
>>
>> This machine is a IPSEC Gateway and we are using openswan 2.4.9 with
>> klips for VPN.
>>
>> Possible suspect for this Hang is a Fragmented UDP packet coming/going
>> on eth4 with datasize 32560 size over VPN tunnel.
>> (eth4 <-> ipsec0 <-> eth5)
>>
>> Without VPN tunnel, I am not observing the hangs with same size of UDP
>> packets.
>>
>> Let me know if you need more information on this.
>
> I think that is an extremely good clue. Please try the experiment
> mentioned above with disabling tx csum offload and tx sg. The stack
> could be handing down a packet that is unusually long or formatted
> strangely that could hang up our offload setup for tx csum.
>
> Also are you running any traffic shaping via tc or netfilter rules?

Netfilter rules are applied. tc is present, but we are not using it for VPN
traffic shaping.

I will let you know the results with tx csum offload and tx sg disabled.

Rgds,
Nishit Shah.

_______________________________________________
E1000-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/e1000-devel
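As a rough illustration of the suspect traffic, the sketch below estimates how a 32560-byte UDP datagram would be split at the two MTUs mentioned in this thread (1500 on the eth* interfaces, 16260 on ipsec0/ipsec1). It assumes that "datasize 32560" means the UDP payload, that the packets are IPv4 with a 20-byte header and no options, and it ignores ESP/tunnel overhead entirely. The function name `ipv4_fragments` is hypothetical.

```python
def ipv4_fragments(udp_payload, mtu, ip_header=20, udp_header=8):
    """Estimate IPv4 fragmentation of one UDP datagram.

    Returns (fragment_count, bytes_per_full_fragment, bytes_in_last_fragment).
    Assumes no IP options and no tunnel/ESP overhead (illustrative only).
    """
    # The UDP header travels inside the IP payload of the first fragment.
    total = udp_payload + udp_header
    # Every fragment except the last carries a multiple of 8 bytes.
    per_fragment = (mtu - ip_header) // 8 * 8
    # Integer ceiling division.
    count = -(-total // per_fragment)
    last = total - (count - 1) * per_fragment
    return count, per_fragment, last


for mtu in (1500, 16260):
    count, per_fragment, last = ipv4_fragments(32560, mtu)
    print(f"MTU {mtu}: {count} fragments, "
          f"{per_fragment} bytes each, last carries {last} bytes")
```

Under these assumptions the datagram becomes 23 fragments at MTU 1500 (the last one carrying only 8 bytes) and 3 fragments at MTU 16260. The real numbers on the wire will differ once IPsec encapsulation is added.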
