Hi,

>
>
> On Fri, 12 Feb 2010, Nishit Shah wrote:
>
>> Hi,
>>
>> I am getting Tx hangs with the e1000e-1.0.15 driver. Logs are attached
>> below.
>
> Is there a chance you can try 1.1.2? Do you have jumbo frames enabled?

I will give 1.1.2 a try.
Jumbo frames are not enabled (the MTU of all eth* interfaces is 1500).
One thing worth mentioning: the ipsec interfaces (ipsec0, ipsec1) have an
MTU of 16260.

>
>> Feb 10 06:05:11 1265762111 kernel: e1000: eth4: e1000_clean_tx_irq: Detected Tx Unit Hang
>> Feb 10 06:05:11 1265762111 kernel:   Tx Queue             <0>
>> Feb 10 06:05:11 1265762111 kernel:   TDH                  <e1>
>> Feb 10 06:05:11 1265762111 kernel:   TDT                  <cc>
>> Feb 10 06:05:11 1265762111 kernel:   next_to_use          <cc>
>> Feb 10 06:05:11 1265762111 kernel:   next_to_clean        <e0>
>> Feb 10 06:05:11 1265762111 kernel: buffer_info[next_to_clean]
>> Feb 10 06:05:11 1265762111 kernel:   time_stamp           <56300a18>
>> Feb 10 06:05:11 1265762111 kernel:   next_to_watch        <e4>
>> Feb 10 06:05:11 1265762111 kernel:   jiffies              <56300b51>
>> Feb 10 06:05:11 1265762111 kernel:   next_to_watch.status <0>
>> Feb 10 06:05:13 1265762113 kernel: e1000: eth4: e1000_clean_tx_irq: Detected Tx Unit Hang
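For readers decoding the eth4 hang report quoted above, the following Python sketch (an editorial illustration, not driver code) does the ring arithmetic on the logged hex fields. It assumes the 2048-descriptor TX ring shown by "ethtool -g eth4" and the usual e1000e convention that next_to_watch is the index of the last descriptor of the oldest outstanding packet; the helper names ring_distance and decode are hypothetical.

```python
# Sketch: decode the "Detected Tx Unit Hang" fields from the eth4 report.
# Assumption: the TX ring holds 2048 descriptors (ethtool -g eth4 output).
RING_SIZE = 2048

# Values copied from the hang report (all hexadecimal in the log).
TDH, TDT = 0xE1, 0xCC
NEXT_TO_USE, NEXT_TO_CLEAN, NEXT_TO_WATCH = 0xCC, 0xE0, 0xE4
TIME_STAMP, JIFFIES = 0x56300A18, 0x56300B51


def ring_distance(start: int, end: int, size: int = RING_SIZE) -> int:
    """Descriptors from start up to (not including) end, with wrap-around.

    Hypothetical helper written for this example only.
    """
    return (end - start) % size


def decode() -> dict:
    """Hypothetical helper: derive a few counts from the logged fields."""
    return {
        # How long the oldest unreclaimed buffer has been waiting.
        "jiffies_since_queued": JIFFIES - TIME_STAMP,
        # Descriptors between the hardware head (TDH) and tail (TDT).
        "hw_head_to_tail": ring_distance(TDH, TDT),
        # Descriptors the driver has queued but not yet cleaned.
        "sw_outstanding": ring_distance(NEXT_TO_CLEAN, NEXT_TO_USE),
        # Descriptors from next_to_clean through next_to_watch, inclusive.
        "oldest_packet_descriptors": ring_distance(NEXT_TO_CLEAN, NEXT_TO_WATCH) + 1,
    }


if __name__ == "__main__":
    for name, value in decode().items():
        print(f"{name}: {value}")
```

Run as-is, it prints 313, 2027, 2028 and 5 for the four fields. Read this way, the oldest packet has been pending for 313 jiffies (the wall-clock time depends on the kernel's HZ setting), and almost the entire 2048-entry ring is in flight.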
>
> Looks like something is really hanging. If you turn off UDP checksum
> offload (and maybe scatter-gather) with ethtool, does it start working?
>
> If this is reproducible, I would like to see the output of the e1000_dump
> routine at the time of the hang, but with 2048 descriptors it will be
> really huge (and will probably overrun syslog). I would need to prepare a
> version (or patch) of 1.0.15 or 1.1.2 with the e1000_dump code enabled.
>
> Is it always the same interface?
>


Yes, it is reproducible, even with 256 descriptors, so I think I will be
able to provide the output of e1000_dump on 1.1.2. Meanwhile, I will test
with UDP checksum offload and scatter-gather turned off.


>> [r...@manage1 /root]# lspci_ether
>>
>> 05:00.0 Ethernet controller: Intel Corporation: Unknown device 105e (rev 06) - (E1000_DEV_ID_82571EB_COPPER)
>> 05:00.1 Ethernet controller: Intel Corporation: Unknown device 105e (rev 06) - (E1000_DEV_ID_82571EB_COPPER)
>> 06:00.0 Ethernet controller: Intel Corporation: Unknown device 105e (rev 06) - (E1000_DEV_ID_82571EB_COPPER)
>> 06:00.1 Ethernet controller: Intel Corporation: Unknown device 105e (rev 06) - (E1000_DEV_ID_82571EB_COPPER)
>> 07:00.0 Ethernet controller: Intel Corporation: Unknown device 105e (rev 06) - (E1000_DEV_ID_82571EB_COPPER)
>> 07:00.1 Ethernet controller: Intel Corporation: Unknown device 105e (rev 06) - (E1000_DEV_ID_82571EB_COPPER)
>> 08:00.0 Ethernet controller: Intel Corporation: Unknown device 105e (rev 06) - (E1000_DEV_ID_82571EB_COPPER)
>> 08:00.1 Ethernet controller: Intel Corporation: Unknown device 105e (rev 06) - (E1000_DEV_ID_82571EB_COPPER)
>> 0d:00.0 Ethernet controller: Intel Corporation: Unknown device 1096 (rev 01) - (E1000_DEV_ID_80003ES2LAN_COPPER_DPT)
>> 0d:00.1 Ethernet controller: Intel Corporation: Unknown device 1096 (rev 01) - (E1000_DEV_ID_80003ES2LAN_COPPER_DPT)
>> 0f:00.0 Ethernet controller: Intel Corporation: Unknown device 105f (rev 06) - (E1000_DEV_ID_82571EB_FIBER)
>> 0f:00.1 Ethernet controller: Intel Corporation: Unknown device 105f (rev 06) - (E1000_DEV_ID_82571EB_FIBER)
>
> You have a lot of ports in this machine, but that should be fine.
>
>> ethtool -g eth4
>> Ring parameters for eth4:
>>
>> Pre-set maximums:
>> RX:             4096
>> RX Mini:        0
>> RX Jumbo:       0
>> TX:             4096
>> Current hardware settings:
>> RX:             2048
>> RX Mini:        0
>> RX Jumbo:       0
>> TX:             2048
>>
>>
>> ethtool -k eth4
>> Offload parameters for eth4:
>> rx-checksumming: on
>
>> tx-checksumming: on
>> scatter-gather: on
>
> I know it will use more CPU, but does the problem reproduce if you turn
> off the above two?
>
>> tcp segmentation offload: on
>> udp fragmentation offload: off
>> generic segmentation offload: off
>
>>
>> System info:
>>
>> Running kernel: 2.6.16.-13-1
>> Openswan: 2.4.9 with KLIPS
>> cat /proc/interrupts
>>
>>            CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7
>>   0:   40087329        273        274        274        274        273        273        241    IO-APIC-edge  timer
>>   2:          0          0          0          0          0          0          0          0          XT-PIC  cascade
>>   4:         10          0          0          0          1          0          0          0    IO-APIC-edge  serial
>>   8:       3393          1          0          0          0          0          0          0    IO-APIC-edge  rtc
>>  66:         63          0          0      80096          0          0          0          0         PCI-MSI  eth0
>>  74:         63          0          0      80096          0          0          0          0         PCI-MSI  eth1
>>  82:      80158          0          0          0          0          0          0          0         PCI-MSI  eth2
>>  90:      80158          0          0          0          0          0          0          0         PCI-MSI  eth3
>>  98:        256          0    5594913          0  168731027          0          0          0         PCI-MSI  eth4
>> 106:        130          0    6517103          0          0  255948447          0          0         PCI-MSI  eth5
>> 114:         64          0     100789          0          0          0          0          0         PCI-MSI  eth6
>> 122:         68          0      87466          0          0          0          0          0         PCI-MSI  eth7
>> 130:        252          0          0     466626          0          0          0          0         PCI-MSI  eth8
>> 138:      30033          0          0    4989635          0          0          0          0         PCI-MSI  eth9
>> 146:         62          0          0      80096          0          0          0          0         PCI-MSI  eth10
>> 153:     557669          0          1          0          0          0          0          0   IO-APIC-level  libata
>> 154:         62          0          0      80096          0          0          0          0         PCI-MSI  eth11
>> NMI:          0          0          0          0          0          0          0          0
>> LOC:   40086777   40087580   40087468   40087495   40083411   40083410   40086663   40086021
>> ERR:          0
>> MIS:          0
>>
>> This machine is an IPsec gateway, and we are using Openswan 2.4.9 with
>> KLIPS for VPN.
>>
>> A possible suspect for this hang is a fragmented UDP packet with a data
>> size of 32560 bytes entering or leaving eth4 over the VPN tunnel
>> (eth4 <-> ipsec0 <-> eth5).
>>
>> Without the VPN tunnel, I do not observe the hangs with UDP packets of
>> the same size.
>>
>> Let me know if you need more information on this.
>
> I think that is an extremely good clue. Please try the experiment
> mentioned above with tx csum offload and tx sg disabled. The stack
> could be handing down an unusually long or strangely formatted packet
> that hangs up our tx csum offload setup.
>
> Also are you running any traffic shaping via tc or netfilter rules?

Netfilter rules are applied. tc is present, but we are not using it for
VPN traffic shaping.

I will let you know the results with tx csum offload and tx sg disabled.

Rgds,
Nishit Shah.



_______________________________________________
E1000-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/e1000-devel