I think this may be fixed with a newer NVM image. In any case, the newer NVM 
images will help with other issues, including performance. Go to 
downloadcenter.intel.com and search for "xl710"; you should find an updater 
and instructions.
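
If it helps, the rough flow on Linux looks something like this (the archive 
and directory names below are assumptions; use whatever the downloaded package 
actually contains and follow its included instructions):

    # check the NVM/firmware version the driver currently reports
    ethtool -i enp7s0f1

    # unpack the NVM update package and run the bundled Linux updater
    tar xzf XL710_NVMUpdatePackage_Linux.tar.gz   # archive name is an assumption
    cd Linux_x64                                  # directory name is an assumption
    sudo ./nvmupdate64e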

Todd Fujinaka
Software Application Engineer
Networking Division (ND)
Intel Corporation
todd.fujin...@intel.com
(503) 712-4565

-----Original Message-----
From: Dmitry Mikhaylov [mailto:d...@in-solve.ru] 
Sent: Monday, March 09, 2015 6:55 AM
To: e1000-devel@lists.sourceforge.net
Subject: [E1000-devel] TX driver issue detected, PF reset issued - debug results

Hi Team,

 

I would like to report my experience so that somebody may have a real debugging 
shortcut next time. There is also a practical request for the developers.

 

Two weeks ago we moved our local storage network to 10G, based on the X710-DA2.

Right from the start we have been seeing lots of these events:

 

kernel: [772391.372876] i40e 0000:07:00.1: TX driver issue detected, PF reset issued
kernel: [772391.594984] i40e 0000:07:00.1: i40e_ptp_init: added PHC on enp7s0f1
kernel: [772391.741089] i40e 0000:07:00.1 enp7s0f1: NIC Link is Up 10 Gbps Full Duplex, Flow Control: None
kernel: [772391.752696] i40e 0000:07:00.1 enp7s0f1: NIC Link is Down
kernel: [772392.557943] i40e 0000:07:00.1 enp7s0f1: NIC Link is Up 10 Gbps Full Duplex, Flow Control: None

 

The server works normally, but there can be 500+ of these events in a row, 
giving a "5 seconds working / 5 seconds not working" experience, while the 
next hour we may see zero of them. As a rule, though, there are 100+ events daily.
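
(If anyone wants to reproduce the counting: a plain grep over the kernel log 
is enough; the log path here is an assumption and depends on your syslog setup.)

    # how many PF resets have been logged so far
    grep -c 'TX driver issue detected' /var/log/kern.log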

I should note that the driver resets just fine: no crashes, no visibly broken 
links, nothing of that sort. The reset mechanism itself seems to be OK.

The problem is that network performance in this mode is, of course, 
unacceptable.

 

The initial setup used i40e 1.1.23 
(http://sourceforge.net/projects/e1000/files/i40e%20stable/1.1.23/) from the 
official Intel downloads.

Our next try was 1.2.37 
(http://sourceforge.net/projects/e1000/files/i40e%20stable/1.2.37/), released 
a few weeks ago, because it was said on this list that "something of that 
sort was fixed". As far as we can tell, it changed nothing in the behavior.

 

The problem, however, WAS solved with 

 

ethtool -K enp7s0f1 tso off

 

That attempt was based on the http://sourceforge.net/p/e1000/bugs/407/ thread, 
which concerns a completely different setup with a different card and a 
different driver, but the problem seems to be the same (for years?) and the 
same workaround still applies.
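
In case it helps others: the ethtool setting does not survive a reboot, so it 
has to be reapplied at boot. A minimal sketch of one way to do that, assuming 
a systemd-based host and this interface name (the unit name and ethtool path 
are hypothetical; adjust for your own system):

    # /etc/systemd/system/tso-off-enp7s0f1.service  (hypothetical unit name)
    [Unit]
    Description=Disable TSO on enp7s0f1 (workaround for i40e PF resets)
    After=network.target

    [Service]
    Type=oneshot
    ExecStart=/usr/sbin/ethtool -K enp7s0f1 tso off

    [Install]
    WantedBy=multi-user.target

Enable it once with "systemctl enable tso-off-enp7s0f1.service".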

 

The working setup is this:

 

l31 ~ # ethtool -k enp7s0f1
Offload parameters for enp7s0f1:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp-segmentation-offload: off
udp-fragmentation-offload: off
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: on
receive-hashing: on

 

The non-working setup differs only in this one line:

tcp-segmentation-offload: on

 

I would really suggest that this problem either be fixed at last (less 
important) OR at least noted in some READMEs. It is a real show-stopper, and 
even somebody with debugging karma and try-and-see luck may end up with a 
non-working adapter.

In a purely local network with no real segmentation going on, I think this 
feature is irrelevant anyway. But if someone points this 10G card at the 
Internet and therefore expects real segmentation, losing the feature may 
cause real pain on the performance side.

Maybe changing the card's defaults would be appropriate, although I am not 
sure about that, since changing defaults in the middle of a driver's life 
cycle is bad. But the X710 is at the very start of its cycle for now, so 
maybe?...

 

 

Other aux. info that may help:

 

We are using SFP+ direct-attach cables. Only one port of the card is connected.

Traffic floats anywhere from 1 to ~7 Gbps, and the hang is NOT related to 
traffic volume, only to peer TCP behavior (or so it seems now that we have 
the workaround).

We have only a local network (used for iSCSI), with jumbo frames enabled at 
MTU 9000; MTUs are fine, and we see about 0.0001% packet fragmentation. It 
seems a packet gets fragmented only on some really odd occasion.
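
(The fragmentation figure comes from the kernel's IP counters; something 
along these lines is enough to eyeball it, assuming net-tools is installed.)

    # compare 'fragments created' against total IP packets sent
    netstat -s | grep -i fragment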

No VLANs, no teaming. A very simple, plain setup with two 10G switches, 10 
hosts (5 on each side), and SFP+ DA all the way (some errors and drops on the 
links "as usual", but nothing perceptible to the end applications).

 

The exact card being debugged here is in an Intel R2312GZ4GCSAS server system.

The card is installed on a PCI riser, together with an LSI-based Intel RAID 
controller on the same riser.

 

enp7s0f1  Link encap:Ethernet  HWaddr 68:05:ca:30:5b:c9
          inet addr:10.21.0.87  Bcast:10.21.0.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:9000  Metric:1
          RX packets:520305857 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1450598509 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1797022578878 (1.6 TiB)  TX bytes:8995312721942 (8.1 TiB)

 

l31 ~ # ethtool enp7s0f1
Settings for enp7s0f1:
        Supported ports: [ FIBRE ]
        Supported link modes:   10000baseT/Full
        Supported pause frame use: Symmetric
        Supports auto-negotiation: No
        Advertised link modes:  Not reported
        Advertised pause frame use: No
        Advertised auto-negotiation: No
        Speed: 10000Mb/s
        Duplex: Full
        Port: Direct Attach Copper
        PHYAD: 0
        Transceiver: external
        Auto-negotiation: off
        Supports Wake-on: d
        Wake-on: d
        Current message level: 0x0000000f (15)
                               drv probe link timer
        Link detected: yes

 

l31 ~ # ethtool -i enp7s0f1
driver: i40e
version: 1.2.37
firmware-version: f4.22.27454 a1.2 n4.25 e143f
bus-info: 0000:07:00.1
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes

 


I can add more info if required, to help you understand things better.

 

Dmitry.

