The first step is to make sure everything is up to date on the Supermicro side, which 
means checking with them for the latest BIOS and other platform firmware updates.

The second step is to update the driver to the latest standalone release from 
e1000.sourceforge.net, which is currently ixgbe-5.0.4.
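One way to confirm which ixgbe build is actually loaded, before and after installing the standalone driver, is to check the version reported by "ethtool -i eth0". The Python sketch below is illustrative only: parse_ethtool_info, version_tuple and is_older_than are hypothetical helper names, and it assumes that a "-k" suffix (as in the "4.4.0-k" string in the original message) marks the in-kernel driver build and that the dotted numbers can be compared component by component.

```python
# Hypothetical helpers for checking the loaded ixgbe driver version.

def parse_ethtool_info(text: str) -> dict[str, str]:
    """Parse the 'key: value' lines printed by `ethtool -i` into a dict."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only, so 'bus-info: 0000:01:00.0' stays intact.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def version_tuple(version: str) -> tuple[int, ...]:
    """Turn '4.4.0-k' into (4, 4, 0); anything after the first '-' is ignored."""
    numeric = version.split("-", 1)[0]
    return tuple(int(part) for part in numeric.split("."))


def is_older_than(installed: str, wanted: str) -> bool:
    """Compare dotted versions numerically, component by component."""
    return version_tuple(installed) < version_tuple(wanted)


# Excerpt of the `ethtool -i eth0` output from the original report.
SAMPLE = """driver: ixgbe
version: 4.4.0-k
firmware-version: 0x800003e2
bus-info: 0000:01:00.0"""

info = parse_ethtool_info(SAMPLE)
in_kernel = info["version"].endswith("-k")  # assumed marker of the in-tree build
print(info["driver"], info["version"], "in-kernel" if in_kernel else "out-of-tree")
print("older than 5.0.4:", is_older_than(info["version"], "5.0.4"))
```

On the host, the input would come from running "ethtool -i eth0"; once the standalone driver is loaded, one would expect the reported version to be 5.0.4 without the "-k" suffix.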

If that doesn't help, check the per-core CPU utilization and the IRQ spread 
("grep eth0 /proc/interrupts").
How much traffic are you seeing when this happens, and what are you 
communicating with? "sar -n DEV 1 10" should show the interface bandwidth, and 
netstat should show the connections.
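Both checks can also be done by reading the underlying /proc files directly. The Python sketch below is an illustration, not a replacement for sar: irq_spread, net_dev_bytes and mbit_per_s are hypothetical names, the sample file contents are invented for the example, and it assumes the usual layouts of /proc/interrupts (a CPU header row, then one row per IRQ with the vector name such as eth0-TxRx-0 in the last column) and /proc/net/dev (receive bytes in the first and transmit bytes in the ninth field after the interface name).

```python
# Hypothetical helpers; the sample /proc contents below are invented.

def irq_spread(interrupts_text: str, iface: str) -> list[int]:
    """Sum the interrupt counts per CPU for every IRQ named after `iface`."""
    lines = interrupts_text.splitlines()
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    totals = [0] * ncpu
    for line in lines[1:]:
        fields = line.split()
        if len(fields) <= ncpu + 1:
            continue  # summary rows such as 'ERR: 0' carry no vector name
        name = fields[-1]
        if name == iface or name.startswith(iface + "-"):
            for cpu, count in enumerate(fields[1:1 + ncpu]):
                totals[cpu] += int(count)
    return totals


def net_dev_bytes(net_dev_text: str, iface: str) -> tuple[int, int]:
    """Return (rx_bytes, tx_bytes) for `iface` from the contents of /proc/net/dev."""
    for line in net_dev_text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and name.strip() == iface:
            values = rest.split()
            return int(values[0]), int(values[8])
    raise KeyError(iface)


def mbit_per_s(before: tuple[int, int], after: tuple[int, int], seconds: float) -> tuple[float, float]:
    """Convert two (rx_bytes, tx_bytes) snapshots into (rx, tx) Mbit/s."""
    rx = (after[0] - before[0]) * 8 / seconds / 1e6
    tx = (after[1] - before[1]) * 8 / seconds / 1e6
    return rx, tx


INTERRUPTS = """           CPU0       CPU1       CPU2       CPU3
 45:      90000          0          0          0  IR-PCI-MSI 524288-edge      eth0-TxRx-0
 46:          0      80000          0          0  IR-PCI-MSI 524289-edge      eth0-TxRx-1
 47:          5          0          0          0  IR-PCI-MSI 524290-edge      eth0
 48:        100          0          0          0  IR-PCI-MSI 1048576-edge      nvme0q0
ERR:          0"""

NET_DEV_HEADER = """Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
"""
BEFORE = NET_DEV_HEADER + "  eth0: 1000000 900 0 0 0 0 0 0 2000000 1500 0 0 0 0 0 0"
AFTER = NET_DEV_HEADER + "  eth0: 13500000 9900 0 0 0 0 0 0 27000000 19500 0 0 0 0 0 0"

print("eth0 interrupts per CPU:", irq_spread(INTERRUPTS, "eth0"))
rx, tx = mbit_per_s(net_dev_bytes(BEFORE, "eth0"), net_dev_bytes(AFTER, "eth0"), 1.0)
print(f"rx {rx:.1f} Mbit/s, tx {tx:.1f} Mbit/s")
```

On a live system the inputs would be the contents of /proc/interrupts and two reads of /proc/net/dev taken a known number of seconds apart. Per-CPU totals concentrated on one or two cores would suggest poor IRQ spread, and the byte-counter delta is roughly the same per-second figure that sar derives.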

And finally, I see a lot of issues with "current" Debian kernels, but this 
doesn't sound like one of them. If it were, I would suggest a newer 
stable kernel from kernel.org.

Hope that helps.

Todd Fujinaka
Software Application Engineer
Networking Division (ND)
Intel Corporation
todd.fujin...@intel.com
(503) 712-4565


-----Original Message-----
From: Johannes Truschnigg [mailto:johannes.truschn...@geizhals.at] 
Sent: Thursday, March 02, 2017 2:09 AM
To: e1000-devel@lists.sourceforge.net
Subject: [E1000-devel] Queue timeout problems with ixgbe/Intel X540-AT2 on Linux 4.9

Hello list,

A machine we recently put into service is showing (presumably) Ethernet-related 
problems. The host is a Supermicro SYS-1028U-TNRT+ barebone with 256 GB of 
ECC RDIMM and 2x Intel Xeon E5-2660 v4 CPUs (24 cores, HT disabled), with a BIOS 
dated 08/09/2016. It is connected to a 1 Gbit/s switch port via one of its 
on-board X540-AT2 ports (negotiated PCIe link: speed 5 GT/s, width x8). The 
machine's CPU-normalized load is about 1, so it is quite busy.

Additional software/firmware info and NIC stats:

# uname -a
Linux inject 4.9.0-0.bpo.1-amd64 #1 SMP Debian 4.9.2-2~bpo8+1 (2017-01-26) x86_64 GNU/Linux


# ethtool -i eth0
driver: ixgbe
version: 4.4.0-k
firmware-version: 0x800003e2
bus-info: 0000:01:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no


# ethtool -S eth0 | grep -v ' 0$'
NIC statistics:
      rx_packets: 273710944
      tx_packets: 398971152
      rx_bytes: 313480861463
      tx_bytes: 470304591176
      rx_pkts_nic: 273710875
      tx_pkts_nic: 398971010
      rx_bytes_nic: 314575702117
      tx_bytes_nic: 471900519485
      lsc_int: 5
      rx_dropped: 56473
      multicast: 58774
      broadcast: 195115
      fdir_match: 273920501
      fdir_miss: 139668
      fdir_overflow: 22
      tx_timeout_count: 4
      tx_restart_queue: 3
[omitting lines that merely detail [rt]x_queue_\d+_{bytes,packets} counters]
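To put the non-zero counters above in proportion, the Python sketch below (parse_ethtool_stats is a hypothetical helper) parses "ethtool -S" style output and computes two ratios from the figures quoted above. Treating fdir_match + fdir_miss as the total number of flow-director lookups is an assumption about what those counters mean.

```python
def parse_ethtool_stats(text: str) -> dict[str, int]:
    """Parse `ethtool -S` output ('  name: value' lines) into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        value = value.strip()
        if sep and value.isdigit():  # skips the 'NIC statistics:' header
            stats[name.strip()] = int(value)
    return stats


# Counters copied from the `ethtool -S eth0` output in the report.
REPORTED = """NIC statistics:
      rx_packets: 273710944
      rx_dropped: 56473
      fdir_match: 273920501
      fdir_miss: 139668
      tx_timeout_count: 4
      tx_restart_queue: 3"""

stats = parse_ethtool_stats(REPORTED)
drop_ratio = stats["rx_dropped"] / stats["rx_packets"]
# Assumption: every flow-director lookup is counted as either a match or a miss.
fdir_miss_ratio = stats["fdir_miss"] / (stats["fdir_match"] + stats["fdir_miss"])
print(f"rx_dropped: {drop_ratio:.4%} of received packets")
print(f"fdir_miss:  {fdir_miss_ratio:.4%} of flow-director lookups")
```

With the reported numbers, rx_dropped works out to roughly 0.02 % of received packets.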


Relevant kernel ring buffer (dmesg) contents:

[40807.952873] ------------[ cut here ]------------
[40807.952921] WARNING: CPU: 18 PID: 15921 at /home/zumbi/linux-4.9.2/net/sched/sch_generic.c:316 dev_watchdog+0x220/0x230
[40807.952959] NETDEV WATCHDOG: eth0 (ixgbe): transmit queue 0 timed out
[40807.952983] Modules linked in: tcp_diag inet_diag netconsole configfs ipmi_watchdog ast ttm drm_kms_helper drm i2c_algo_bit iTCO_wdt iTCO_vendor_support intel_rapl sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate intel_uncore pcspkr evdev joydev mei_me i2c_i801 lpc_ich intel_rapl_perf i2c_smbus mei ioatdma mfd_core shpchp wmi tpm_tis tpm_tis_core tpm acpi_power_meter acpi_pad button ipmi_si ipmi_poweroff ipmi_devintf ipmi_msghandler autofs4 ext4 crc16 jbd2 fscrypto mbcache raid0 raid1 md_mod hid_generic usbhid hid sg sd_mod crc32c_intel aesni_intel ahci aes_x86_64 glue_helper lrw libahci gf128mul ablk_helper cryptd xhci_pci libata xhci_hcd ehci_pci ehci_hcd ixgbe usbcore scsi_mod dca nvme ptp usb_common
[40807.953445]  pps_core nvme_core mdio fjes
[40807.953467] CPU: 18 PID: 15921 Comm: inject Not tainted 4.9.0-0.bpo.1-amd64 #1 Debian 4.9.2-2~bpo8+1
[40807.953501] Hardware name: Supermicro SYS-1028U-TNRT+/X10DRU-i+, BIOS 2.0b 08/09/2016
[40807.953529]  0000000000000000 ffffffff8fd2a1f5 ffff9f977f303e38 0000000000000000
[40807.953564]  ffffffff8fa77884 0000000000000000 ffff9f977f303e90 ffff9f776ec00000
[40807.953599]  0000000000000012 ffff9f776ec24f40 0000000000000040 ffffffff8fa778ff
[40807.953633] Call Trace:
[40807.953646]  <IRQ>
[40807.953665]  [<ffffffff8fd2a1f5>] ? dump_stack+0x5c/0x77
[40807.953688]  [<ffffffff8fa77884>] ? __warn+0xc4/0xe0
[40807.953708]  [<ffffffff8fa778ff>] ? warn_slowpath_fmt+0x5f/0x80
[40807.953731]  [<ffffffff8ff1fc30>] ? dev_watchdog+0x220/0x230
[40807.953753]  [<ffffffff8ff1fa10>] ? dev_deactivate_queue.constprop.27+0x60/0x60
[40807.953784]  [<ffffffff8fae6210>] ? call_timer_fn+0x30/0x130
[40807.953807]  [<ffffffff8fae7085>] ? run_timer_softirq+0x215/0x4b0
[40807.953832]  [<ffffffff8fd33434>] ? timerqueue_add+0x54/0xa0
[40807.953853]  [<ffffffff8fae82c8>] ? enqueue_hrtimer+0x38/0x80
[40807.953878]  [<ffffffff8fffcce6>] ? __do_softirq+0x106/0x292
[40807.953902]  [<ffffffff8fb8eae0>] ? trace_event_raw_event_mm_lru_insertion+0x170/0x170
[40807.953931]  [<ffffffff8fa7db08>] ? irq_exit+0x98/0xa0
[40807.953951]  [<ffffffff8fffcaee>] ? smp_apic_timer_interrupt+0x3e/0x50
[40807.953977]  [<ffffffff8fffbe02>] ? apic_timer_interrupt+0x82/0x90
[40807.953999]  <EOI>
[40807.954011]  [<ffffffff8fb8eae0>] ? trace_event_raw_event_mm_lru_insertion+0x170/0x170
[40807.954039]  [<ffffffff8fff9d91>] ? _raw_spin_unlock_irqrestore+0x11/0x20
[40807.954064]  [<ffffffff8fb8fb4d>] ? pagevec_lru_move_fn+0xad/0xe0
[40807.954934]  [<ffffffff8fb8fc6c>] ? __lru_cache_add+0x6c/0x90
[40807.955761]  [<ffffffff8fbb769e>] ? handle_mm_fault+0x156e/0x1650
[40807.956582]  [<ffffffff8fa5fe43>] ? __do_page_fault+0x253/0x510
[40807.957392]  [<ffffffff8fffb598>] ? page_fault+0x28/0x30
[40807.958201] ---[ end trace faa12d1c7fa20cc5 ]---
[40807.959003] ixgbe 0000:01:00.0 eth0: initiating reset due to tx timeout
[40807.959849] ixgbe 0000:01:00.0 eth0: Reset adapter
[40811.710212] ixgbe 0000:01:00.0 eth0: NIC Link is Up 1 Gbps, Flow Control: None
[42470.998496] ixgbe 0000:01:00.0 eth0: initiating reset due to tx timeout
[42470.999465] ixgbe 0000:01:00.0 eth0: Reset adapter
[42479.497773] ixgbe 0000:01:00.0 eth0: NIC Link is Up 1 Gbps, Flow Control: None
[48475.363991] ixgbe 0000:01:00.0 eth0: initiating reset due to tx timeout
[48475.365060] ixgbe 0000:01:00.0 eth0: Reset adapter
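The timestamps in the log above show how far apart the resets are and how long each one took to bring the link back. The Python sketch below (event_times is a hypothetical helper) extracts the seconds-since-boot stamp from the ixgbe lines quoted above; it pairs resets and link-up messages in order, which assumes each reset is followed by exactly one "NIC Link is Up" line.

```python
import re

# Matches dmesg lines such as:
# [40807.959003] ixgbe 0000:01:00.0 eth0: initiating reset due to tx timeout
STAMP = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def event_times(log: str, needle: str) -> list[float]:
    """Return the timestamps (seconds since boot) of log lines containing `needle`."""
    times = []
    for line in log.splitlines():
        m = STAMP.match(line)
        if m and needle in m.group(2):
            times.append(float(m.group(1)))
    return times


LOG = """[40807.959003] ixgbe 0000:01:00.0 eth0: initiating reset due to tx timeout
[40807.959849] ixgbe 0000:01:00.0 eth0: Reset adapter
[40811.710212] ixgbe 0000:01:00.0 eth0: NIC Link is Up 1 Gbps, Flow Control: None
[42470.998496] ixgbe 0000:01:00.0 eth0: initiating reset due to tx timeout
[42470.999465] ixgbe 0000:01:00.0 eth0: Reset adapter
[42479.497773] ixgbe 0000:01:00.0 eth0: NIC Link is Up 1 Gbps, Flow Control: None
[48475.363991] ixgbe 0000:01:00.0 eth0: initiating reset due to tx timeout
[48475.365060] ixgbe 0000:01:00.0 eth0: Reset adapter"""

resets = event_times(LOG, "initiating reset due to tx timeout")
link_ups = event_times(LOG, "NIC Link is Up")
gaps = [later - earlier for earlier, later in zip(resets, resets[1:])]
print("seconds between tx-timeout resets:", [round(g, 1) for g in gaps])
# Pair each reset with the next link-up to estimate how long the link was down.
for reset, up in zip(resets, link_ups):
    print(f"reset at {reset:.1f}s, link up after {up - reset:.1f}s")
```

For this excerpt the gaps between resets come out at roughly 1663 s and 6004 s, with the link returning about 3.8 s and 8.5 s after the first two resets.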


Can you please help me determine the reason for this behaviour? 
Are there any ixgbe- or NIC-specific tunables I should look into 
to fix it? If I need to supply additional data, please let me know.

Please also keep me CC'd, as I'm not subscribed to this list.

Thanks!

-- 
Kind regards

Johannes Truschnigg
Engineering / Senior System Administrator

Geizhals (R) - price comparison
Preisvergleich Internet Services AG
Obere Donaustraße 63/2
A-1020 Wien
Tel: +43 1 5811609/87
Fax: +43 1 5811609/55

http://www.geizhals.at | http://www.geizhals.de | http://www.geizhals.eu
http://www.facebook.com/geizhals              => Geizhals on Facebook!
http://twitter.com/geizhals                   => Geizhals on Twitter!
http://blog.geizhals.at                       => The Geizhals blog!
http://unternehmen.geizhals.at/about/de/apps/ => The Geizhals mobile app

Commercial Court of Vienna (Handelsgericht Wien) | FN 197241K | Registered office: Vienna

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, SlashDot.org! http://sdm.link/slashdot
_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel® Ethernet, visit 
http://communities.intel.com/community/wired