Hello list,

A machine we recently put into service is showing what appear to be
Ethernet-related problems. The host is a Supermicro SYS-1028U-TNRT+
barebone with 256 GB of ECC RDIMM and 2x Intel Xeon E5-2660 v4 CPUs (24
cores, HT disabled, BIOS dated 08/09/2016). It is connected to a 1 Gbit/s
switch port via one of its on-board X540-AT2 ports (negotiated PCIe link
properties: Speed 5GT/s, Width x8). The machine's load average, normalized
by CPU count, is about 1, so it is quite busy.

Additional software/firmware info and NIC stats:

# uname -a
Linux inject 4.9.0-0.bpo.1-amd64 #1 SMP Debian 4.9.2-2~bpo8+1 (2017-01-26) x86_64 GNU/Linux


# ethtool -i eth0
driver: ixgbe
version: 4.4.0-k
firmware-version: 0x800003e2
bus-info: 0000:01:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no


# ethtool -S eth0 | grep -v ' 0$'
NIC statistics:
      rx_packets: 273710944
      tx_packets: 398971152
      rx_bytes: 313480861463
      tx_bytes: 470304591176
      rx_pkts_nic: 273710875
      tx_pkts_nic: 398971010
      rx_bytes_nic: 314575702117
      tx_bytes_nic: 471900519485
      lsc_int: 5
      rx_dropped: 56473
      multicast: 58774
      broadcast: 195115
      fdir_match: 273920501
      fdir_miss: 139668
      fdir_overflow: 22
      tx_timeout_count: 4
      tx_restart_queue: 3
[omitting lines that merely detail [rt]x_queue_\d+_{bytes,packets} counters]
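
For reference, a rough sketch of how such a trimmed view can be produced. The function name nonzero_nic_stats is made up for this example, and the second pattern is only an assumption about how the per-queue counter names look, based on the note above:

```shell
# Hypothetical helper: reads `ethtool -S` output on stdin, drops counters
# whose value is 0, then drops the per-queue byte/packet counters.
nonzero_nic_stats() {
    grep -v ' 0$' | grep -Ev '[rt]x_queue_[0-9]+_(bytes|packets):'
}
```

It would be used as `ethtool -S eth0 | nonzero_nic_stats`.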


Relevant kernel ring buffer (dmesg) contents:

[40807.952873] ------------[ cut here ]------------
[40807.952921] WARNING: CPU: 18 PID: 15921 at /home/zumbi/linux-4.9.2/net/sched/sch_generic.c:316 dev_watchdog+0x220/0x230
[40807.952959] NETDEV WATCHDOG: eth0 (ixgbe): transmit queue 0 timed out
[40807.952983] Modules linked in: tcp_diag inet_diag netconsole configfs 
ipmi_watchdog ast ttm drm_kms_helper drm i2c_algo_bit iTCO_wdt 
iTCO_vendor_support intel_rapl sb_edac edac_core x86_pkg_temp_thermal 
intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul 
crc32_pclmul ghash_clmulni_intel intel_cstate intel_uncore pcspkr evdev 
joydev mei_me i2c_i801 lpc_ich intel_rapl_perf i2c_smbus mei ioatdma 
mfd_core shpchp wmi tpm_tis tpm_tis_core tpm acpi_power_meter acpi_pad 
button ipmi_si ipmi_poweroff ipmi_devintf ipmi_msghandler autofs4 ext4 
crc16 jbd2 fscrypto mbcache raid0 raid1 md_mod hid_generic usbhid hid sg 
sd_mod crc32c_intel aesni_intel ahci aes_x86_64 glue_helper lrw libahci 
gf128mul ablk_helper cryptd xhci_pci libata xhci_hcd ehci_pci ehci_hcd 
ixgbe usbcore scsi_mod dca nvme ptp usb_common
[40807.953445]  pps_core nvme_core mdio fjes
[40807.953467] CPU: 18 PID: 15921 Comm: inject Not tainted 4.9.0-0.bpo.1-amd64 #1 Debian 4.9.2-2~bpo8+1
[40807.953501] Hardware name: Supermicro SYS-1028U-TNRT+/X10DRU-i+, BIOS 2.0b 08/09/2016
[40807.953529]  0000000000000000 ffffffff8fd2a1f5 ffff9f977f303e38 0000000000000000
[40807.953564]  ffffffff8fa77884 0000000000000000 ffff9f977f303e90 ffff9f776ec00000
[40807.953599]  0000000000000012 ffff9f776ec24f40 0000000000000040 ffffffff8fa778ff
[40807.953633] Call Trace:
[40807.953646]  <IRQ>
[40807.953665]  [<ffffffff8fd2a1f5>] ? dump_stack+0x5c/0x77
[40807.953688]  [<ffffffff8fa77884>] ? __warn+0xc4/0xe0
[40807.953708]  [<ffffffff8fa778ff>] ? warn_slowpath_fmt+0x5f/0x80
[40807.953731]  [<ffffffff8ff1fc30>] ? dev_watchdog+0x220/0x230
[40807.953753]  [<ffffffff8ff1fa10>] ? dev_deactivate_queue.constprop.27+0x60/0x60
[40807.953784]  [<ffffffff8fae6210>] ? call_timer_fn+0x30/0x130
[40807.953807]  [<ffffffff8fae7085>] ? run_timer_softirq+0x215/0x4b0
[40807.953832]  [<ffffffff8fd33434>] ? timerqueue_add+0x54/0xa0
[40807.953853]  [<ffffffff8fae82c8>] ? enqueue_hrtimer+0x38/0x80
[40807.953878]  [<ffffffff8fffcce6>] ? __do_softirq+0x106/0x292
[40807.953902]  [<ffffffff8fb8eae0>] ? trace_event_raw_event_mm_lru_insertion+0x170/0x170
[40807.953931]  [<ffffffff8fa7db08>] ? irq_exit+0x98/0xa0
[40807.953951]  [<ffffffff8fffcaee>] ? smp_apic_timer_interrupt+0x3e/0x50
[40807.953977]  [<ffffffff8fffbe02>] ? apic_timer_interrupt+0x82/0x90
[40807.953999]  <EOI>
[40807.954011]  [<ffffffff8fb8eae0>] ? trace_event_raw_event_mm_lru_insertion+0x170/0x170
[40807.954039]  [<ffffffff8fff9d91>] ? _raw_spin_unlock_irqrestore+0x11/0x20
[40807.954064]  [<ffffffff8fb8fb4d>] ? pagevec_lru_move_fn+0xad/0xe0
[40807.954934]  [<ffffffff8fb8fc6c>] ? __lru_cache_add+0x6c/0x90
[40807.955761]  [<ffffffff8fbb769e>] ? handle_mm_fault+0x156e/0x1650
[40807.956582]  [<ffffffff8fa5fe43>] ? __do_page_fault+0x253/0x510
[40807.957392]  [<ffffffff8fffb598>] ? page_fault+0x28/0x30
[40807.958201] ---[ end trace faa12d1c7fa20cc5 ]---
[40807.959003] ixgbe 0000:01:00.0 eth0: initiating reset due to tx timeout
[40807.959849] ixgbe 0000:01:00.0 eth0: Reset adapter
[40811.710212] ixgbe 0000:01:00.0 eth0: NIC Link is Up 1 Gbps, Flow Control: None
[42470.998496] ixgbe 0000:01:00.0 eth0: initiating reset due to tx timeout
[42470.999465] ixgbe 0000:01:00.0 eth0: Reset adapter
[42479.497773] ixgbe 0000:01:00.0 eth0: NIC Link is Up 1 Gbps, Flow Control: None
[48475.363991] ixgbe 0000:01:00.0 eth0: initiating reset due to tx timeout
[48475.365060] ixgbe 0000:01:00.0 eth0: Reset adapter
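
To see how far apart the resets are, here is a minimal sketch that prints the gap in seconds between consecutive "initiating reset due to tx timeout" lines. It assumes the bracketed-timestamp log format shown above; the function name tx_timeout_gaps is made up for this example:

```shell
# Hypothetical helper: reads kernel log lines on stdin and prints the gap
# (in seconds, one decimal place) between consecutive tx timeout resets.
tx_timeout_gaps() {
    LC_ALL=C awk '/initiating reset due to tx timeout/ {
        # The first "digits.digits" token on the line is the log timestamp.
        if (match($0, /[0-9]+\.[0-9]+/)) {
            ts = substr($0, RSTART, RLENGTH) + 0
            if (seen) printf "%.1f\n", ts - prev
            prev = ts
            seen = 1
        }
    }'
}
```

Run as `dmesg | tx_timeout_gaps`. For the three resets logged above it prints 1663.0 and 6004.4, i.e. the timeouts so far are roughly 28 and 100 minutes apart and do not look periodic.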


Can you please help me determine the cause of this behaviour? Are there
any ixgbe- or NIC-specific tunables I should be looking into to fix it?
If I need to supply additional data, please let me know.

Please also keep me CC'd, as I'm not subscribed to this list.

Thanks!

-- 
Kind regards

Johannes Truschnigg
Engineering / Senior System Administrator

Geizhals (R) - Price Comparison
Preisvergleich Internet Services AG
Obere Donaustraße 63/2
A-1020 Wien
Tel: +43 1 5811609/87
Fax: +43 1 5811609/55

http://www.geizhals.at | http://www.geizhals.de | http://www.geizhals.eu
http://www.facebook.com/geizhals              => Geizhals on Facebook!
http://twitter.com/geizhals                   => Geizhals on Twitter!
http://blog.geizhals.at                       => The Geizhals blog!
http://unternehmen.geizhals.at/about/de/apps/ => The Geizhals mobile app

Commercial Court of Vienna | FN 197241K | Registered office: Vienna

_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel® Ethernet, visit
http://communities.intel.com/community/wired