The first step is to make sure everything is OK on the Supermicro end, which means checking with them for the latest updates to your BIOS, etc.
The second step is to update the driver to the latest standalone driver from e1000.sourceforge.net, which is currently ixgbe-5.0.4.

If that doesn't help, check the per-core CPU utilization and the IRQ spread (grep ethX /proc/interrupts). How much traffic are you seeing when this is happening, and what are you communicating with? "sar -n DEV 1 10" should show the interface bandwidth, and netstat should show the connections.

And finally, I see a lot of issues with "current" Debian kernels, but this doesn't sound like one of those issues. If it were, I would suggest a newer stable kernel from kernel.org.

Hope that helps.

Todd Fujinaka
Software Application Engineer
Networking Division (ND)
Intel Corporation
todd.fujin...@intel.com
(503) 712-4565

-----Original Message-----
From: Johannes Truschnigg [mailto:johannes.truschn...@geizhals.at]
Sent: Thursday, March 02, 2017 2:09 AM
To: e1000-devel@lists.sourceforge.net
Subject: [E1000-devel] Queue timeout problems with ixgbe/Intel X540-AT2 on Linux 4.9

Hello list,

A machine we recently put into service is showing (presumably) Ethernet-related problems. The host is a Supermicro SYS-1028U-TNRT+ barebone with 256GB of ECC-RDIMM, 2x Intel Xeon E5-2660 v4 CPUs (24 Cores, HT disabled, BIOS dated 08/09/2016), and connected to a 1GBit switchport via one of its on-board X540-AT2-provided ports (PCIe link properties negotiated: Speed 5GT/s, Width x8). The machine's CPU-normalized load is about 1, so it is quite busy.

Additional software/firmware info and NIC stats:

# uname -a
Linux inject 4.9.0-0.bpo.1-amd64 #1 SMP Debian 4.9.2-2~bpo8+1 (2017-01-26) x86_64 GNU/Linux

# ethtool -i eth0
driver: ixgbe
version: 4.4.0-k
firmware-version: 0x800003e2
bus-info: 0000:01:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no

# ethtool -S eth0 | grep -v ' 0$'
NIC statistics:
     rx_packets: 273710944
     tx_packets: 398971152
     rx_bytes: 313480861463
     tx_bytes: 470304591176
     rx_pkts_nic: 273710875
     tx_pkts_nic: 398971010
     rx_bytes_nic: 314575702117
     tx_bytes_nic: 471900519485
     lsc_int: 5
     rx_dropped: 56473
     multicast: 58774
     broadcast: 195115
     fdir_match: 273920501
     fdir_miss: 139668
     fdir_overflow: 22
     tx_timeout_count: 4
     tx_restart_queue: 3
[omitting lines that merely detail [rx]x_queue_\d+_{bytes,packets} counters]

Relevant debug ringbuffer contents:

[40807.952873] ------------[ cut here ]------------
[40807.952921] WARNING: CPU: 18 PID: 15921 at /home/zumbi/linux-4.9.2/net/sched/sch_generic.c:316 dev_watchdog+0x220/0x230
[40807.952959] NETDEV WATCHDOG: eth0 (ixgbe): transmit queue 0 timed out
[40807.952983] Modules linked in: tcp_diag inet_diag netconsole configfs ipmi_watchdog ast ttm drm_kms_helper drm i2c_algo_bit iTCO_wdt iTCO_vendor_support intel_rapl sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate intel_uncore pcspkr evdev joydev mei_me i2c_i801 lpc_ich intel_rapl_perf i2c_smbus mei ioatdma mfd_core shpchp wmi tpm_tis tpm_tis_core tpm acpi_power_meter acpi_pad button ipmi_si ipmi_poweroff ipmi_devintf ipmi_msghandler autofs4 ext4 crc16 jbd2 fscrypto mbcache raid0 raid1 md_mod hid_generic usbhid hid sg sd_mod crc32c_intel aesni_intel ahci aes_x86_64 glue_helper lrw libahci gf128mul ablk_helper cryptd xhci_pci libata xhci_hcd ehci_pci ehci_hcd ixgbe usbcore scsi_mod dca nvme ptp usb_common
[40807.953445] pps_core nvme_core mdio fjes
[40807.953467] CPU: 18 PID: 15921 Comm: inject Not tainted 4.9.0-0.bpo.1-amd64 #1 Debian 4.9.2-2~bpo8+1
[40807.953501] Hardware name: Supermicro SYS-1028U-TNRT+/X10DRU-i+, BIOS 2.0b 08/09/2016
[40807.953529] 0000000000000000 ffffffff8fd2a1f5 ffff9f977f303e38 0000000000000000
[40807.953564] ffffffff8fa77884 0000000000000000 ffff9f977f303e90 ffff9f776ec00000
[40807.953599] 0000000000000012 ffff9f776ec24f40 0000000000000040 ffffffff8fa778ff
[40807.953633] Call Trace:
[40807.953646] <IRQ>
[40807.953665] [<ffffffff8fd2a1f5>] ? dump_stack+0x5c/0x77
[40807.953688] [<ffffffff8fa77884>] ? __warn+0xc4/0xe0
[40807.953708] [<ffffffff8fa778ff>] ? warn_slowpath_fmt+0x5f/0x80
[40807.953731] [<ffffffff8ff1fc30>] ? dev_watchdog+0x220/0x230
[40807.953753] [<ffffffff8ff1fa10>] ? dev_deactivate_queue.constprop.27+0x60/0x60
[40807.953784] [<ffffffff8fae6210>] ? call_timer_fn+0x30/0x130
[40807.953807] [<ffffffff8fae7085>] ? run_timer_softirq+0x215/0x4b0
[40807.953832] [<ffffffff8fd33434>] ? timerqueue_add+0x54/0xa0
[40807.953853] [<ffffffff8fae82c8>] ? enqueue_hrtimer+0x38/0x80
[40807.953878] [<ffffffff8fffcce6>] ? __do_softirq+0x106/0x292
[40807.953902] [<ffffffff8fb8eae0>] ? trace_event_raw_event_mm_lru_insertion+0x170/0x170
[40807.953931] [<ffffffff8fa7db08>] ? irq_exit+0x98/0xa0
[40807.953951] [<ffffffff8fffcaee>] ? smp_apic_timer_interrupt+0x3e/0x50
[40807.953977] [<ffffffff8fffbe02>] ? apic_timer_interrupt+0x82/0x90
[40807.953999] <EOI>
[40807.954011] [<ffffffff8fb8eae0>] ? trace_event_raw_event_mm_lru_insertion+0x170/0x170
[40807.954039] [<ffffffff8fff9d91>] ? _raw_spin_unlock_irqrestore+0x11/0x20
[40807.954064] [<ffffffff8fb8fb4d>] ? pagevec_lru_move_fn+0xad/0xe0
[40807.954934] [<ffffffff8fb8fc6c>] ? __lru_cache_add+0x6c/0x90
[40807.955761] [<ffffffff8fbb769e>] ? handle_mm_fault+0x156e/0x1650
[40807.956582] [<ffffffff8fa5fe43>] ? __do_page_fault+0x253/0x510
[40807.957392] [<ffffffff8fffb598>] ? page_fault+0x28/0x30
[40807.958201] ---[ end trace faa12d1c7fa20cc5 ]---
[40807.959003] ixgbe 0000:01:00.0 eth0: initiating reset due to tx timeout
[40807.959849] ixgbe 0000:01:00.0 eth0: Reset adapter
[40811.710212] ixgbe 0000:01:00.0 eth0: NIC Link is Up 1 Gbps, Flow Control: None
[42470.998496] ixgbe 0000:01:00.0 eth0: initiating reset due to tx timeout
[42470.999465] ixgbe 0000:01:00.0 eth0: Reset adapter
[42479.497773] ixgbe 0000:01:00.0 eth0: NIC Link is Up 1 Gbps, Flow Control: None
[48475.363991] ixgbe 0000:01:00.0 eth0: initiating reset due to tx timeout
[48475.365060] ixgbe 0000:01:00.0 eth0: Reset adapter

Can you please help me determine the reason for this behaviour? Are there any ixgbe/NIC-specific tunables I should be looking into to fix it? If I need to supply additional data, please let me know.

Please also keep me CC'd, as I'm not subscribed to this list. Thanks!

--
Kind regards
Johannes Truschnigg
Technology / Senior System Administrator

Geizhals (R) - Preisvergleich
Preisvergleich Internet Services AG
Obere Donaustraße 63/2
A-1020 Wien
Tel: +43 1 5811609/87
Fax: +43 1 5811609/55
http://www.geizhals.at | http://www.geizhals.de | http://www.geizhals.eu
http://www.facebook.com/geizhals => Geizhals on Facebook!
http://twitter.com/geizhals => Geizhals on Twitter!
http://blog.geizhals.at => The Geizhals blog!
http://unternehmen.geizhals.at/about/de/apps/ => The Geizhals mobile app
Commercial Court of Vienna | FN 197241K | Registered office: Vienna

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most engaging tech sites, SlashDot.org!
http://sdm.link/slashdot
_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel® Ethernet, visit http://communities.intel.com/community/wired
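
The IRQ-spread check suggested in the reply at the top of this thread (looking at the interface's lines in /proc/interrupts) could be sketched roughly as follows. This is only an illustrative helper, not something from the thread: the function name irq_spread is made up, and it assumes the usual /proc/interrupts layout (a header row of CPU names, then one row per IRQ with per-CPU counts followed by the IRQ name) and that the driver names its vectors after the interface, e.g. "eth0-TxRx-0".

```python
def irq_spread(interrupts_text, ifname):
    """Sum per-CPU interrupt counts over all IRQ rows that belong to ifname.

    interrupts_text is the content of /proc/interrupts; ifname is an
    interface name such as "eth0". Returns a dict like {"CPU0": 123, ...}.
    """
    lines = interrupts_text.splitlines()
    cpus = lines[0].split()                 # header row: CPU0 CPU1 ...
    totals = [0] * len(cpus)
    for line in lines[1:]:
        fields = line.split()
        # Rows such as "ERR:" carry no per-CPU counters; skip them.
        if len(fields) < len(cpus) + 1:
            continue
        # Everything after the counters describes the IRQ (chip, type, name).
        names = fields[len(cpus) + 1:]
        if not any(n == ifname or n.startswith(ifname + "-") for n in names):
            continue
        for i, count in enumerate(fields[1:len(cpus) + 1]):
            totals[i] += int(count)
    return dict(zip(cpus, totals))
```

On the affected host one would call it as irq_spread(open("/proc/interrupts").read(), "eth0"). If nearly all of the counts land on one or two CPUs, the interrupts are not being spread across cores, which is the condition the reply asks to rule out.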