"PCIe link lost" is catastrophic and I would suggest you talk to the hardware manufacturer first.
You could also go through the git log to see if there were any updates to the PCIe subsystem, but if we can't talk to the part, there's not much we can do from the network driver.

Todd Fujinaka
Software Application Engineer
Data Center Group
Intel Corporation
todd.fujin...@intel.com

-----Original Message-----
From: Bret Towe <bret.t...@gmail.com>
Sent: Friday, July 2, 2021 6:34 AM
To: e1000-de...@lists.sf.net
Subject: [E1000-devel] failed to read reg 0xc030

Hello,

I've been seeing an issue for a while; looking back at the logs, it started on June 1 on kernel 5.12.8.
The visible effect is that every couple of days the server in question loses network connectivity.
Typically all 4 ports stop communicating; this last time, however, it only dropped 1. The trace below is from that event.
This last crash was on 5.12.13.

Let me know what you need to narrow down the problem.

[140519.425033] igb 0000:01:00.0 lan1: PCIe link lost
[140519.425055] ------------[ cut here ]------------
[140519.425058] igb: Failed to read reg 0xc030!
[140519.425151] WARNING: CPU: 3 PID: 802 at drivers/net/ethernet/intel/igb/igb_main.c:747 igb_rd32.cold+0x39/0x45 [igb]
[140519.425201] Modules linked in: rpcsec_gss_krb5 rpcrdma rdma_cm iw_cm ib_cm ib_core wireguard curve25519_x86_64 libchacha20poly1305 chacha_x86_64 poly1305_x86_64 libblake2s blake2s_x86_64 libcurve25519_generic libchacha libblake2s_generic ip6_udp_tunnel udp_tunnel amdgpu sch_cake gpu_sched act_mirred cls_u32 sch_ingress ifb sch_fq bridge ip6table_filter ip6_tables xt_nat xt_MASQUERADE iptable_nat nf_nat xt_state xt_conntrack nf_conntrack cfg80211 nf_defrag_ipv6 nf_defrag_ipv4 radeon libcrc32c xt_tcpudp iptable_filter rfkill 8021q garp mrp stp llc edac_mce_amd kvm_amd snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi ledtrig_audio kvm snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi drm_ttm_helper snd_hda_codec ttm irqbypass crct10dif_pclmul drm_kms_helper snd_hda_core ghash_clmulni_intel snd_hwdep pcspkr k10temp fam15h_power snd_pcm cec snd_timer igb sp5100_tco ccp snd syscopyarea rng_core sysfillrect i2c_algo_bit sysimgblt i2c_piix4 dca fb_sys_fops soundcore mac_hid
[140519.425405] acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace drm sunrpc fuse agpgart nfs_ssc bpf_preload ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 dm_mod crc32_pclmul crc32c_intel usbhid sdhci_pci aesni_intel cqhci sdhci xhci_pci crypto_simd mmc_core cryptd xhci_pci_renesas video
[140519.425490] CPU: 3 PID: 802 Comm: snmpd Not tainted 5.12.13-arch1-2 #1
[140519.425499] Hardware name: CompuLab fitlet/fitlet, BIOS SBCFLTR_0.08.01 06/23/2016
[140519.425504] RIP: 0010:igb_rd32.cold+0x39/0x45 [igb]
[140519.425544] Code: 48 c7 c6 8c 11 87 c0 e8 48 00 a2 ea 48 8b bb 30 ff ff ff e8 15 84 4f ea 84 c0 74 15 89 ee 48 c7 c7 68 1e 87 c0 e8 13 7e 9d ea <0f> 0b e9 5e 33 fe ff e9 73 33 fe ff 48 63 c6 89 f2 48 c7 c6 00 1f
[140519.425551] RSP: 0018:ffffb519c0f57c68 EFLAGS: 00010286
[140519.425560] RAX: 0000000000000000 RBX: ffff8f5888874e90 RCX: 0000000000000027
[140519.425565] RDX: ffff8f5996d986e8 RSI: 0000000000000001 RDI: ffff8f5996d986e0
[140519.425571] RBP: 000000000000c030 R08: 0000000000000000 R09: ffffb519c0f57a98
[140519.425576] R10: ffffb519c0f57a90 R11: ffffffffac2cc4a8 R12: 00000000ffffffff
[140519.425581] R13: 0000000000000000 R14: ffff8f588b03a240 R15: 000000000000c030
[140519.425587] FS: 00007f9fff6f5740(0000) GS:ffff8f5996d80000(0000) knlGS:0000000000000000
[140519.425594] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[140519.425600] CR2: 00007f979d179f10 CR3: 0000000108bf0000 CR4: 00000000000406e0
[140519.425607] Call Trace:
[140519.425624] igb_update_stats+0x71/0x810 [igb]
[140519.425662] igb_get_stats64+0x2a/0x80 [igb]
[140519.425697] dev_get_stats+0x5c/0xc0
[140519.425714] dev_seq_printf_stats+0x3e/0xe0
[140519.425731] dev_seq_show+0x10/0x30
[140519.425741] seq_read_iter+0x2d5/0x4c0
[140519.425756] seq_read+0x127/0x170
[140519.425770] proc_reg_read+0x55/0xa0
[140519.425781] vfs_read+0xa7/0x1a0
[140519.425794] ksys_read+0x67/0xe0
[140519.425806] do_syscall_64+0x33/0x40
[140519.425820] entry_SYSCALL_64_after_hwframe+0x44/0xae
[140519.425832] RIP: 0033:0x7fa0000b2862
[140519.425841] Code: c0 e9 b2 fe ff ff 50 48 8d 3d 5a 29 0a 00 e8 55 e4 01 00 0f 1f 44 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 0f 05 <48> 3d 00 f0 ff ff 77 56 c3 0f 1f 44 00 00 48 83 ec 28 48 89 54 24
[140519.425847] RSP: 002b:00007fffd36627f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[140519.425856] RAX: ffffffffffffffda RBX: 000055a2aea72b30 RCX: 00007fa0000b2862
[140519.425861] RDX: 0000000000000400 RSI: 000055a2aeaa3720 RDI: 0000000000000008
[140519.425866] RBP: 00007fa000185300 R08: 0000000000000008 R09: 0000000000000000
[140519.425871] R10: 0000000000001000 R11: 0000000000000246 R12: 000055a2aea72b30
[140519.425876] R13: 0000000000000d68 R14: 00007fa000184700 R15: 0000000000000d68
[140519.425888] ---[ end trace 73fc28661e6b9864 ]---
[140525.940737] ------------[ cut here ]------------
[140525.940784] NETDEV WATCHDOG: lan1 (igb): transmit queue 1 timed out
[140525.940842] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:467 dev_watchdog+0x25e/0x270
[140525.940867] Modules linked in: rpcsec_gss_krb5 rpcrdma rdma_cm iw_cm ib_cm ib_core wireguard curve25519_x86_64 libchacha20poly1305 chacha_x86_64 poly1305_x86_64 libblake2s blake2s_x86_64 libcurve25519_generic libchacha libblake2s_generic ip6_udp_tunnel udp_tunnel amdgpu sch_cake gpu_sched act_mirred cls_u32 sch_ingress ifb sch_fq bridge ip6table_filter ip6_tables xt_nat xt_MASQUERADE iptable_nat nf_nat xt_state xt_conntrack nf_conntrack cfg80211 nf_defrag_ipv6 nf_defrag_ipv4 radeon libcrc32c xt_tcpudp iptable_filter rfkill 8021q garp mrp stp llc edac_mce_amd kvm_amd snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi ledtrig_audio kvm snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi drm_ttm_helper snd_hda_codec ttm irqbypass crct10dif_pclmul drm_kms_helper snd_hda_core ghash_clmulni_intel snd_hwdep pcspkr k10temp fam15h_power snd_pcm cec snd_timer igb sp5100_tco ccp snd syscopyarea rng_core sysfillrect i2c_algo_bit sysimgblt i2c_piix4 dca fb_sys_fops soundcore mac_hid
[140525.941306] acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace drm sunrpc fuse agpgart nfs_ssc bpf_preload ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 dm_mod crc32_pclmul crc32c_intel usbhid sdhci_pci aesni_intel cqhci sdhci xhci_pci crypto_simd mmc_core cryptd xhci_pci_renesas video
[140525.941471] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 5.12.13-arch1-2 #1
[140525.941482] Hardware name: CompuLab fitlet/fitlet, BIOS SBCFLTR_0.08.01 06/23/2016
[140525.941489] RIP: 0010:dev_watchdog+0x25e/0x270
[140525.941504] Code: 67 40 73 ff eb 94 4c 89 f7 c6 05 be 94 2e 01 01 e8 d7 2e fa ff 44 89 e9 4c 89 f6 48 c7 c7 38 d3 c2 ab 48 89 c2 e8 82 21 17 00 <0f> 0b e9 72 ff ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00
[140525.941514] RSP: 0018:ffffb519c0003ea8 EFLAGS: 00010286
[140525.941538] RAX: 0000000000000000 RBX: ffff8f58833ae8c0 RCX: 0000000000000000
[140525.941546] RDX: ffff8f5996c28820 RSI: ffff8f5996c186e0 RDI: 0000000000000300
[140525.941553] RBP: ffff8f58888743dc R08: 0000000000000000 R09: ffffb519c0003cd8
[140525.941561] R10: ffffb519c0003cd0 R11: ffffffffac2cc4a8 R12: ffff8f5888874480
[140525.941568] R13: 0000000000000001 R14: ffff8f5888874000 R15: ffff8f58833ae940
[140525.941577] FS: 0000000000000000(0000) GS:ffff8f5996c00000(0000) knlGS:0000000000000000
[140525.941586] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[140525.941594] CR2: 00007fc64177a000 CR3: 0000000102aee000 CR4: 00000000000406f0
[140525.941603] Call Trace:
[140525.941613] <IRQ>
[140525.941623] ? pfifo_fast_reset+0x120/0x120
[140525.941637] ? pfifo_fast_reset+0x120/0x120
[140525.941650] call_timer_fn+0x29/0x130
[140525.941666] __run_timers+0x1ef/0x280
[140525.941683] run_timer_softirq+0x19/0x30
[140525.941695] __do_softirq+0xd0/0x2c1
[140525.941713] irq_exit_rcu+0x9e/0xd0
[140525.941725] sysvec_apic_timer_interrupt+0x72/0x90
[140525.941739] </IRQ>
[140525.941746] asm_sysvec_apic_timer_interrupt+0x12/0x20
[140525.941759] RIP: 0010:native_safe_halt+0xe/0x10
[140525.941772] Code: c0 7b 01 00 f0 80 4a 02 20 48 8b 12 83 e2 08 75 c3 e9 7a ff ff ff cc cc cc cc cc cc e9 07 00 00 00 0f 00 2d 26 8a 56 00 fb f4 <c3> 90 e9 07 00 00 00 0f 00 2d 16 8a 56 00 f4 c3 cc cc 0f 1f 44 00
[140525.941781] RSP: 0018:ffffffffac203e28 EFLAGS: 00000246
[140525.941793] RAX: 0000000000004000 RBX: 0000000000000001 RCX: 000000000002d600
[140525.941807] RDX: ffff8f5996c00000 RSI: ffff8f5880c30000 RDI: ffff8f5880c30064
[140525.941815] RBP: ffff8f5880c30064 R08: ffffffffac349f40 R09: 00007fcccb35f187
[140525.941822] R10: 00000000000004ec R11: 0000000000000091 R12: 0000000000000001
[140525.941830] R13: ffffffffac349fc0 R14: 0000000000000001 R15: 0000000000000000
[140525.941846] acpi_idle_do_entry+0x46/0x50
[140525.941858] acpi_idle_enter+0x86/0xc0
[140525.941873] cpuidle_enter_state+0x89/0x380
[140525.941891] cpuidle_enter+0x29/0x40
[140525.941906] do_idle+0x1da/0x270
[140525.941920] cpu_startup_entry+0x19/0x20
[140525.941933] start_kernel+0x871/0x896
[140525.941950] secondary_startup_64_no_verify+0xc2/0xcb
[140525.941971] ---[ end trace 73fc28661e6b9865 ]---
[140525.942176] igb 0000:01:00.0 lan1: Reset adapter
[140526.684545] br1: port 2(lan1.9) entered disabled state

_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel Ethernet, visit https://forums.intel.com/s/topic/0TO0P00000018NbWAI/intel-ethernet
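
Since the report mentions looking back through logs to find when the drops started and how many ports were affected, here is a minimal sketch for pulling the "PCIe link lost" events out of a saved kernel log. It assumes the igb message format seen in the trace above ("[timestamp] igb <PCI address> <interface>: PCIe link lost"); the names LinkLoss and find_link_loss are made up for this illustration and are not part of any existing tool.

```python
import re
from typing import List, NamedTuple


class LinkLoss(NamedTuple):
    """One 'PCIe link lost' event (hypothetical helper type)."""
    uptime_s: float  # dmesg timestamp, seconds since boot
    pci_addr: str    # e.g. "0000:01:00.0"
    ifname: str      # e.g. "lan1"


# Assumed format, taken from the trace in this thread:
#   [140519.425033] igb 0000:01:00.0 lan1: PCIe link lost
_LINK_LOST = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+igb\s+(\S+)\s+(\S+):\s+PCIe link lost"
)


def find_link_loss(log_text: str) -> List[LinkLoss]:
    """Return every igb 'PCIe link lost' event found in log_text.

    finditer scans the whole string, so this also works on logs whose
    line breaks were lost in transit.
    """
    return [
        LinkLoss(float(m.group(1)), m.group(2), m.group(3))
        for m in _LINK_LOST.finditer(log_text)
    ]
```

For example, with the kernel log saved to a file (dmesg.txt is only a placeholder name), find_link_loss(open("dmesg.txt").read()) returns one entry per affected port, which may help when describing the failure to the hardware manufacturer.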