"PCIe link lost" is catastrophic and I would suggest you talk to the hardware 
manufacturer first.

You could also go through the git log to see if there were any updates to the 
PCIe subsystem, but if we can't talk to the part there's not much we can do 
from the network driver.
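
To line up the failures with kernel or firmware changes, it may help to pull
every "PCIe link lost" event out of a saved kernel log. The following is a
minimal sketch, assuming the log lines look like the one in the trace quoted
below; the names LINK_LOST and find_link_lost are hypothetical and are not part
of igb or any kernel tool.

```python
import re

# Assumed line format, taken from the quoted trace:
#   [140519.425033] igb 0000:01:00.0 lan1: PCIe link lost
# finditer() is used so that events still match when the mail client has
# wrapped several log entries onto one physical line.
LINK_LOST = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+igb\s+"
    r"(?P<pci>[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7])\s+"
    r"(?P<ifname>\S+): PCIe link lost"
)


def find_link_lost(log_text):
    """Return (seconds_since_boot, pci_address, interface) per event."""
    return [
        (float(m.group("ts")), m.group("pci"), m.group("ifname"))
        for m in LINK_LOST.finditer(log_text)
    ]
```

Calling find_link_lost() on the text of a saved dmesg capture returns one tuple
per event, which shows whether the same PCI function fails each time.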

Todd Fujinaka
Software Application Engineer
Data Center Group
Intel Corporation
todd.fujin...@intel.com

-----Original Message-----
From: Bret Towe <bret.t...@gmail.com> 
Sent: Friday, July 2, 2021 6:34 AM
To: e1000-de...@lists.sf.net
Subject: [E1000-devel] failed to read reg 0xc030

Hello,
I've been seeing an issue for a while. Looking back at the logs, it started on
June 1 on kernel 5.12.8. The visible effect is that every couple of days the
server in question loses network connectivity. Typically all 4 ports stop
communicating; this last time, however, it dropped only 1. The trace below is
from that event.
This last crash was on 5.12.13.
Let me know what you need to narrow down the problem.

[140519.425033] igb 0000:01:00.0 lan1: PCIe link lost
[140519.425055] ------------[ cut here ]------------
[140519.425058] igb: Failed to read reg 0xc030!
[140519.425151] WARNING: CPU: 3 PID: 802 at drivers/net/ethernet/intel/igb/igb_main.c:747 igb_rd32.cold+0x39/0x45 [igb]
[140519.425201] Modules linked in: rpcsec_gss_krb5 rpcrdma rdma_cm iw_cm ib_cm 
ib_core wireguard curve25519_x86_64 libchacha20poly1305
chacha_x86_64 poly1305_x86_64 libblake2s blake2s_x86_64 libcurve25519_generic 
libchacha libblake2s_generic ip6_udp_tunnel udp_tunnel amdgpu sch_cake 
gpu_sched act_mirred cls_u32 sch_ingress ifb sch_fq bridge ip6table_filter 
ip6_tables xt_nat xt_MASQUERADE iptable_nat nf_nat xt_state xt_conntrack 
nf_conntrack cfg80211
nf_defrag_ipv6 nf_defrag_ipv4 radeon libcrc32c xt_tcpudp iptable_filter rfkill 
8021q garp mrp stp llc edac_mce_amd kvm_amd snd_hda_codec_realtek 
snd_hda_codec_generic snd_hda_codec_hdmi ledtrig_audio kvm snd_hda_intel 
snd_intel_dspcfg snd_intel_sdw_acpi drm_ttm_helper snd_hda_codec ttm irqbypass 
crct10dif_pclmul drm_kms_helper snd_hda_core ghash_clmulni_intel snd_hwdep 
pcspkr k10temp fam15h_power snd_pcm cec snd_timer igb sp5100_tco ccp snd 
syscopyarea rng_core sysfillrect i2c_algo_bit sysimgblt i2c_piix4 dca 
fb_sys_fops soundcore mac_hid [140519.425405]  acpi_cpufreq nfsd auth_rpcgss 
nfs_acl lockd grace drm sunrpc fuse agpgart nfs_ssc bpf_preload ip_tables 
x_tables ext4 crc32c_generic crc16 mbcache jbd2 dm_mod crc32_pclmul 
crc32c_intel usbhid sdhci_pci aesni_intel cqhci sdhci xhci_pci crypto_simd 
mmc_core cryptd xhci_pci_renesas video [140519.425490] CPU: 3 PID: 802 Comm: 
snmpd Not tainted 5.12.13-arch1-2 #1 [140519.425499] Hardware name: CompuLab 
fitlet/fitlet, BIOS
SBCFLTR_0.08.01 06/23/2016
[140519.425504] RIP: 0010:igb_rd32.cold+0x39/0x45 [igb] [140519.425544] Code: 
48 c7 c6 8c 11 87 c0 e8 48 00 a2 ea 48 8b bb 30 ff ff ff e8 15 84 4f ea 84 c0 
74 15 89 ee 48 c7 c7 68 1e 87 c0 e8 13 7e 9d ea <0f> 0b e9 5e 33 fe ff e9 73 33 
fe ff 48 63 c6 89 f2 48 c7 c6
00 1f
[140519.425551] RSP: 0018:ffffb519c0f57c68 EFLAGS: 00010286
[140519.425560] RAX: 0000000000000000 RBX: ffff8f5888874e90 RCX: 0000000000000027
[140519.425565] RDX: ffff8f5996d986e8 RSI: 0000000000000001 RDI: ffff8f5996d986e0
[140519.425571] RBP: 000000000000c030 R08: 0000000000000000 R09: ffffb519c0f57a98
[140519.425576] R10: ffffb519c0f57a90 R11: ffffffffac2cc4a8 R12: 00000000ffffffff
[140519.425581] R13: 0000000000000000 R14: ffff8f588b03a240 R15: 000000000000c030
[140519.425587] FS:  00007f9fff6f5740(0000) GS:ffff8f5996d80000(0000) knlGS:0000000000000000
[140519.425594] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[140519.425600] CR2: 00007f979d179f10 CR3: 0000000108bf0000 CR4: 00000000000406e0
[140519.425607] Call Trace:
[140519.425624]  igb_update_stats+0x71/0x810 [igb]
[140519.425662]  igb_get_stats64+0x2a/0x80 [igb]
[140519.425697]  dev_get_stats+0x5c/0xc0
[140519.425714]  dev_seq_printf_stats+0x3e/0xe0
[140519.425731]  dev_seq_show+0x10/0x30
[140519.425741]  seq_read_iter+0x2d5/0x4c0
[140519.425756]  seq_read+0x127/0x170
[140519.425770]  proc_reg_read+0x55/0xa0
[140519.425781]  vfs_read+0xa7/0x1a0
[140519.425794]  ksys_read+0x67/0xe0
[140519.425806]  do_syscall_64+0x33/0x40
[140519.425820]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[140519.425832] RIP: 0033:0x7fa0000b2862 [140519.425841] Code: c0 e9 b2 fe ff 
ff 50 48 8d 3d 5a 29 0a 00 e8 55
e4 01 00 0f 1f 44 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75
10 0f 05 <48> 3d 00 f0 ff ff 77 56 c3 0f 1f 44 00 00 48 83 ec 28 48 89
54 24
[140519.425847] RSP: 002b:00007fffd36627f8 EFLAGS: 00000246 ORIG_RAX:
0000000000000000
[140519.425856] RAX: ffffffffffffffda RBX: 000055a2aea72b30 RCX:
00007fa0000b2862
[140519.425861] RDX: 0000000000000400 RSI: 000055a2aeaa3720 RDI:
0000000000000008
[140519.425866] RBP: 00007fa000185300 R08: 0000000000000008 R09:
0000000000000000
[140519.425871] R10: 0000000000001000 R11: 0000000000000246 R12:
000055a2aea72b30
[140519.425876] R13: 0000000000000d68 R14: 00007fa000184700 R15:
0000000000000d68
[140519.425888] ---[ end trace 73fc28661e6b9864 ]---
[140525.940737] ------------[ cut here ]------------
[140525.940784] NETDEV WATCHDOG: lan1 (igb): transmit queue 1 timed out
[140525.940842] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:467 dev_watchdog+0x25e/0x270
[140525.940867] Modules linked in: rpcsec_gss_krb5 rpcrdma rdma_cm iw_cm ib_cm 
ib_core wireguard curve25519_x86_64 libchacha20poly1305
chacha_x86_64 poly1305_x86_64 libblake2s blake2s_x86_64 libcurve25519_generic 
libchacha libblake2s_generic ip6_udp_tunnel udp_tunnel amdgpu sch_cake 
gpu_sched act_mirred cls_u32 sch_ingress ifb sch_fq bridge ip6table_filter 
ip6_tables xt_nat xt_MASQUERADE iptable_nat nf_nat xt_state xt_conntrack 
nf_conntrack cfg80211
nf_defrag_ipv6 nf_defrag_ipv4 radeon libcrc32c xt_tcpudp iptable_filter rfkill 
8021q garp mrp stp llc edac_mce_amd kvm_amd snd_hda_codec_realtek 
snd_hda_codec_generic snd_hda_codec_hdmi ledtrig_audio kvm snd_hda_intel 
snd_intel_dspcfg snd_intel_sdw_acpi drm_ttm_helper snd_hda_codec ttm irqbypass 
crct10dif_pclmul drm_kms_helper snd_hda_core ghash_clmulni_intel snd_hwdep 
pcspkr k10temp fam15h_power snd_pcm cec snd_timer igb sp5100_tco ccp snd 
syscopyarea rng_core sysfillrect i2c_algo_bit sysimgblt i2c_piix4 dca 
fb_sys_fops soundcore mac_hid [140525.941306]  acpi_cpufreq nfsd auth_rpcgss 
nfs_acl lockd grace drm sunrpc fuse agpgart nfs_ssc bpf_preload ip_tables 
x_tables ext4 crc32c_generic crc16 mbcache jbd2 dm_mod crc32_pclmul 
crc32c_intel usbhid sdhci_pci aesni_intel cqhci sdhci xhci_pci crypto_simd 
mmc_core cryptd xhci_pci_renesas video
[140525.941471] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G        W
   5.12.13-arch1-2 #1
[140525.941482] Hardware name: CompuLab fitlet/fitlet, BIOS
SBCFLTR_0.08.01 06/23/2016
[140525.941489] RIP: 0010:dev_watchdog+0x25e/0x270 [140525.941504] Code: 67 40 
73 ff eb 94 4c 89 f7 c6 05 be 94 2e 01 01
e8 d7 2e fa ff 44 89 e9 4c 89 f6 48 c7 c7 38 d3 c2 ab 48 89 c2 e8 82
21 17 00 <0f> 0b e9 72 ff ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f
44 00
[140525.941514] RSP: 0018:ffffb519c0003ea8 EFLAGS: 00010286 [140525.941538] 
RAX: 0000000000000000 RBX: ffff8f58833ae8c0 RCX:
0000000000000000
[140525.941546] RDX: ffff8f5996c28820 RSI: ffff8f5996c186e0 RDI:
0000000000000300
[140525.941553] RBP: ffff8f58888743dc R08: 0000000000000000 R09:
ffffb519c0003cd8
[140525.941561] R10: ffffb519c0003cd0 R11: ffffffffac2cc4a8 R12:
ffff8f5888874480
[140525.941568] R13: 0000000000000001 R14: ffff8f5888874000 R15:
ffff8f58833ae940
[140525.941577] FS:  0000000000000000(0000) GS:ffff8f5996c00000(0000)
knlGS:0000000000000000
[140525.941586] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033 
[140525.941594] CR2: 00007fc64177a000 CR3: 0000000102aee000 CR4:
00000000000406f0
[140525.941603] Call Trace:
[140525.941613]  <IRQ>
[140525.941623]  ? pfifo_fast_reset+0x120/0x120
[140525.941637]  ? pfifo_fast_reset+0x120/0x120
[140525.941650]  call_timer_fn+0x29/0x130
[140525.941666]  __run_timers+0x1ef/0x280
[140525.941683]  run_timer_softirq+0x19/0x30
[140525.941695]  __do_softirq+0xd0/0x2c1
[140525.941713]  irq_exit_rcu+0x9e/0xd0
[140525.941725]  sysvec_apic_timer_interrupt+0x72/0x90
[140525.941739]  </IRQ>
[140525.941746]  asm_sysvec_apic_timer_interrupt+0x12/0x20
[140525.941759] RIP: 0010:native_safe_halt+0xe/0x10 [140525.941772] Code: c0 7b 
01 00 f0 80 4a 02 20 48 8b 12 83 e2 08 75
c3 e9 7a ff ff ff cc cc cc cc cc cc e9 07 00 00 00 0f 00 2d 26 8a 56
00 fb f4 <c3> 90 e9 07 00 00 00 0f 00 2d 16 8a 56 00 f4 c3 cc cc 0f 1f
44 00
[140525.941781] RSP: 0018:ffffffffac203e28 EFLAGS: 00000246 [140525.941793] 
RAX: 0000000000004000 RBX: 0000000000000001 RCX:
000000000002d600
[140525.941807] RDX: ffff8f5996c00000 RSI: ffff8f5880c30000 RDI:
ffff8f5880c30064
[140525.941815] RBP: ffff8f5880c30064 R08: ffffffffac349f40 R09:
00007fcccb35f187
[140525.941822] R10: 00000000000004ec R11: 0000000000000091 R12:
0000000000000001
[140525.941830] R13: ffffffffac349fc0 R14: 0000000000000001 R15:
0000000000000000
[140525.941846]  acpi_idle_do_entry+0x46/0x50
[140525.941858]  acpi_idle_enter+0x86/0xc0
[140525.941873]  cpuidle_enter_state+0x89/0x380
[140525.941891]  cpuidle_enter+0x29/0x40
[140525.941906]  do_idle+0x1da/0x270
[140525.941920]  cpu_startup_entry+0x19/0x20
[140525.941933]  start_kernel+0x871/0x896
[140525.941950]  secondary_startup_64_no_verify+0xc2/0xcb
[140525.941971] ---[ end trace 73fc28661e6b9865 ]---
[140525.942176] igb 0000:01:00.0 lan1: Reset adapter
[140526.684545] br1: port 2(lan1.9) entered disabled state


_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel Ethernet, visit 
https://forums.intel.com/s/topic/0TO0P00000018NbWAI/intel-ethernet

