I can reproduce this bug on kernel 5.4.0-40-generic.

** Also affects: linux (Ubuntu)
   Importance: Undecided
       Status: New

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-oem-5.6 in Ubuntu.
https://bugs.launchpad.net/bugs/1887723

Title:
  mlx5_core: Error cqe on cqn

Status in linux package in Ubuntu:
  Incomplete
Status in linux-oem-5.6 package in Ubuntu:
  New

Bug description:
  I have encountered the following repeating error with kernel
  5.6.0-1018-oem. The network was disrupted, and the error kept repeating
  for one hour until the system hung.

  [316294.820469] mlx5_core 0000:44:00.1 enp68s0f1: Error cqe on cqn 0x816, ci 0xc5, sqn 0x1908, opcode 0xd, syndrome 0x4, vendor syndrome 0x51
  [316294.833103] 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  [316294.833106] 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  [316294.833110] 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  [316294.833116] 00000030: 00 00 00 00 04 00 51 04 0e 00 19 08 53 64 dc d2
  [316294.833118] WQE DUMP: WQ size 1024 WQ cur size 0, WQE index 0x364, len: 128
  [316294.833120] 00000000: 00 53 64 0e 00 19 08 07 00 00 00 08 00 00 00 00
  [316294.833121] 00000010: 00 00 00 00 c0 00 05 a0 00 00 00 00 00 42 00 a3
  [316294.833123] 00000020: 8e bf 47 d7 86 14 ad f8 ef 46 08 00 45 00 12 34
  [316294.833124] 00000030: 76 d8 40 00 40 06 77 97 c3 a8 4a 4a 5f 67 cc fa
  [316294.833126] 00000040: 01 bb d8 2a 5c 7e 3d a0 b0 c5 3e 74 80 18 00 0b
  [316294.833127] 00000050: 4c 7b 00 00 01 01 08 0a 63 59 a1 46 00 41 05 b4
  [316294.833129] 00000060: 00 00 12 00 00 08 01 01 00 00 00 00 c2 c6 0b 74
  [316294.833130] 00000070: 00 00 00 44 00 08 01 01 00 00 00 00 c3 09 6c fc
  [316294.833144] mlx5_core 0000:44:00.1 enp68s0f1: ERR CQE on SQ: 0x1908
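The 64-byte error CQE printed above (the four hex rows after "Error cqe on cqn") can be decoded by hand. This helps when comparing repeated occurrences of the error. The sketch below assumes the field layout of struct mlx5_err_cqe as recalled from the kernel's include/linux/mlx5/device.h. It also assumes that the WQE index is the WQE counter masked by the power-of-two WQ size. The helpers parse_hexdump and decode_err_cqe are hypothetical, written only for this illustration.

```python
import struct

# The four hex rows of the error CQE, copied verbatim from the log above.
CQE_DUMP = """
00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000030: 00 00 00 00 04 00 51 04 0e 00 19 08 53 64 dc d2
"""


def parse_hexdump(text: str) -> bytes:
    """Convert 'offset: b0 b1 ...' rows into raw bytes (hypothetical helper)."""
    raw = bytearray()
    for line in text.strip().splitlines():
        _, _, hex_part = line.partition(":")
        raw += bytes.fromhex(hex_part)  # fromhex ignores the separating spaces
    return bytes(raw)


def decode_err_cqe(cqe: bytes, wq_size: int) -> dict:
    """Decode a 64-byte mlx5 error CQE (hypothetical helper).

    Assumed layout (struct mlx5_err_cqe, as recalled): 32 reserved bytes,
    be32 srqn, 18 reserved bytes, u8 vendor_err_synd, u8 syndrome,
    be32 s_wqe_opcode_qpn, be16 wqe_counter, u8 signature, u8 op_own.
    """
    if len(cqe) != 64:
        raise ValueError("expected a 64-byte CQE")
    (srqn, vendor_synd, synd, opcode_qpn,
     wqe_counter, _signature, op_own) = struct.unpack_from(">I18xBBIHBB", cqe, 32)
    return {
        "srqn": srqn,
        "opcode": op_own >> 4,            # CQE opcode is the high nibble of op_own
        "syndrome": synd,
        "vendor_syndrome": vendor_synd,
        "sqn": opcode_qpn & 0xFFFFFF,     # low 24 bits: send queue number
        "wqe_opcode": opcode_qpn >> 24,   # high 8 bits: opcode of the failing WQE
        "wqe_counter": wqe_counter,
        # Assumption: ring index = counter masked by the power-of-two WQ size.
        "wqe_index": wqe_counter & (wq_size - 1),
    }


if __name__ == "__main__":
    fields = decode_err_cqe(parse_hexdump(CQE_DUMP), wq_size=1024)
    for name, value in fields.items():
        print(f"{name}: {value:#x}")
```

The sketch uses wq_size=1024, taken from "WQ size 1024" in the WQE DUMP line. It decodes opcode 0xd, syndrome 0x4, vendor_syndrome 0x51, sqn 0x1908 and wqe_index 0x364. These are the same values the driver printed, which supports the assumed layout. The sketch does not attempt to interpret what the syndrome values mean.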
  [316294.996328] enp68s0f1: hw csum failure 
  [316295.000262] skb len=1500 headroom=78 headlen=1500 tailroom=22
  [316295.000262] mac=(64,14) net=(78,40) trans=118
  [316295.000262] shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0))
  [316295.000262] csum(0x81a5 ip_summed=2 complete_sw=0 valid=0 level=0)
  [316295.000262] hash(0x322a7dd7 sw=0 l4=1) proto=0x86dd pkttype=0 iif=0
  [316295.029909] dev name=enp68s0f1 feat=0x0x0010a1821fd14ba9 
  ...
  [316295.943994] Hardware name: ASUSTeK COMPUTER INC. RS500A-E10-RS12U/KRPA-U16 Series, BIOS 0703 03/06/2020
  [316295.943995] Call Trace:
  [316295.943997]  <IRQ>
  [316295.944002]  dump_stack+0x6d/0x9a
  [316295.944006]  netdev_rx_csum_fault.part.0+0x41/0x45
  [316295.944007]  __skb_gro_checksum_complete.cold+0xb/0x10
  [316295.944009]  tcp6_gro_receive+0xdc/0x1c0
  [316295.944010]  ipv6_gro_receive+0x1dc/0x460
  [316295.944012]  ? kmem_cache_alloc+0x16d/0x230
  [316295.944017]  dev_gro_receive+0x2fb/0x690
  [316295.996284]  ? mlx5e_build_rx_skb+0x38c/0xb60 [mlx5_core]
  [316296.010778]  napi_gro_receive+0x39/0x140
  [316296.010793]  mlx5e_handle_rx_cqe+0xa5/0x150 [mlx5_core]
  [316296.010808]  mlx5e_poll_rx_cq+0x7fe/0x910 [mlx5_core]
  [316296.010825]  mlx5e_napi_poll+0xda/0x610 [mlx5_core]
  [316296.010843]  ? mlx5_eq_comp_int+0x149/0x1b0 [mlx5_core]
  [316296.010850]  net_rx_action+0x13a/0x370
  [316296.010859]  __do_softirq+0xe1/0x2d6
  [316296.010862]  irq_exit+0xae/0xb0
  [316296.010863]  do_IRQ+0x5a/0xf0
  [316296.010865]  common_interrupt+0xf/0xf
  [316296.010866]  </IRQ>
  [316296.010868] RIP: 0010:cpuidle_enter_state+0xca/0x3e0
  [316296.010869] Code: ff e8 aa 7d 7e ff 80 7d c7 00 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 ea 02 00 00 31 ff e8 2d 01 85 ff fb 66 0f 1f 44 00 00 <45> 85 e4 0f 88 3f 02 00 00 49 63 d4 4c 8b 7d d0 4c 2b 7d c8 48 8d
  [316296.010870] RSP: 0018:ffff9d84002cfe38 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffda
  [316296.010872] RAX: ffff91110b62ce00 RBX: ffff9110ac1d1c00 RCX: 000000000000001f
  [316296.010872] RDX: 0000000000000000 RSI: 00000000334bfb91 RDI: 0000000000000000
  [316296.010873] RBP: ffff9d84002cfe78 R08: 00011fab2ae67109 R09: 00011faebfd6b300
  [316296.010873] R10: ffff91110b62bac4 R11: ffff91110b62baa4 R12: 0000000000000002
  [316296.010874] R13: ffffffff8f978700 R14: 0000000000000002 R15: ffff9110ac1d1c00
  [316296.010876]  ? cpuidle_enter_state+0xa6/0x3e0
  [316296.010878]  cpuidle_enter+0x2e/0x40
  [316296.010880]  call_cpuidle+0x23/0x40
  [316296.010881]  do_idle+0x1e7/0x280
  [316296.010882]  cpu_startup_entry+0x20/0x30
  [316296.010885]  start_secondary+0x167/0x1c0
  [316296.010886]  secondary_startup_64+0xa4/0xb0
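The "hw csum failure" report comes from the GRO receive path in the call trace (tcp6_gro_receive, then __skb_gro_checksum_complete, then netdev_rx_csum_fault). With ip_summed=2 (CHECKSUM_COMPLETE), the NIC hands the stack a ones' complement sum of the received data. As far as I understand the kernel logic, if that hardware-supplied value fails validation, the stack recomputes the checksum in software. If the software result turns out valid, the kernel blames the device and prints this message. The sketch below is only a simplified model of that decision. It implements the RFC 1071 internet checksum and ignores the TCP pseudo-header. The function names are hypothetical and this is not kernel code.

```python
def ones_complement_sum(data: bytes, initial: int = 0) -> int:
    """16-bit ones' complement sum (RFC 1071) with end-around carry."""
    if len(data) % 2:
        data = bytes(data) + b"\x00"  # pad odd-length data with a zero byte
    total = initial
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return total


def internet_checksum(data: bytes) -> int:
    """Checksum field value: complement of the ones' complement sum."""
    return ~ones_complement_sum(data) & 0xFFFF


def simplified_complete_check(segment: bytes, hw_sum: int) -> str:
    """Toy model of the CHECKSUM_COMPLETE decision (hypothetical, no pseudo-header).

    hw_sum stands in for the device-supplied folded sum. For a valid segment,
    the folded sum over all bytes, checksum field included, is 0xFFFF.
    """
    if hw_sum == 0xFFFF:
        return "valid (hardware sum accepted)"
    if ones_complement_sum(segment) == 0xFFFF:
        # Data is fine but the device's sum disagreed: the case the kernel
        # reports as "hw csum failure".
        return "hw csum failure"
    return "bad checksum (packet really corrupt)"


if __name__ == "__main__":
    seg = bytearray(range(20))
    seg[16:18] = b"\x00\x00"  # zero the checksum field before computing it
    seg[16:18] = internet_checksum(bytes(seg)).to_bytes(2, "big")
    good_sum = ones_complement_sum(bytes(seg))
    print(hex(good_sum))  # 0xffff for a valid segment
    print(simplified_complete_check(bytes(seg), good_sum))
    print(simplified_complete_check(bytes(seg), 0x1234))  # arbitrary wrong device sum
```

The point of the model is the third branch. A "hw csum failure" means the packet data itself checked out in software, so the suspect is the checksum value the NIC reported, not the packet contents.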

  # lspci -v -s 0000:44:00.1
  44:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
        Subsystem: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
        Flags: bus master, fast devsel, latency 0, IRQ 254, NUMA node 0
        Memory at b0000000 (64-bit, prefetchable) [size=32M]
        Expansion ROM at b5300000 [disabled] [size=1M]
        Capabilities: [60] Express Endpoint, MSI 00
        Capabilities: [48] Vital Product Data
        Capabilities: [9c] MSI-X: Enable+ Count=64 Masked-
        Capabilities: [c0] Vendor Specific Information: Len=18 <?>
        Capabilities: [40] Power Management version 3
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [150] Alternative Routing-ID Interpretation (ARI)
        Capabilities: [180] Single Root I/O Virtualization (SR-IOV)
        Capabilities: [230] Access Control Services
        Kernel driver in use: mlx5_core
        Kernel modules: mlx5_core

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1887723/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
