MAAS deploy/release loop with focal[1] on d05-3 and has deployed for 82 times 
without failure.
MAAS deploy/release loop with bionic-hwe on appleton run 100 times and 10 of 
them are failed.

Look like this issue is only happened on appleton.

--
[1] For some reason I can not deploy bionic-hwe with d05-3. Working on it.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1953058

Title:
  Kernel "BUG: soft lockup" with 5.4 kernels on arm64 node appleton node
  (dmesg spammed with "mlx5_core 0005:01:00.0: mlx5_eq_comp_int:159:(pid
  1180): Completion event for bogus CQ 0x5a5aa9")

Status in ubuntu-kernel-tests:
  New
Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Focal:
  Confirmed

Bug description:
  The regression boot test running with bionic:linux-hwe-5.4
  5.4.0-92.103~18.04.2 failed because of the following hung task:

  Dec  2 12:17:12 appleton-kernel kernel: [   64.281447] watchdog: BUG: soft 
lockup - CPU#16 stuck for 22s! [swapper/16:0]
  Dec  2 12:17:12 appleton-kernel kernel: [   64.288573] Modules linked in: 
ipmi_ssif nls_iso8859_1 joydev input_leds ipmi_si ipmi_devintf ipmi_msghandler 
sch_fq_codel ib_iser rdma_cm iw_cm ib_cm iscsi_tcp libiscsi_tcp libiscsi 
scsi_transport_iscsi ip_tables x_tables autofs4 btrfs zstd_compress raid10 
raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor xor_neon 
raid6_pq libcrc32c raid1 raid0 multipath linear mlx5_ib hid_generic ses usbhid 
enclosure hid ib_uverbs ib_core marvell hibmc_drm drm_vram_helper ttm 
drm_kms_helper crct10dif_ce ghash_ce syscopyarea sysfillrect sha2_ce mlx5_core 
sysimgblt sha256_arm64 ixgbe hisi_sas_v2_hw fb_sys_fops nvme sha1_ce 
hisi_sas_main tls xfrm_algo drm megaraid_sas nvme_core mdio mlxfw libsas 
ehci_platform scsi_transport_sas hns_dsaf hns_enet_drv hns_mdio hnae 
aes_neon_bs aes_neon_blk aes_ce_blk crypto_simd cryptd aes_ce_cipher
  Dec  2 12:17:12 appleton-kernel kernel: [   64.288629] CPU: 16 PID: 0 Comm: 
swapper/16 Not tainted 5.4.0-91-generic #102~18.04.1-Ubuntu
  Dec  2 12:17:12 appleton-kernel kernel: [   64.288630] Hardware name: 
Hisilicon D05/BC11SPCD, BIOS 1.50 06/01/2018
  Dec  2 12:17:12 appleton-kernel kernel: [   64.288632] pstate: 40400005 (nZcv 
daif +PAN -UAO)
  Dec  2 12:17:12 appleton-kernel kernel: [   64.288640] pc : 
__do_softirq+0x98/0x350
  Dec  2 12:17:12 appleton-kernel kernel: [   64.288644] lr : irq_exit+0xc0/0xc8
  Dec  2 12:17:12 appleton-kernel kernel: [   64.288645] sp : ffff800011ee3ef0
  Dec  2 12:17:12 appleton-kernel kernel: [   64.288646] x29: ffff800011ee3ef0 
x28: ffff002fb71a2d00 
  Dec  2 12:17:12 appleton-kernel kernel: [   64.288649] x27: 0000000000000000 
x26: ffff800011ee4000 
  Dec  2 12:17:12 appleton-kernel kernel: [   64.288650] x25: ffff800011ee0000 
x24: ffff001fba073600 
  Dec  2 12:17:12 appleton-kernel kernel: [   64.288652] x23: ffff80001234bdb0 
x22: 0000000000000000 
  Dec  2 12:17:12 appleton-kernel kernel: [   64.288654] x21: 0000000000000282 
x20: 0000000000000002 
  Dec  2 12:17:12 appleton-kernel kernel: [   64.288656] x19: ffff8000116b3000 
x18: ffff800011267510 
  Dec  2 12:17:12 appleton-kernel kernel: [   64.288658] x17: 0000000000000000 
x16: 0000000000000000 
  Dec  2 12:17:12 appleton-kernel kernel: [   64.288659] x15: 0000000000000001 
x14: ffff002fbb9f21c8 
  Dec  2 12:17:12 appleton-kernel kernel: [   64.288661] x13: 0000000000000004 
x12: 0000000000000002 
  Dec  2 12:17:12 appleton-kernel kernel: [   64.288663] x11: 0000000000000000 
x10: 0000000000000040 
  Dec  2 12:17:12 appleton-kernel kernel: [   64.288665] x9 : ffff800011bbf228 
x8 : ffff800011bbf220 
  Dec  2 12:17:12 appleton-kernel kernel: [   64.288666] x7 : ffff001fb9002270 
x6 : 00000002c07fa07f 
  Dec  2 12:17:12 appleton-kernel kernel: [   64.288668] x5 : 00000000ffff00c1 
x4 : ffff802faa352000 
  Dec  2 12:17:12 appleton-kernel kernel: [   64.288670] x3 : ffff8000116b3780 
x2 : ffff802faa352000 
  Dec  2 12:17:12 appleton-kernel kernel: [   64.288672] x1 : 00000000000000e0 
x0 : ffff8000116b3780 
  Dec  2 12:17:12 appleton-kernel kernel: [   64.288675] Call trace:
  Dec  2 12:17:12 appleton-kernel kernel: [   64.288677]  
__do_softirq+0x98/0x350
  Dec  2 12:17:12 appleton-kernel kernel: [   64.288679]  irq_exit+0xc0/0xc8
  Dec  2 12:17:12 appleton-kernel kernel: [   64.288683]  
__handle_domain_irq+0x6c/0xc0
  Dec  2 12:17:12 appleton-kernel kernel: [   64.288685]  
gic_handle_irq+0x84/0x2c0
  Dec  2 12:17:12 appleton-kernel kernel: [   64.288687]  el1_irq+0x104/0x1c0
  Dec  2 12:17:12 appleton-kernel kernel: [   64.288690]  
arch_cpu_idle+0x34/0x1c0
  Dec  2 12:17:12 appleton-kernel kernel: [   64.288694]  
default_idle_call+0x24/0x60
  Dec  2 12:17:12 appleton-kernel kernel: [   64.288696]  do_idle+0x1d8/0x2b8
  Dec  2 12:17:12 appleton-kernel kernel: [   64.288699]  
cpu_startup_entry+0x28/0xb0
  Dec  2 12:17:12 appleton-kernel kernel: [   64.288702]  
secondary_start_kernel+0x198/0x288
  Dec  2 12:17:46 appleton-kernel kernel: [   98.829315] rcu: INFO: rcu_sched 
detected stalls on CPUs/tasks:
  Dec  2 12:17:46 appleton-kernel kernel: [   98.835229] rcu:   16-....: (17 
GPs behind) idle=3b2/0/0x1 softirq=877/877 fqs=7317 
  Dec  2 12:17:46 appleton-kernel kernel: [   98.842875]        (detected by 
35, t=15005 jiffies, g=3737, q=2560)
  Dec  2 12:17:46 appleton-kernel kernel: [   98.842877] Task dump for CPU 16:
  Dec  2 12:17:46 appleton-kernel kernel: [   98.842880] swapper/16      R  
running task        0     0      1 0x0000002a
  Dec  2 12:17:46 appleton-kernel kernel: [   98.842885] Call trace:
  Dec  2 12:17:46 appleton-kernel kernel: [   98.842897]  
__switch_to+0x108/0x248
  Dec  2 12:17:46 appleton-kernel kernel: [   98.842902]  0x0
  Dec  2 12:20:46 appleton-kernel kernel: [  278.848845] rcu: INFO: rcu_sched 
detected stalls on CPUs/tasks:
  Dec  2 12:20:46 appleton-kernel kernel: [  278.854762] rcu:   16-....: (17 
GPs behind) idle=3b2/0/0x1 softirq=877/877 fqs=29680 
  Dec  2 12:20:46 appleton-kernel kernel: [  278.862495]        (detected by 
48, t=60007 jiffies, g=3737, q=14867)
  Dec  2 12:20:46 appleton-kernel kernel: [  278.862498] Task dump for CPU 16:
  Dec  2 12:20:46 appleton-kernel kernel: [  278.862500] swapper/16      R  
running task        0     0      1 0x0000002a
  Dec  2 12:20:46 appleton-kernel kernel: [  278.862506] Call trace:
  Dec  2 12:20:46 appleton-kernel kernel: [  278.862520]  
__switch_to+0x108/0x248
  Dec  2 12:20:46 appleton-kernel kernel: [  278.862526]  0x0

  This is not so easily reproducible with the focal:linux kernel though,
  however at one time I was able to get this stack trace where it hung
  on some mlx5_core code:

  [  356.542512] watchdog: BUG: soft lockup - CPU#23 stuck for 23s! 
[swapper/23:0]
  [  356.549633] Modules linked in: nls_iso8859_1 dm_multipath scsi_dh_rdac 
scsi_dh_emc scsi_dh_alua ipmi_ssif input_leds joydev ipmi_si ipmi_devintf 
ipmi_msghandler sch_fq_codel ip_tables x_tables autofs4 btrfs zstd_compress 
raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor 
xor_neon raid6_pq libcrc32c raid1 raid0 multipath linear mlx5_ib ses enclosure 
hid_generic usbhid hid ib_uverbs marvell ib_core hibmc_drm drm_vram_helper ttm 
drm_kms_helper crct10dif_ce syscopyarea sysfillrect ghash_ce sha2_ce sysimgblt 
mlx5_core fb_sys_fops hisi_sas_v2_hw sha256_arm64 ixgbe sha1_ce nvme 
hisi_sas_main tls xfrm_algo drm mdio libsas megaraid_sas nvme_core mlxfw 
ehci_platform hns_dsaf scsi_transport_sas hns_enet_drv hns_mdio hnae 
aes_neon_bs aes_neon_blk aes_ce_blk crypto_simd cryptd aes_ce_cipher
  [  356.620860] CPU: 23 PID: 0 Comm: swapper/23 Not tainted 5.4.0-91-generic 
#102-Ubuntu
  [  356.628588] Hardware name: Hisilicon D05/BC11SPCD, BIOS 1.50 06/01/2018
  [  356.635188] pstate: 60400005 (nZCv daif +PAN -UAO)
  [  356.640007] pc : mlx5e_poll_rx_cq+0x30/0x8a0 [mlx5_core]
  [  356.645338] lr : mlx5e_napi_poll+0x100/0x660 [mlx5_core]
  [  356.650635] sp : ffff800011efbcc0
  [  356.653935] x29: ffff800011efbcc0 x28: 0000000000000001 
  [  356.659233] x27: ffff009fa0ed8000 x26: ffff009fa0ed8000 
  [  356.664531] x25: ffff009fa0eda480 x24: ffff009fa0ed7ec8 
  [  356.669829] x23: 0000000000000040 x22: 0000000000000000 
  [  356.675126] x21: ffff002fbbaeddc0 x20: 0000000000000040 
  [  356.680424] x19: ffff009fa0ed8100 x18: 0000000000000010 
  [  356.685721] x17: 0000000000000000 x16: 0000000000000000 
  [  356.691019] x15: ffff002fb71adf28 x14: 6220726f6620746e 
  [  356.696316] x13: ffffffffffffc138 x12: ffffffffffffe238 
  [  356.701614] x11: 0000000000003138 x10: 3935313a746e695f 
  [  356.706911] x9 : ffff800011db5000 x8 : 00000000000006d9 
  [  356.712209] x7 : 0000000000000017 x6 : ffff002f8d1c2200 
  [  356.717506] x5 : 000000000000000a x4 : 0000000000000006 
  [  356.722804] x3 : 0000000000000000 x2 : 0000000000000040 
  [  356.728101] x1 : 0000000000000040 x0 : ffff8000094f70a8 
  [  356.733400] Call trace:
  [  356.735867]  mlx5e_poll_rx_cq+0x30/0x8a0 [mlx5_core]
  [  356.740850]  mlx5e_napi_poll+0x100/0x660 [mlx5_core]
  [  356.745805]  net_rx_action+0x180/0x488
  [  356.749543]  __do_softirq+0x130/0x34c
  [  356.753194]  irq_exit+0xa4/0xc8
  [  356.756323]  __handle_domain_irq+0x74/0xc8
  [  356.760406]  gic_handle_irq+0x10c/0x2cc
  [  356.764228]  el1_irq+0x104/0x1c0
  [  356.767443]  arch_cpu_idle+0x3c/0x1c8
  [  356.771093]  default_idle_call+0x24/0x60
  [  356.775003]  do_idle+0x214/0x298
  [  356.778218]  cpu_startup_entry+0x30/0xb8
  [  356.782128]  secondary_start_kernel+0x16c/0x1e0
  [  390.270862] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
  [  390.276771] rcu:   23-....: (364 GPs behind) idle=966/0/0x3 
softirq=1461/1461 fqs=7130 
  [  390.284676]        (detected by 10, t=15005 jiffies, g=15605, q=284)
  [  390.290499] Call trace:
  [  390.292935]  __switch_to+0x134/0x190
  [  390.296498]  0xffff002fb71ada00
  [  392.554895] rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: 
{ 23-... } 15478 jiffies s: 645 root: 0x2/.
  [  392.565495] rcu: blocking rcu_node structures: l=1:16-31:0x80/.
  [  392.571412] Call trace:
  [  392.573848]  __switch_to+0x134/0x190
  [  392.577413]  0xffff002fb71ada00
  [  424.543333] watchdog: BUG: soft lockup - CPU#23 stuck for 22s! 
[swapper/23:0]
  [  424.550454] Modules linked in: nls_iso8859_1 dm_multipath scsi_dh_rdac 
scsi_dh_emc scsi_dh_alua ipmi_ssif input_leds joydev ipmi_si ipmi_devintf 
ipmi_msghandler sch_fq_codel ip_tables x_tables autofs4 btrfs zstd_compress 
raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor 
xor_neon raid6_pq libcrc32c raid1 raid0 multipath linear mlx5_ib ses enclosure 
hid_generic usbhid hid ib_uverbs marvell ib_core hibmc_drm drm_vram_helper ttm 
drm_kms_helper crct10dif_ce syscopyarea sysfillrect ghash_ce sha2_ce sysimgblt 
mlx5_core fb_sys_fops hisi_sas_v2_hw sha256_arm64 ixgbe sha1_ce nvme 
hisi_sas_main tls xfrm_algo drm mdio libsas megaraid_sas nvme_core mlxfw 
ehci_platform hns_dsaf scsi_transport_sas hns_enet_drv hns_mdio hnae 
aes_neon_bs aes_neon_blk aes_ce_blk crypto_simd cryptd aes_ce_cipher
  [  424.621659] CPU: 23 PID: 0 Comm: swapper/23 Tainted: G             L    
5.4.0-91-generic #102-Ubuntu
  [  424.630775] Hardware name: Hisilicon D05/BC11SPCD, BIOS 1.50 06/01/2018
  [  424.637374] pstate: 20400005 (nzCv daif +PAN -UAO)
  [  424.642152] pc : tasklet_action_common.isra.0+0x5c/0x1a0
  [  424.647450] lr : tasklet_action+0x30/0x38
  [  424.651444] sp : ffff800011efbe50
  [  424.654745] x29: ffff800011efbe50 x28: ffff002fbbad8fc0 
  [  424.660043] x27: 0000000000000003 x26: ffff002fb71ada00 
  [  424.665340] x25: 0000000000000100 x24: 0000000000000006 
  [  424.670638] x23: 0000000000000060 x22: 00000000000000e0 
  [  424.675935] x21: 0000000000000004 x20: ffff002fa94150a0 
  [  424.681233] x19: ffff800011b750f0 x18: 0000000000000010 
  [  424.686530] x17: 0000000000000000 x16: 0000000000000000 
  [  424.691828] x15: ffff002fb71adf28 x14: 6220726f6620746e 
  [  424.697125] x13: ffffffffffffc138 x12: ffffffffffffe238 
  [  424.702423] x11: 0000000000003138 x10: 3935313a746e695f 
  [  424.707720] x9 : 0000000000003c80 x8 : 00000000000006d9 
  [  424.713018] x7 : 0000000000000000 x6 : 00000006e8cc2da3 
  [  424.718315] x5 : 00ffffffffffffff x4 : ffff002fad862680 
  [  424.723613] x3 : 00000000000004af x2 : ffff802faa45b000 
  [  424.728910] x1 : 0000000000000000 x0 : ffff002fb71adf28 
  [  424.734208] Call trace:
  [  424.736642]  tasklet_action_common.isra.0+0x5c/0x1a0
  [  424.741593]  tasklet_action+0x30/0x38
  [  424.745241]  __do_softirq+0x130/0x34c
  [  424.748890]  irq_exit+0xa4/0xc8
  [  424.752018]  __handle_domain_irq+0x74/0xc8
  [  424.756100]  gic_handle_irq+0x10c/0x2cc
  [  424.759922]  el1_irq+0x104/0x1c0
  [  424.763137]  arch_cpu_idle+0x3c/0x1c8
  [  424.766785]  default_idle_call+0x24/0x60
  [  424.770694]  do_idle+0x214/0x298
  [  424.773909]  cpu_startup_entry+0x30/0xb8
  [  424.777818]  secondary_start_kernel+0x16c/0x1e0
  [  484.716312] INFO: task shutdown:1 blocked for more than 120 seconds.
  [  484.722658]       Tainted: G             L    5.4.0-91-generic #102-Ubuntu
  [  484.729521] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
  [  484.737341] Call trace:
  [  484.739776]  __switch_to+0x134/0x190
  [  484.743341]  __schedule+0x31c/0x7e0
  [  484.746818]  schedule+0x40/0xb8
  [  484.749950]  synchronize_rcu_expedited+0x1e0/0x3a8
  [  484.754730]  synchronize_net+0x24/0x38
  [  484.758469]  netif_napi_del+0xac/0xb0
  [  484.762155]  mlx5e_close_channel+0x38/0x58 [mlx5_core]
  [  484.767315]  mlx5e_close_channels+0x38/0x60 [mlx5_core]
  [  484.772562]  mlx5e_close_locked+0x70/0x90 [mlx5_core]
  [  484.777634]  mlx5e_close+0x58/0x80 [mlx5_core]
  [  484.782101]  mlx5e_nic_disable+0x9c/0xc0 [mlx5_core]
  [  484.787087]  mlx5e_detach_netdev+0x54/0x90 [mlx5_core]
  [  484.792246]  mlx5e_detach+0x60/0x78 [mlx5_core]
  [  484.796796]  mlx5_detach_device+0xb4/0x138 [mlx5_core]
  [  484.801954]  mlx5_unload_one+0xc4/0x170 [mlx5_core]
  [  484.806852]  shutdown+0x1c4/0x1e0 [mlx5_core]
  [  484.811202]  pci_device_shutdown+0x48/0x88
  [  484.815288]  device_shutdown+0x14c/0x298
  [  484.819201]  kernel_power_off+0x40/0x78
  [  484.823026]  __do_sys_reboot+0x158/0x228
  [  484.826937]  __arm64_sys_reboot+0x30/0x40
  [  484.830936]  el0_svc_common.constprop.0+0xf4/0x200
  [  484.835715]  el0_svc_handler+0x38/0xa8
  [  484.839454]  el0_svc+0x10/0x2c8

  Not sure if the issue with mlx5 is the symptom or the root cause.

  This is not a regression as I was able to reproduce it with hwe
  version 5.4.0-89-generic. I have not tried older kernels but it's
  possible that this issue is present for a long time.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-kernel-tests/+bug/1953058/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp

Reply via email to