I often see the following hung task errors when boot fails:

[ 250.357032] INFO: task irq/95-mpam:msc:919 blocked for more than 122 seconds.
[ 250.364337] Tainted: G W 6.14.0-1012-nvidia #12-Ubuntu
[ 250.371552] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 373.242183] INFO: task irq/95-mpam:msc:919 blocked for more than 245 seconds.
[ 373.249487] Tainted: G W 6.14.0-1012-nvidia #12-Ubuntu
[ 373.256700] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 496.123076] INFO: task irq/95-mpam:msc:919 blocked for more than 368 seconds.
[ 496.130382] Tainted: G W 6.14.0-1012-nvidia #12-Ubuntu
[ 496.137595] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 619.009231] INFO: task irq/95-mpam:msc:919 blocked for more than 491 seconds.
[ 619.016536] Tainted: G W 6.14.0-1012-nvidia #12-Ubuntu
[ 619.023750] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 741.893003] INFO: task irq/95-mpam:msc:919 blocked for more than 614 seconds.
[ 741.900305] Tainted: G W 6.14.0-1012-nvidia #12-Ubuntu
[ 741.907519] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
-- 
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-nvidia-6.14 in Ubuntu.
https://bugs.launchpad.net/bugs/2128704

Title:
  MPAM patchset leads to various invalid pointer accesses during kernel memory allocation

Status in linux-nvidia-6.14 package in Ubuntu:
  Invalid
Status in linux-nvidia-6.14 source package in Noble:
  New

Bug description:
  Boot testing of linux-nvidia-6.14 version 6.14.0-1012.12 on Grace hardware revealed frequent but inconsistent invalid memory accesses during boot, which often cause the system to hang.

  A number of different errors have been observed; they are shown below. Many traces seem to occur during memory allocation and freeing.

  Reverting the recent set of patches for MPAM (listed in a follow-up reply) resolves the issue, suggesting the problem lies in this patchset.

  [ 15.567478] Unable to handle kernel paging request at virtual address cad7e423cc6acbad
  [ 15.584245] Mem abort info:
  [ 15.587100] ESR = 0x0000000096000004
  [ 15.590940] EC = 0x25: DABT (current EL), IL = 32 bits
  [ 15.596376] SET = 0, FnV = 0
  [ 15.599499] EA = 0, S1PTW = 0
  [ 15.602710] FSC = 0x04: level 0 translation fault
  [ 15.607701] Data abort info:
  [ 15.610644] ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
  [ 15.616261] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
  [ 15.621429] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
  [ 15.626865] [cad7e423cc6acbad] address between user and kernel address ranges
  [ 15.634173] Internal error: Oops: 0000000096000004 [#1] SMP
  [ 15.639873] Modules linked in: sha3_ce i2c_smbus nvme sha2_ce ixgbe(+) nvme_core sha256_arm64 sha1_ce xfrm_algo xhci_pci_renesas nvme_auth mdio i2c_tegra aes_neon_bs aes_neon_blk aes_ce_blk aes_ce_cipher
  [ 15.658385] CPU: 72 UID: 0 PID: 1122 Comm: kworker/72:2 Not tainted 6.14.0-1012-nvidia-64k #12-Ubuntu
  [ 15.667814] Hardware name: Supermicro MBD-G1SMH/G1SMH, BIOS 1.0c 12/28/2023
  [ 15.674933] Workqueue: events work_for_cpu_fn
  [ 15.679398] pstate: 63400009 (nZCv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
  [ 15.686515] pc : __kmalloc_node_track_caller_noprof+0xf4/0x5a0
  [ 15.692496] lr : __kmalloc_node_track_caller_noprof+0x21c/0x5a0
  [ 15.698547] sp : ffff8000bbd0f9e0
  [ 15.701930] x29: ffff8000bbd0fa00 x28: 0000000000000000 x27: 00000000ff7a0000
  [ 15.709226] x26: 00000000ffffffff x25: 551c7535cc7a3468 x24: cad7e423cc6acba5
  [ 15.716522] x23: 0000000000000cc0 x22: 0000000000000009 x21: ffff9ff8609ef954
  [ 15.723817] x20: 00000000ffffffff x19: ffff000080016b00 x18: ffff8000bbd20060
  [ 15.731114] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
  [ 15.738409] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
  [ 15.745704] x11: 0000000000000000 x10: 0000000000000000 x9 : ffff9ff8603841b4
  [ 15.752999] x8 : 0000000000000000 x7 : 0000000000000000 x6 : ffff1000455a1480
  [ 15.760295] x5 : 0000000000000000 x4 : ffff9ff8609ef954 x3 : 0000000000007648
  [ 15.767591] x2 : adcb6acc23e4d7ca x1 : cad7e423cc6acba5 x0 : 0000000000000008
  [ 15.774888] Call trace:
  [ 15.777385] __kmalloc_node_track_caller_noprof+0xf4/0x5a0 (P)
  [ 15.783351] kvasprintf+0x90/0x150
  [ 15.786835] pci_request_irq+0x9c/0x160
  [ 15.790760] queue_request_irq+0xa4/0xe0 [nvme]
  [ 15.795400] nvme_create_queue+0x2b0/0x338 [nvme]
  [ 15.800207] nvme_setup_io_queues+0x358/0x560 [nvme]
  [ 15.805280] nvme_probe+0x2a4/0x3f8 [nvme]
  [ 15.809466] local_pci_probe+0x4c/0xe0
  [ 15.813295] work_for_cpu_fn+0x28/0x58
  [ 15.817123] process_one_work+0x178/0x430
  [ 15.821219] worker_thread+0x30c/0x420
  [ 15.825048] kthread+0x100/0x120
  [ 15.828344] ret_from_fork+0x10/0x20
  [ 15.832000] Code: aa1803e1 f9405e79 8b000302 dac00c42 (f8606b04)
  [ 15.838231] ---[ end trace 0000000000000000 ]---

  ==========================================

  [ 19.236013] ------------[ cut here ]------------
  [ 19.242379] kernel BUG at mm/slub.c:545!
  …les-load.service - Load Kernel Modules.
  [ 19.249231] Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
  [ 19.256704] Modules linked in: dm_multipath efi_pstore nfnetlink dmi_sysfs ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor xor_neon raid6_pq raid1 raid0 linear hid_generic rndis_host usbhid cdc_ether hid usbnet uas usb_storage polyval_ce polyval_generic ghash_ce sm4_ce_gcm sm4_ce_ccm sm4_ce sm4_ce_cipher sm4 sm3_ce sm3 nvme sha3_ce i2c_smbus sha2_ce ixgbe nvme_core sha256_arm64 sha1_ce xfrm_algo xhci_pci_renesas nvme_auth mdio i2c_tegra aes_neon_bs aes_neon_blk aes_ce_blk aes_ce_cipher
  [ 19.307781] CPU: 138 UID: 0 PID: 1093 Comm: kworker/138:1 Not tainted 6.14.0-1012-nvidia-64k #12-Ubuntu
  [ 19.317390] Hardware name: Supermicro MBD-G1SMH/G1SMH, BIOS 1.0c 12/28/2023
  Workqueue: events key_garbage_collector
  [ OK ] Finished systemd-remount-fs.servic…mount Root and Kernel File Systems.
  [ 19.339005] pstate: 63400009 (nZCv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
  [ 19.346123] pc : __slab_free+0xf8/0x2f0
  [ 19.350053] lr : __slab_free+0x54/0x2f0
  [ 19.353970] sp : ffff8000bbb4fb90
  [ 19.357354] x29: ffff8000bbb4fb90 x28: ffff000080010500 x27: 0000000000000000
  [ 19.364650] x26: 00000000ffffffff x25: ffffffe3c0110f40 x24: ffff1000443d2470
  [ 19.371946] x23: ffff1000443d2470 x22: ffffc8163a2316b4 x21: 0000000000000001
  [ 19.379242] x20: ffff1000443d2470 x19: ffffc8163a2316b4 x18: ffff8000bbb60038
  [ 19.386539] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
  [ 19.386541] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
  [ 19.386542] x11: [0[000;03020000000000 x10: 811cbf172858d679 x9 : ffffc81639f5d600
  [ OK ] Finished systemd-udev-trigger.service - Coldplug All udev Devices.
  [ 19.417405] x8 : ffff8000bbb4fc50 x7 : 0000000000000000 x6 : ffff1000443d2478
  [ 19.424701] x5 : ffffc8163a2316b4 x4 : 0000000000000000 x3 : 0000000090000f72
  [ 19.431997] x2 : ffffffffffffffc0 x1 : 0000000000000000 x0 : 0000000000000008
  [ 19.439293] Call trace:
  [ 19.441792] __slab_free+0xf8/0x2f0 (P)
  [ 19.445713] kfree+0x2b0/0x378
  [ 19.448830] key_gc_unused_keys.constprop.0+0xf4/0x1b0
  [ 19.454081] key_garbage_collector+0x1c0/0x4e0
  [ 19.458621] process_one_work+0x178/0x430
  [ 19.462722] worker_thread+0x30c/0x420
  [ 19.466551] kthread+0x100/0x120
  [ 19.469847] ret_from_fork+0x10/0x20
  [ 19.469856] Code: b9402b80 8b0002e6 eb17031f 54fffc01 (d4210000)
  [ 19.479734] ---[ end trace 0000000000000000 ]---
  [ 19.512155] note: kworker/138:1[1093] exited with irqs disabled
  [ 19.518239] note: kworker/138:1[1093] exited with preempt_count 1
  [ 19.524533] ------------[ cut here ]------------
  [ 19.529251] WARNING: CPU: 138 PID: 0 at kernel/context_tracking.c:128 ct_kernel_exit.isra.0+0x100/0x108
  [ 19.538870] Modules linked in: dm_multipath efi_pstore nfnetlink dmi_sysfs ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor xor_neon raid6_pq raid1 raid0 linear hid_generic rndis_host usbhid cdc_ether hid usbnet uas usb_storage polyval_ce polyval_generic ghash_ce sm4_ce_gcm sm4_ce_ccm sm4_ce sm4_ce_cipher sm4 sm3_ce sm3 nvme sha3_ce i2c_smbus sha2_ce ixgbe nvme_core sha256_arm64 sha1_ce xfrm_algo xhci_pci_renesas nvme_auth mdio i2c_tegra aes_neon_bs aes_neon_blk aes_ce_blk aes_ce_cipher
  [ 19.589931] CPU: 138 UID: 0 PID: 0 Comm: swapper/138 Tainted: G D 6.14.0-1012-nvidia-64k #12-Ubuntu
  [ 19.600604] Tainted: [D]=DIE
  [ 19.603544] Hardware name: Supermicro MBD-G1SMH/G1SMH, BIOS 1.0c 12/28/2023
  [ 19.610661] pstate: 234003c9 (nzCv DAIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
  [ 19.617779] pc : ct_kernel_exit.isra.0+0x100/0x108
  [ 19.622675] lr : ct_idle_enter+0x18/0x38
  [ 19.626681] sp : ffff800093b2fd20
  [ 19.630064] x29: ffff800093b2fd20 x28: 0000000000000000 x27: 0000000000000000
  [ 19.637360] x26: 0000000000000000 x25: 000000048bc09940 x24: 0000000000000000
  [ 19.644655] x23: 0000000000000000 x22: ffffc8163d5c4910 x21: 0000000000000000
  [ 19.651951] x20: 0000000000000005 x19: ffff103bed223108 x18: ffff800093b40040
  [ 19.659247] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
  [ 19.666542] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
  [ 19.673838] x11: 0000000000000000 x10: 4bf4a636a91ac898 x9 : ffffc8163b235ac4
  [ 19.681133] x8 : ffff1000073bbe38 x7 : 0000000000000000 x6 : 0000000000000000
  [ 19.688428] x5 : 4000000000000002 x4 : ffff4825b0a20000 x3 : ffff800093b2fd20
  [ 19.695724] x2 : ffffc8163c803108 x1 : ffffc8163c803108 x0 : 4000000000000000
  [ 19.703020] Call trace:
  [ 19.705517] ct_kernel_exit.isra.0+0x100/0x108 (P)
  [ 19.710412] ct_idle_enter+0x18/0x38
  [ 19.714063] cpuidle_enter_state+0x2fc/0x720
  [ 19.718425] cpuidle_enter+0x44/0x78
  [ 19.722081] cpuidle_idle_call+0x15c/0x238
  [ 19.726267] do_idle+0x100/0x110
  [ 19.729562] cpu_startup_entry+0x40/0x50
  [ 19.733568] secondary_start_kernel+0xe4/0x128
  [ 19.738108] __secondary_switched+0xc8/0xd0
  [ 19.742386] ---[ end trace 0000000000000000 ]---

