Public bug reported:
[Impact]
On systems with more than 200 cores, printk calls during HMAT parsing can
cause a soft lockup:
[ 35.769351] watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [swapper/0:1]
[ 35.769354] Modules linked in:
[ 35.769358] CPU: 2 PID: 1 Comm: swapper/0 Not tainted 6.8.0-1009-nvidia-64k #9~22.04.1-Ubuntu
[ 35.769360] Hardware name: NVIDIA PG548-C00/P4975, BIOS 02.02.02 20240729
[ 35.769362] pstate: 81001009 (Nzcv daif -PAN -UAO -TCO +DIT +SSBS BTYPE=--)
[ 35.769364] pc : console_flush_all+0x1f0/0x3f0
[ 35.769375] lr : console_flush_all+0x1e8/0x3f0
[ 35.769376] sp : ffff800081b2f870
[ 35.769377] x29: ffff800081b2f870 x28: ffffc06104152a10 x27: ffffc06103390008
[ 35.769378] x26: 0000000000000001 x25: ffffc06103da8a98 x24: 0000000000000000
[ 35.769380] x23: 0000000000000000 x22: ffff800081b2f990 x21: ffff800081b2f98f
[ 35.769381] x20: ffffc06104307160 x19: 0000000000000001 x18: 0000000000000000
[ 35.769382] x17: 0000000000000000 x16: 0000000000000000 x15: 6977646e61422073
[ 35.769384] x14: 0000000000000000 x13: 732f424d20303a5d x12: 39322d39325b7465
[ 35.769385] x11: 677261542d726f74 x10: 616974696e492020 x9 : ffffc06100bf35c8
[ 35.769386] x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000
[ 35.769387] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000
[ 35.769389] x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000
[ 35.769391] Call trace:
[ 35.769393] console_flush_all+0x1f0/0x3f0
[ 35.769395] console_unlock+0x70/0x1f8
[ 35.769396] vprintk_emit+0x2e4/0x4b8
[ 35.769398] vprintk_default+0x40/0x80
[ 35.769399] vprintk+0x98/0x150
[ 35.769400] _printk+0x64/0xc0
[ 35.769405] hmat_parse_locality.constprop.0+0x1f4/0x740
[ 35.769416] hmat_parse_subtable+0x58/0xf0
[ 35.769418] acpi_parse_entries_array+0x1fc/0x360
[ 35.769426] acpi_table_parse_entries_array+0xa4/0x170
[ 35.769431] acpi_table_parse_entries+0x4c/0xa0
[ 35.769433] hmat_init+0x14c/0x440
[ 35.769434] do_one_initcall+0x4c/0x368
[ 35.769439] do_initcalls+0x134/0x2c0
[ 35.769442] kernel_init_freeable+0x128/0x2b0
[ 35.769443] kernel_init+0x38/0x240
[ 35.769451] ret_from_fork+0x10/0x20
[ 35.959318] acpi/hmat: Initiator-Target[29-30]:0 MB/s
[ 35.964658] acpi/hmat: Initiator-Target[29-31]:0 MB/s
[ 35.969997] acpi/hmat: Initiator-Target[29-32]:0 MB/s
[ 35.975335] acpi/hmat: Initiator-Target[29-33]:0 MB/s
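A rough way to see why these messages add up to a lockup is to look at the
spacing of the timestamps above. The sketch below is only an estimate: it
assumes the gap between consecutive acpi/hmat lines reflects the time the
printk caller spends flushing one message to the console, and it reuses the
22s figure from the watchdog line. The helper names are made up for this
illustration.

```python
import re

# Captures the kernel timestamp of an acpi/hmat log line.
HMAT_LINE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+acpi/hmat:")

# The four acpi/hmat lines quoted in the log above.
SAMPLE = """\
[ 35.959318] acpi/hmat: Initiator-Target[29-30]:0 MB/s
[ 35.964658] acpi/hmat: Initiator-Target[29-31]:0 MB/s
[ 35.969997] acpi/hmat: Initiator-Target[29-32]:0 MB/s
[ 35.975335] acpi/hmat: Initiator-Target[29-33]:0 MB/s
"""


def hmat_timestamps(log_text):
    """Return the timestamps (in seconds) of all acpi/hmat lines."""
    stamps = []
    for line in log_text.splitlines():
        match = HMAT_LINE.match(line.strip())
        if match:
            stamps.append(float(match.group(1)))
    return stamps


def mean_gap(stamps):
    """Average time between consecutive messages, in seconds."""
    gaps = [later - earlier for earlier, later in zip(stamps, stamps[1:])]
    return sum(gaps) / len(gaps)


gap = mean_gap(hmat_timestamps(SAMPLE))
# Assumption: 22 s is the stuck time reported by the watchdog above.
print(f"mean gap between messages: {gap * 1000:.2f} ms")
print(f"messages needed to fill 22 s: {int(22 / gap)}")
```

With a gap of roughly 5.3 ms per line, a few thousand such messages printed
back to back from the same CPU would be enough to keep it busy for the whole
watchdog period.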
[Fix]
The information in these messages isn't that useful, so lower the message
level to pr_debug() to avoid the issue.
[Test]
Boot the system and check dmesg.
With the patch applied, the HMAT messages no longer flood the kernel
log.
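The dmesg check can be scripted. The following is a minimal sketch, not part
of the patch: it counts the Initiator-Target lines in a block of kernel log
text, and the function name and sample text are made up for illustration.

```python
# Substring taken from the acpi/hmat messages quoted in this report.
HMAT_MARKER = "acpi/hmat: Initiator-Target"


def count_hmat_messages(dmesg_text):
    """Count the kernel log lines that carry an HMAT Initiator-Target entry."""
    return sum(1 for line in dmesg_text.splitlines() if HMAT_MARKER in line)


# Illustrative sample; on a real system the text would come from dmesg.
SAMPLE = """\
[ 35.769433] hmat_init+0x14c/0x440
[ 35.959318] acpi/hmat: Initiator-Target[29-30]:0 MB/s
[ 35.964658] acpi/hmat: Initiator-Target[29-31]:0 MB/s
"""

print(count_hmat_messages(SAMPLE))
```

On a patched kernel the count for the real dmesg output is expected to be
zero, unless the pr_debug() messages have been turned on again, for example
through dynamic debug.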
[Where problems could occur]
Booting the kernel faster may expose some obscure race conditions.
** Affects: linux (Ubuntu)
Importance: Undecided
Status: Confirmed
** Changed in: linux (Ubuntu)
Status: New => Confirmed
https://bugs.launchpad.net/bugs/2090982
Title:
Prevent soft lockup on boot up
Status in linux package in Ubuntu:
Confirmed