Philip Cox - I think we have an RCA.
Below is the call stack of “iptables” at the moment of the hang (it is the same
across all collected kernel dumps):
```
crash> bt 25894
PID: 25894 TASK: ffff89094bce8000 CPU: 1 COMMAND: "iptables"
#0 [ffffadb9456ab8f8] __schedule at ffffffffa5ba8b8d
#1 [ffffadb9456ab980] preempt_schedule_common at ffffffffa5ba92a8
#2 [ffffadb9456ab998] __cond_resched at ffffffffa5ba92e6
#3 [ffffadb9456ab9a8] down_read at ffffffffa5bab823
#4 [ffffadb9456ab9c0] kernfs_walk_and_get_ns at ffffffffa5248b16
#5 [ffffadb9456ab9f8] cgroup_get_from_path at ffffffffa4fa87fa
#6 [ffffadb9456aba20] cgroup_mt_check_v2 at ffffffffc07bf083 [xt_cgroup]
#7 [ffffadb9456aba48] xt_check_match at ffffffffc01304c1 [x_tables]
#8 [ffffadb9456abb08] find_check_entry at ffffffffc014315e [ip_tables]
#9 [ffffadb9456abbc8] translate_table at ffffffffc0144429 [ip_tables]
#10 [ffffadb9456abc68] do_ipt_set_ctl at ffffffffc014579c [ip_tables]
#11 [ffffadb9456abd10] nf_setsockopt at ffffffffa598d697
#12 [ffffadb9456abd50] ip_setsockopt at ffffffffa59a140a
#13 [ffffadb9456abd90] raw_setsockopt at ffffffffa59d44bf
#14 [ffffadb9456abd98] security_socket_setsockopt at ffffffffa533c5d2
#15 [ffffadb9456abdc8] __sys_setsockopt at ffffffffa58c1699
#16 [ffffadb9456abe10] __x64_sys_setsockopt at ffffffffa58c17c5
#17 [ffffadb9456abe20] x64_sys_call at ffffffffa4e06bab
#18 [ffffadb9456abe30] do_syscall_64 at ffffffffa5b9a9e4
#19 [ffffadb9456abe88] handle_mm_fault at ffffffffa51027d8
#20 [ffffadb9456abec8] do_user_addr_fault at ffffffffa4ea4b40
#21 [ffffadb9456abf00] irqentry_exit_to_user_mode at ffffffffa5b9f43e
#22 [ffffadb9456abf10] irqentry_exit at ffffffffa5b9f46d
#23 [ffffadb9456abf18] clear_bhb_loop at ffffffffa5c018c5
#24 [ffffadb9456abf28] clear_bhb_loop at ffffffffa5c018c5
#25 [ffffadb9456abf38] clear_bhb_loop at ffffffffa5c018c5
#26 [ffffadb9456abf50] entry_SYSCALL_64_after_hwframe at ffffffffa5c00124
RIP: 00007f715892496e RSP: 00007ffddb994cf8 RFLAGS: 00000206
RAX: ffffffffffffffda RBX: 00005589d9902dc8 RCX: 00007f715892496e
RDX: 0000000000000040 RSI: 0000000000000000 RDI: 0000000000000004
RBP: 00005589d9909ec0 R8: 0000000000003348 R9: 0000000000000052
R10: 00005589d9909ec0 R11: 0000000000000206 R12: 00005589d99097d0
R13: 00005589d9902dc8 R14: 00005589d9902dc0 R15: 00005589d9909f20
ORIG_RAX: 0000000000000036 CS: 0033 SS: 002b
```
There are two cgroup-related functions on the stack, and the buggy one is
cgroup_get_from_path: it acquires the css_set_lock spinlock (with interrupts
disabled) and then calls kernfs_walk_and_get_ns, which takes a rwsem via
down_read and may put the task to sleep. The task sleeps with the spinlock
still held, so every other CPU trying to take css_set_lock (e.g. in
cgroup_can_fork during fork) spins forever, triggering the subsequent hard
lockup.
The good news is that the bug appears to be present only briefly in the 5.15
kernel series: it was introduced in 5.15.75, and the fix landed upstream in 5.16.1
(https://github.com/torvalds/linux/commit/46307fd6e27a3f678a1678b02e667678c22aa8cc).
So three follow-up questions for you, at your convenience:
1. Does this RCA seem reasonable / correct to you?
2. If so, can Canonical backport this fix to the 5.15 and 5.0.4-fips kernels?
3. If so, in the meantime, is there a good way for me to find a version of
the AWS Ubuntu kernel that does not contain this issue? In other words, how
can I translate 5.15.0-1072-aws to 5.15.xx so we can pin the kernel to a
previous revision (if not too far back)?
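On the version-mapping question: Ubuntu kernels report the upstream stable
base they were built from as the last field of /proc/version_signature. A
sketch (the 5.15.167 value below is a made-up example to show the parsing,
not the actual base of -1072):

```shell
# Ubuntu kernels expose their upstream stable base in /proc/version_signature
# (ignored harmlessly on non-Ubuntu systems):
cat /proc/version_signature 2>/dev/null || true

# Example line and extraction of the upstream base (hypothetical value):
sig="Ubuntu 5.15.0-1072.78~20.04.1-aws 5.15.167"
echo "$sig" | awk '{print $NF}'
```

The same mapping also appears in the package changelog
(apt-get changelog linux-image-5.15.0-1072-aws).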
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2089318
Title:
kernel hard lockup 5.15.0-1072-aws
Status in linux package in Ubuntu:
Triaged
Status in linux-aws-5.15 package in Ubuntu:
Triaged
Status in linux source package in Focal:
New
Status in linux-aws-5.15 source package in Focal:
New
Bug description:
Hi friends,
We hit a kernel hard lockup where all CPUs are stuck acquiring an
already-locked spinlock (css_set_lock) within the cgroup subsystem.
Below are the call stacks from a memory dump of a two-core system
taken on Ubuntu 20.04 (5.15 kernel) on AWS, but the same issue occurs
on Azure and GCP too. The hang is non-deterministic (it occurs less than
1% of the time) and can happen at any point during the VM's execution.
We suspect it’s a deadlock triggered by some race condition, but we
don’t know for sure.
```
PID: 21079 TASK: ffff91fdcd1dc000 CPU: 0 COMMAND: "sh"
#0 [fffffe7127850cb8] machine_kexec at ffffffffadc92680
#1 [fffffe7127850d18] __crash_kexec at ffffffffadda0b9f
#2 [fffffe7127850de0] panic at ffffffffae8f56be
#3 [fffffe7127850e70] unknown_nmi_error.cold at ffffffffae8eb4c8
#4 [fffffe7127850e90] default_do_nmi at ffffffffae99c639
#5 [fffffe7127850eb8] exc_nmi at ffffffffae99c7db
#6 [fffffe7127850ef0] end_repeat_nmi at ffffffffaea017f3
[exception RIP: native_queued_spin_lock_slowpath+63]
RIP: ffffffffadd40eff RSP: ffffa1f68589fc60 RFLAGS: 00000002 (interrupt disabled!!)
RAX: 0000000000000001 RBX: ffffffffb0ea5804 RCX: ffff91fb597c8980
RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffffffffb0ea5804
RBP: ffffa1f68589fc88 R8: 0000000000005259 R9: 00000000597c8980
R10: 0000000000000000 R11: 0000000000000000 R12: ffffa1f68589fdf8
R13: ffff91fdcd1d8000 R14: 0000000000004100 R15: ffff91fdcd1d8000
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
--- <NMI exception stack> ---
#7 [ffffa1f68589fc60] native_queued_spin_lock_slowpath at ffffffffadd40eff
#8 [ffffa1f68589fc90] _raw_spin_lock_irq at ffffffffae9af19a
#9 [ffffa1f68589fca0] cgroup_can_fork at ffffffffaddb0de8
#10 [ffffa1f68589fce8] copy_process at ffffffffadcc1938
#11 [ffffa1f68589fcf0] filemap_map_pages at ffffffffadeb68db
#12 [ffffa1f68589fdf0] __x64_sys_vfork at ffffffffadcc2a20
#13 [ffffa1f68589fe70] x64_sys_call at ffffffffadc068a9
#14 [ffffa1f68589fe80] do_syscall_64 at ffffffffae99a9e4
#15 [ffffa1f68589fec0] exit_to_user_mode_prepare at ffffffffadd725ad
#16 [ffffa1f68589ff00] irqentry_exit_to_user_mode at ffffffffae99f43e
#17 [ffffa1f68589ff10] irqentry_exit at ffffffffae99f46d
#18 [ffffa1f68589ff18] clear_bhb_loop at ffffffffaea018c5
#19 [ffffa1f68589ff28] clear_bhb_loop at ffffffffaea018c5
#20 [ffffa1f68589ff38] clear_bhb_loop at ffffffffaea018c5
#21 [ffffa1f68589ff50] entry_SYSCALL_64_after_hwframe at ffffffffaea00124
RIP: 00007fddfa4cebcc RSP: 00007fffaa741990 RFLAGS: 00000202
RAX: ffffffffffffffda RBX: 000055ea66750428 RCX: 00007fddfa4cebcc
RDX: 0000000000000000 RSI: 00007fffaa7419c0 RDI: 000055ea663c8866
RBP: 0000000000000003 R8: 00007fffaa7419c0 R9: 000055ea667505f0
R10: 0000000000000008 R11: 0000000000000202 R12: 00007fffaa7419c0
R13: 00007fffaa741ae0 R14: 0000000000000000 R15: 000055ea663de810
ORIG_RAX: 000000000000003a CS: 0033 SS: 002b
PID: 20304 TASK: ffff91fb05440000 CPU: 1 COMMAND: "Writer:Driver>C"
#0 [fffffe6c293d3e10] crash_nmi_callback at ffffffffadc81ec0
#1 [fffffe6c293d3e48] nmi_handle at ffffffffadc49b03
#2 [fffffe6c293d3e90] default_do_nmi at ffffffffae99c5a5
#3 [fffffe6c293d3eb8] exc_nmi at ffffffffae99c7db
#4 [fffffe6c293d3ef0] end_repeat_nmi at ffffffffaea017f3
[exception RIP: native_queued_spin_lock_slowpath+63]
RIP: ffffffffadd40eff RSP: ffffa1f6853afd00 RFLAGS: 00000002 (interrupt disabled!!)
RAX: 0000000000000001 RBX: ffffffffb0ea5804 RCX: ffff91fa1d0aee00
RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffffffffb0ea5804
RBP: ffffa1f6853afd28 R8: 000000000000525a R9: 000000001d0aee00
R10: 0000000000000000 R11: 0000000000000000 R12: ffffa1f6853afe98
R13: ffff91fd8eeea000 R14: 00000000003d0f00 R15: ffff91fd8eeea000
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
--- <NMI exception stack> ---
#5 [ffffa1f6853afd00] native_queued_spin_lock_slowpath at ffffffffadd40eff
#6 [ffffa1f6853afd30] _raw_spin_lock_irq at ffffffffae9af19a
#7 [ffffa1f6853afd40] cgroup_can_fork at ffffffffaddb0de8
#8 [ffffa1f6853afd88] copy_process at ffffffffadcc1938
#9 [ffffa1f6853afe20] kernel_clone at ffffffffadcc262d
#10 [ffffa1f6853afe90] __do_sys_clone at ffffffffadcc2a9d
#11 [ffffa1f6853aff10] __x64_sys_clone at ffffffffadcc2ae5
#12 [ffffa1f6853aff20] x64_sys_call at ffffffffadc05579
#13 [ffffa1f6853aff30] do_syscall_64 at ffffffffae99a9e4
#14 [ffffa1f6853aff50] entry_SYSCALL_64_after_hwframe at ffffffffaea00124
RIP: 00007f0d8bcac9f6 RSP: 00007f0cfabfcc38 RFLAGS: 00000206
RAX: ffffffffffffffda RBX: 00007f0cfabfcc90 RCX: 00007f0d8bcac9f6
RDX: 00007f0ced3ff910 RSI: 00007f0ced3feef0 RDI: 00000000003d0f00
RBP: ffffffffffffff80 R8: 00007f0ced3ff640 R9: 00007f0ced3ff640
R10: 00007f0ced3ff910 R11: 0000000000000206 R12: 00007f0ced3ff640
R13: 0000000000000016 R14: 00007f0d8bc1b7d0 R15: 00007f0cfabfcdf0
ORIG_RAX: 0000000000000038 CS: 0033 SS: 002b
```
Environment
```
$ uname -a
Linux ip-172-31-16-171 5.15.0-1072-aws #78~20.04.1-Ubuntu SMP Wed Oct 9
15:30:47 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
$ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 106
model name : Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz
stepping : 6
microcode : 0xd0003e8
cpu MHz : 2900.036
cache size : 55296 KB
physical id : 0
siblings : 8
core id : 0
cpu cores : 4
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 27
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm
constant_tsc rep_good nopl xtopology nonstop_tsc cpuid aperfmperf
tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe
popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm
3dnowprefetch invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase
tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap
avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec
xgetbv1 xsaves wbnoinvd ida arat avx512vbmi pku ospke avx512_vbmi2 gfni vaes
vpclmulqdq avx512_vnni avx512_bitalg tme avx512_vpopcntdq rdpid md_clear
flush_l1d arch_capabilities
bugs : spectre_v1 spectre_v2 spec_store_bypass swapgs
mmio_stale_data eibrs_pbrsb gds bhi
bogomips : 5800.07
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
```
We see this very infrequently, but have experienced it on a variety of
instance types: at least r6i.large, r6i.xlarge, and r6i.2xlarge.
Thanks!
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2089318/+subscriptions