Excerpts from John Paul Adrian Glaubitz's message of October 26, 2021 6:48 pm:
> Hi Michael!
>
>> The Linux kernel for powerpc since v5.2 has a bug which allows a
>> malicious KVM guest to crash the host, when the host is running on
>> Power8.
>>
>> Only machines using Linux as the hypervisor, aka. KVM, powernv or bare
>> metal, are affected by the bug. Machines running PowerVM are not
>> affected.
>>
>> The bug was introduced in:
>>
>> 10d91611f426 ("powerpc/64s: Reimplement book3s idle code in C")
>>
>> Which was first released in v5.2.
>>
>> The upstream fix is:
>>
>> cdeb5d7d890e ("KVM: PPC: Book3S HV: Make idle_kvm_start_guest() return 0 if it went to guest")
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=cdeb5d7d890e14f3b70e8087e745c4a6a7d9f337
>>
>> Which will be included in the v5.16 release.
>
> I have tested these patches against 5.14 but it seems the problem [1] still
> remains for me for big-endian guests. I built a patched kernel yesterday,
> rebooted the KVM server and let the build daemons do their work over night.
>
> When I got up this morning, I noticed the machine was down, so I checked the
> serial console via IPMI and saw the same messages again as reported in [1]:
>
> [41483.963562] watchdog: BUG: soft lockup - CPU#104 stuck for 25521s! [migration/104:175]
> [41507.963307] watchdog: BUG: soft lockup - CPU#104 stuck for 25544s! [migration/104:175]
> [41518.311200] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
> [41518.311216] rcu: 136-...0: (135 ticks this GP) idle=242/1/0x4000000000000000 softirq=32031/32033 fqs=2729959
> [41547.962882] watchdog: BUG: soft lockup - CPU#104 stuck for 25581s! [migration/104:175]
> [41571.962627] watchdog: BUG: soft lockup - CPU#104 stuck for 25603s! [migration/104:175]
> [41581.330530] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
> [41581.330546] rcu: 136-...0: (135 ticks this GP) idle=242/1/0x4000000000000000 softirq=32031/32033 fqs=2736378
> [41611.962202] watchdog: BUG: soft lockup - CPU#104 stuck for 25641s! [migration/104:175]
> [41635.961947] watchdog: BUG: soft lockup - CPU#104 stuck for 25663s! [migration/104:175]
> [41644.349859] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
> [41644.349876] rcu: 136-...0: (135 ticks this GP) idle=242/1/0x4000000000000000 softirq=32031/32033 fqs=2742753
> [41671.961564] watchdog: BUG: soft lockup - CPU#104 stuck for 25697s! [migration/104:175]
> [41695.961309] watchdog: BUG: soft lockup - CPU#104 stuck for 25719s! [migration/104:175]
> [41707.369190] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
> [41707.369206] rcu: 136-...0: (135 ticks this GP) idle=242/1/0x4000000000000000 softirq=32031/32033 fqs=2749151
> [41735.960884] watchdog: BUG: soft lockup - CPU#104 stuck for 25756s! [migration/104:175]
> [41759.960629] watchdog: BUG: soft lockup - CPU#104 stuck for 25778s! [migration/104:175]
> [41770.388520] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
> [41770.388548] rcu: 136-...0: (135 ticks this GP) idle=242/1/0x4000000000000000 softirq=32031/32033 fqs=2755540
> [41776.076307] rcu: rcu_sched kthread timer wakeup didn't happen for 1423 jiffies! g49897 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402
> [41776.076327] rcu: Possible timer handling issue on cpu=32 timer-softirq=1056014
> [41776.076336] rcu: rcu_sched kthread starved for 1424 jiffies! g49897 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=32
> [41776.076350] rcu: Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.
> [41776.076360] rcu: RCU grace-period kthread stack dump:
> [41776.076434] rcu: Stack dump where RCU GP kthread last ran:
> [41783.960374] watchdog: BUG: soft lockup - CPU#104 stuck for 25801s! [migration/104:175]
> [41807.960119] watchdog: BUG: soft lockup - CPU#104 stuck for 25823s! [migration/104:175]
> [41831.959864] watchdog: BUG: soft lockup - CPU#104 stuck for 25846s! [migration/104:175]
> [41833.407851] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
> [41833.407868] rcu: 136-...0: (135 ticks this GP) idle=242/1/0x4000000000000000 softirq=32031/32033 fqs=2760381
> [41863.959524] watchdog: BUG: soft lockup - CPU#104 stuck for 25875s! [migration/104:175]

I don't suppose you were able to get any more of the log saved? (The first
error messages that happened might be interesting)

Thanks,
Nick