On Mon, Oct 13, 2025 at 06:59:00PM +0000, Jiaqi Yan wrote:
> Problem
> =======
>
> When host APEI is unable to claim a synchronous external abort (SEA)
> during guest abort, today KVM directly injects an asynchronous SError
> into the VCPU then resumes it. The injected SError usually results in
> unpleasant guest kernel panic.
>
> One of the major situation of guest SEA is when VCPU consumes recoverable
> uncorrected memory error (UER), which is not uncommon at all in modern
> datacenter servers with large amounts of physical memory. Although SError
> and guest panic is sufficient to stop the propagation of corrupted memory,
> there is room to recover from an UER in a more graceful manner.
>
> Proposed Solution
> =================
>
> The idea is, we can replay the SEA to the faulting VCPU. If the memory
> error consumption or the fault that cause SEA is not from guest kernel,
> the blast radius can be limited to the poison-consuming guest process,
> while the VM can keep running.
>
> In addition, instead of doing under the hood without involving userspace,
> there are benefits to redirect the SEA to VMM:
>
> - VM customers care about the disruptions caused by memory errors, and
>   VMM usually has the responsibility to start the process of notifying
>   the customers of memory error events in their VMs. For example some
>   cloud provider emits a critical log in their observability UI [1], and
>   provides a playbook for customers on how to mitigate disruptions to
>   their workloads.
>
> - VMM can protect future memory error consumption by unmapping the poisoned
>   pages from stage-2 page table with KVM userfault [2], or by splitting the
>   memslot that contains the poisoned pages.
>
> - VMM can keep track of SEA events in the VM. When VMM thinks the status
>   on the host or the VM is bad enough, e.g. number of distinct SEAs
>   exceeds a threshold, it can restart the VM on another healthy host.
>
> - Behavior parity with x86 architecture. When machine check exception
>   (MCE) is caused by VCPU, kernel or KVM signals userspace SIGBUS to
>   let VMM either recover from the MCE, or terminate itself with VM.
>   The prior RFC proposes to implement SIGBUS on arm64 as well, but
>   Marc preferred KVM exit over signal [3]. However, implementation
>   aside, returning SEA to VMM is on par with returning MCE to VMM.
>
> Once SEA is redirected to VMM, among other actions, VMM is encouraged
> to inject external aborts into the faulting VCPU.

I don't know much about the KVM details but this explanation makes sense
to me and we also have use cases for all of what is written here.

Thanks,
Jason
