> 
> On Fri, 14 Dec 2018 at 13:56, James Morse <james.mo...@arm.com> wrote:
> >
> > Hi Dongjiu Geng,
> >
> > On 14/12/2018 10:15, Dongjiu Geng wrote:
> > > When user space do memory recovery, it will check whether KVM and 
> > > guest support the error recovery, only when both of them support, 
> > > user space will do the error recovery. This patch exports this 
> > > capability of KVM to user space.
> >
> > I can understand user-space only wanting to do the work if host and 
> > guest support the feature. But 'error recovery' isn't a KVM feature, 
> > its a Linux kernel feature.
> >
> > KVM will send it's user-space a SIGBUS with MCEERR code whenever its 
> > trying to map a page at stage2 that the kernel-mm code refuses this because 
> > its poisoned.
> > (e.g. check_user_page_hwpoison(), get_user_pages() returns 
> > -EHWPOISON)
> >
> > This is exactly the same as happens to a normal user-space process.
> >
> > I think you really want to know if the host kernel was built with 
> > CONFIG_MEMORY_FAILURE.
> 
> Does userspace need to care about that? Presumably if the host kernel 
> wasn't built with that support then it will simply never deliver any memory 
> failure events to QEMU, which is fine.
> 
> The point I was trying to make in the email Dongjiu references
> (https://patchwork.codeaurora.org/patch/652261/) is simply that "QEMU gets 
> memory-failure notifications from the host kernel"
> does not imply "the guest is prepared to receive memory failure 
> notifications", and so the code path which handles the SIGBUS must do 
> some kind of check for whether the guest CPU is a type which expects them and 
> that the board code set up the ACPI tables that it wants to fill in.

Thanks Peter's explanation. Frankly speaking, I agree Peter's suggestion.

To James, I explain more to you, as peter said QEMU needs to check whether the 
guest CPU is a type which can handle the error though guest ACPI table. Let us 
see the X86's QEMU logic:
1. Before the vCPU created, it will set a default env->mcg_cap value with 
MCE_CAP_DEF flag, MCG_SER_P means it expected the guest CPU model supports RAS 
error recovery.[1] 2. when the vCPU initialize, it will check whether host 
kernel support this feature[2]. Only when host kernel and default env->mcg_cap 
value all expected this feature, then it will setup vCPU support RAS error 
recovery[3].
So I add this IOCTL "KVM_CAP_ARM_MEMORY_ERROR_RECOVERY" to Let QEMU check 
whether host/KVM support RAS error detection and recovery, only when this 
supports, QEMU will do the error recovery for the guest memory. 

[1]
#define MCE_CAP_DEF     (MCG_CTL_P|MCG_SER_P)
cenv->mcg_cap = MCE_CAP_DEF | MCE_BANKS_DEF |
                        (cpu->enable_lmce ? MCG_LMCE_P : 0);

[2] ret = kvm_get_mce_cap_supported(cs->kvm_state, &mcg_cap, &banks);

[3]
env->mcg_cap &= mcg_cap | MCG_CAP_BANKS_MASK;
ret = kvm_vcpu_ioctl(cs, KVM_X86_SETUP_MCE, &env->mcg_cap);

-------------------------------------For James's 
comments---------------------------------------------------------------------
> KVM doesn't detect these errors.
> The hardware detects them and notifies the OS via one of a number of 
> mechanisms.
> This gets plumbed into memory_failure(), which sets a flag that the mm 
> code uses to prevent the page being used again.

> KVM is only involved when it tries to map a page at stage2 and the mm 
> code rejects it with -EHWPOISON. This is the same as the architectures
> do_page_fault() checking for (fault & VM_FAULT_HWPOISON) out of 
> handle_mm_fault(). We don't have a KVM cap for this, nor do we need one.
------------------------------------------------------------------------------------------------------------------------------
James, for your above comments, I completed understand, but KVM also delivered 
the SIGBUS, which means KVM supports guest memory RAS error recovery, so maybe 
we need to tell user space this capability.

---------------------------------------------- For James's comments 
---------------------------------------------------
> The CPU RAS Extensions are not at all relevant here. It is perfectly 
> possible to support memory-failure without them, AMD-Seattle and 
> APM-X-Gene do this. These systems would report not-supported here, but the 
> kernel does support this stuff.
> Just because the CPU supports this, doesn't mean the kernel was built 
> with CONFIG_MEMORY_FAILURE. The CPU reports may be ignored, or upgraded to 
> SIGKILL.
--------------------------------------------------------------------------------------------------------------------------------------
James, for your above comments, if you think we should not check the 
"cpus_have_const_cap(ARM64_HAS_RAS_EXTN)", which do you prefer we should check?
In the X86 KVM code, it uses hardcode to tell use space the host/KVM support 
RAS error software recovery[4]. If KVM does not check the " 
cpus_have_const_cap(ARM64_HAS_RAS_EXTN)", we have to check the hardcode as 
X86's method.

[4]:
u64 __read_mostly kvm_mce_cap_supported = MCG_CTL_P | MCG_SER_P;

long kvm_arch_dev_ioctl(struct file *filp,
                        unsigned int ioctl, unsigned long arg)
{
    ............................................................................
        case KVM_X86_GET_MCE_CAP_SUPPORTED: {
                r = -EFAULT;
                if (copy_to_user(argp, &kvm_mce_cap_supported,
                                 sizeof(kvm_mce_cap_supported)))
                        goto out;
                r = 0;
                break;
        }
    .......................................................................
}

> 
> thanks
> -- PMM

Reply via email to