On 20/04/21 19:31, Sean Christopherson wrote:
+       case KVM_HC_PAGE_ENC_STATUS: {
+               u64 gpa = a0, npages = a1, enc = a2;
+
+               ret = -KVM_ENOSYS;
+               if (!vcpu->kvm->arch.hypercall_exit_enabled)

I don't follow, why does the hypercall need to be gated by a capability?  What
would break if this were changed to?

                if (!guest_pv_has(vcpu, KVM_FEATURE_HC_PAGE_ENC_STATUS))

The problem is that it's valid to take the output of KVM_GET_SUPPORTED_CPUID and pass it unmodified to KVM_SET_CPUID2. For this reason, features that are conditional on other ioctls, or that require some kind of userspace support, must not be reported by KVM_GET_SUPPORTED_CPUID. For example:

- TSC_DEADLINE, because it is only implemented after KVM_CREATE_IRQCHIP (or after KVM_ENABLE_CAP of KVM_CAP_IRQCHIP_SPLIT)

- MONITOR, because it only makes sense if userspace enables KVM_CAP_X86_DISABLE_EXITS

X2APIC is reported even though it shouldn't be. Too late to fix that, I think.

In this particular case, if userspace sets the bit in CPUID2 but doesn't handle KVM_EXIT_HYPERCALL, the guest will probably trigger some kind of assertion failure as soon as it invokes the HC_PAGE_ENC_STATUS hypercall.
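
To make the opt-in concrete, here is a minimal sketch of how a VMM might build the EAX value of the KVM paravirtual feature leaf. It is only an illustration, not code from the patch: the helper name and the bit number are hypothetical placeholders, since the real value of KVM_FEATURE_HC_PAGE_ENC_STATUS is not shown in this thread. The idea is that the bit is never inherited from KVM_GET_SUPPORTED_CPUID; userspace adds it only when it has enabled the capability and handles KVM_EXIT_HYPERCALL.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical placeholder: the real bit number is not given in this thread. */
#define SKETCH_PV_FEATURE_HC_PAGE_ENC_STATUS_BIT 14u

/*
 * supported_eax: EAX of the PV feature leaf as reported by the kernel,
 * which by the rule above does not contain the conditional feature.
 * vmm_handles_hypercall_exit: true only if the VMM enabled the capability
 * and implements a handler for the hypercall exit.
 */
static uint32_t guest_pv_features(uint32_t supported_eax,
				  bool vmm_handles_hypercall_exit)
{
	uint32_t eax = supported_eax;

	/* Advertise the hypercall only when userspace can service it. */
	if (vmm_handles_hypercall_exit)
		eax |= UINT32_C(1) << SKETCH_PV_FEATURE_HC_PAGE_ENC_STATUS_BIT;

	return eax;
}
```

A VMM that passes the supported CPUID through unmodified then never exposes the hypercall by accident.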

(I should document that; Jim has asked many times for documentation around KVM_GET_SUPPORTED_CPUID and KVM_GET_MSR_INDEX_LIST.)

Paolo

+                       break;
+
+               if (!PAGE_ALIGNED(gpa) || !npages ||
+                   gpa_to_gfn(gpa) + npages <= gpa_to_gfn(gpa)) {
+                       ret = -EINVAL;
+                       break;
+               }
+
+               vcpu->run->exit_reason        = KVM_EXIT_HYPERCALL;
+               vcpu->run->hypercall.nr       = KVM_HC_PAGE_ENC_STATUS;
+               vcpu->run->hypercall.args[0]  = gpa;
+               vcpu->run->hypercall.args[1]  = npages;
+               vcpu->run->hypercall.args[2]  = enc;
+               vcpu->run->hypercall.longmode = op_64_bit;
+               vcpu->arch.complete_userspace_io = complete_hypercall_exit;
+               return 0;
+       }
        default:
                ret = -KVM_ENOSYS;
                break;
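
For reference, the argument check in the hunk above can be modelled outside the kernel roughly as follows. This is a standalone sketch under assumptions: 4 KiB pages, and local stand-ins for PAGE_ALIGNED() and gpa_to_gfn() rather than the kernel helpers themselves. It rejects an unaligned gpa, an empty range, and a range whose last frame number wraps around 64 bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, as on x86. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (UINT64_C(1) << SKETCH_PAGE_SHIFT)

static bool page_enc_range_valid(uint64_t gpa, uint64_t npages)
{
	uint64_t gfn = gpa >> SKETCH_PAGE_SHIFT;	/* stand-in for gpa_to_gfn() */

	if (gpa & (SKETCH_PAGE_SIZE - 1))	/* stand-in for !PAGE_ALIGNED(gpa) */
		return false;
	if (npages == 0)			/* empty range */
		return false;
	if (gfn + npages <= gfn)		/* unsigned wraparound */
		return false;

	return true;
}
```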

...

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 590cc811c99a..d696a9f13e33 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3258,6 +3258,14 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
                vcpu->arch.msr_kvm_poll_control = data;
                break;
+       case MSR_KVM_MIGRATION_CONTROL:
+               if (data & ~KVM_PAGE_ENC_STATUS_UPTODATE)
+                       return 1;
+
+               if (data && !guest_pv_has(vcpu, KVM_FEATURE_HC_PAGE_ENC_STATUS))

Why let the guest write '0'?  Letting the guest do WRMSR but not RDMSR is
bizarre.

Because it was the simplest way to write the code, but returning 0 unconditionally from RDMSR is actually simpler.

Paolo

+                       return 1;
+               break;
+
        case MSR_IA32_MCG_CTL:
        case MSR_IA32_MCG_STATUS:
        case MSR_IA32_MC0_CTL ... MSR_IA32_MCx_CTL(KVM_MAX_MCE_BANKS) - 1:
@@ -3549,6 +3557,12 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
                if (!guest_pv_has(vcpu, KVM_FEATURE_ASYNC_PF))
                        return 1;
+               msr_info->data = 0;
+               break;
+       case MSR_KVM_MIGRATION_CONTROL:
+               if (!guest_pv_has(vcpu, KVM_FEATURE_HC_PAGE_ENC_STATUS))
+                       return 1;
+
                msr_info->data = 0;
                break;
        case MSR_KVM_STEAL_TIME:
--
2.26.2
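
The MSR_KVM_MIGRATION_CONTROL semantics in the two hunks above can be summarised as a small standalone model. This is a sketch, not the kernel code: the function names are made up, and the value of KVM_PAGE_ENC_STATUS_UPTODATE is assumed to be bit 0, which this thread does not show. A return value of 1 means the access faults, as in kvm_set_msr_common() and kvm_get_msr_common().

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the only defined bit is bit 0; the real value is not shown here. */
#define SKETCH_PAGE_ENC_STATUS_UPTODATE UINT64_C(1)

/* Models the WRMSR path: 0 on success, 1 if the write is rejected. */
static int migration_control_write(bool guest_has_feature, uint64_t data)
{
	if (data & ~SKETCH_PAGE_ENC_STATUS_UPTODATE)
		return 1;			/* reserved bits set */
	if (data && !guest_has_feature)
		return 1;			/* feature not exposed to the guest */
	return 0;				/* note: writing 0 is always accepted */
}

/* Models the RDMSR path: reads as 0, but only if the feature is exposed. */
static int migration_control_read(bool guest_has_feature, uint64_t *data)
{
	if (!guest_has_feature)
		return 1;
	*data = 0;
	return 0;
}
```

The asymmetry questioned in the review is visible here: without the feature, a write of 0 succeeds while a read faults.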


