On Thu, May 17, 2012 at 01:22:05AM +0000, Mao, Junjie wrote:
> > -----Original Message-----
> > From: Marcelo Tosatti [mailto:[email protected]]
> > Sent: Tuesday, May 15, 2012 10:42 AM
> > To: Mao, Junjie
> > Cc: '[email protected]'
> > Subject: Re: [PATCH v2] KVM: x86: Implement PCID/INVPCID for guests with
> > EPT
> >
> > On Mon, May 14, 2012 at 06:25:40AM +0000, Mao, Junjie wrote:
> > > This patch handles PCID/INVPCID for guests.
> > >
> > > Process-context identifiers (PCIDs) are a facility by which a logical
> > > processor may cache information for multiple linear-address spaces so
> > > that the processor may retain cached information when software
> > > switches to a different linear address space. Refer to section 4.10.1
> > > in IA32 Intel Software Developer's Manual Volume 3A for details.
> > >
> > > For guests with EPT, the PCID feature is enabled and INVPCID behaves
> > > as running natively.
> > > For guests without EPT, the PCID feature is disabled and INVPCID triggers
> > #UD.
> > >
> > > Changes from v1:
> > > Move cr0/cr4 writing checks to x86.c
> > > Update comments for the reason why PCID is disabled for non-EPT guests
> > > Do not support PCID/INVPCID for nested guests at present
> > > Clean up useless symbols
> > >
> > > Signed-off-by: Junjie Mao <[email protected]>
> > > +++ b/arch/x86/kvm/vmx.c
> > > @@ -1711,6 +1711,18 @@ static bool vmx_rdtscp_supported(void)
> > > return cpu_has_vmx_rdtscp();
> > > }
> > >
> > > +static bool vmx_pcid_supported(void)
> > > +{
> > > + /*
> > > + * Enable INVPCID for non-ept guests may cause performance regression,
> > > + * and without INVPCID, PCID has little benefits. So disable them all
> > > + * for non-ept guests.
> > > + *
> > > + * PCID is not supported for nested guests yet.
> > > + */
> > > + return enable_ept && (boot_cpu_data.x86_capability[4] &
> > > +bit(X86_FEATURE_PCID)) && !cpu_has_hypervisor; }
> >
> > The comment Avi made was regarding running a nested guest, not running
> > _as_ a nested guest (which is what cpu_has_hypervisor is about).
> >
> > You can disable INVPCID exec control (which #UDs), if its in Level-2 guest
> > mode
> > (see if_guest_mode()), and restore the Level-1 value when leaving nested
> > mode.
>
> This "!cpu_has_hypervisor " is brought by my ignorance on nested vmx. Sorry
> for that.
That makes two of us. No problem.
> > > +
> > > /*
> > > * Swap MSR entry in host/guest MSR entry array.
> > > */
> > > @@ -2425,6 +2437,8 @@ static __init int setup_vmcs_config(struct
> > vmcs_config *vmcs_conf)
> > > SECONDARY_EXEC_UNRESTRICTED_GUEST |
> > > SECONDARY_EXEC_PAUSE_LOOP_EXITING |
> > > SECONDARY_EXEC_RDTSCP;
> > > + if (vmx_pcid_supported())
> > > + opt2 |= SECONDARY_EXEC_ENABLE_INVPCID;
> >
> >
> > You should allow ENABLE_INVPCID control to be set unconditionally here, and
> > adjusted in vmx_secondary_exec_control().
> >
> > (note that "enable_ept" might be cleared after setup_vmcs_config).
> >
> > > if (adjust_vmx_controls(min2, opt2,
> > > MSR_IA32_VMX_PROCBASED_CTLS2,
> > > &_cpu_based_2nd_exec_control) < 0) @@
> > > -6420,6
> > +6434,20 @@ static
> > > void vmx_cpuid_update(struct kvm_vcpu *vcpu)
> > > }
> > > }
> > > }
> > > +
> > > + if (vmx_pcid_supported()) {
> > > + best = kvm_find_cpuid_entry(vcpu, 0x1, 0);
> > > + if (!best || !(best->ecx & bit(X86_FEATURE_PCID))) {
> > > + /* Hiding INVPCID when PCID is not exposed. */
> > > + exec_control =
> > vmcs_read32(SECONDARY_VM_EXEC_CONTROL);
> > > + exec_control &= ~SECONDARY_EXEC_ENABLE_INVPCID;
> > > + vmcs_write32(SECONDARY_VM_EXEC_CONTROL,
> > > + exec_control);
> > > + best = kvm_find_cpuid_entry(vcpu, 0x7, 0);
> > > + if (best)
> > > + best->ecx &= ~bit(X86_FEATURE_INVPCID);
> > > + }
> > > + }
> >
> > - If X86_FEATURE_CPUID bit is set by guest, but X86_FEATURE_INVPCID is
> > cleared, this allows invpcid to execute (which is wrong, it should
> > #UD).
>
> Qemu doesn't support adjusting cpuid leaf 7 at present. The PCID/INVPCID
> features are both controlled by the 'pcid' bit in cpuid leaf 1, i.e.
> - On platforms supporting both the features, 'cpu xxx,+pcid' exposing
> both PCID and INVPCID and 'cpu xxx,-pcid' hiding both.
> - On platforms supporting PCID only, INVPCID is always not available to
> guests and PCID is exposed or hidden according to the command line.
> - On platforms supporting neither, both of them are always hidden.
> This is done because PCID brings little (if any) benefits without the INVPCID
> instruction.
> Thus if X86_FEATURE_INVPCID is cleared at this point, this means the host
> doesn't support it and setup_vmcs_config has already masked
> SECONDARY_VM_EXEC_CONTROL according to IA32_VMX_PROCBASED_CTLS2.
QEMU is not the only user of KVM. Just
if (X86_FEATURE_INVPCID OR X86_FEATURE_INVPCID) are unset
clear SECONDARY_VM_EXEC_CONTROL
Independently of the facts above, this makes the code more robust.
> > - Must enable vm_exec control bit if cpuid reports the feature enabled,
> > not only disable it.
>
> The bit is default to be set if the host supports it. The code here is only
> used when qemu explicitly disables PCID on platforms supporting the feature.
Userspace is free to call ioctl(SET_CPUID) in any order and number of
times it pleases. Please support both enable and disable (it is easy,
see RDTSCP code in the function).
> > > }
> > >
> > > static void vmx_set_supported_cpuid(u32 func, struct kvm_cpuid_entry2
> > > *entry) @@ -7154,6 +7182,7 @@ static struct kvm_x86_ops vmx_x86_ops =
> > {
> > > .cpuid_update = vmx_cpuid_update,
> > >
> > > .rdtscp_supported = vmx_rdtscp_supported,
> > > + .pcid_supported = vmx_pcid_supported,
> > >
> > > .set_supported_cpuid = vmx_set_supported_cpuid,
> > >
> > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index
> > > c9d99e5..f930597 100644
> > > --- a/arch/x86/kvm/x86.c
> > > +++ b/arch/x86/kvm/x86.c
> > > @@ -527,6 +527,10 @@ int kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned
> > long cr0)
> > > return 1;
> > > }
> > >
> > > + if ((old_cr0 & X86_CR0_PG) && !(cr0 & X86_CR0_PG) &&
> > > + kvm_read_cr4_bits(vcpu, X86_CR4_PCIDE))
> > > + return 1;
> > > +
> > > kvm_x86_ops->set_cr0(vcpu, cr0);
> >
> > For completeness, kvm_set_cr3 should deal with the CR3 bits. It is used by
> > nested VMX for example.
>
> I think you're talking about the reserved bit checks. I'll update it in the
> next version.
Yes, thanks.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html