On Tue, Jul 02, 2013 at 05:34:56PM +0200, Jan Kiszka wrote:
> On 2013-07-02 17:15, Gleb Natapov wrote:
> > On Tue, Jul 02, 2013 at 04:28:56PM +0200, Jan Kiszka wrote:
> >> On 2013-07-02 15:59, Gleb Natapov wrote:
> >>> On Tue, Jul 02, 2013 at 03:01:24AM +0000, Zhang, Yang Z wrote:
> >>>> Since this series is pending in mail list for long time. And it's really
> >>>> a big feature for Nested. Also, I doubt the original authors(Jun and
> >>>> Nahav)should not have enough time to continue it. So I will pick it up.
> >>>> :)
> >>>>
> >>>> See comments below:
> >>>>
> >>>> Paolo Bonzini wrote on 2013-05-20:
> >>>>> Il 19/05/2013 06:52, Jun Nakajima ha scritto:
> >>>>>> From: Nadav Har'El <[email protected]>
> >>>>>>
> >>>>>> Recent KVM, since
> >>>>> http://kerneltrap.org/mailarchive/linux-kvm/2010/5/2/6261577
> >>>>>> switch the EFER MSR when EPT is used and the host and guest have
> >>>>>> different
> >>>>>> NX bits. So if we add support for nested EPT (L1 guest using EPT to
> >>>>>> run L2)
> >>>>>> and want to be able to run recent KVM as L1, we need to allow L1 to
> >>>>>> use this
> >>>>>> EFER switching feature.
> >>>>>>
> >>>>>> To do this EFER switching, KVM uses VM_ENTRY/EXIT_LOAD_IA32_EFER if
> >>>>> available,
> >>>>>> and if it isn't, it uses the generic VM_ENTRY/EXIT_MSR_LOAD. This
> >>>>>> patch adds
> >>>>>> support for the former (the latter is still unsupported).
> >>>>>>
> >>>>>> Nested entry and exit emulation (prepare_vmcs_02 and
> >>>>> load_vmcs12_host_state,
> >>>>>> respectively) already handled VM_ENTRY/EXIT_LOAD_IA32_EFER correctly.
> >>>>>> So
> >>>>> all
> >>>>>> that's left to do in this patch is to properly advertise this feature
> >>>>>> to L1.
> >>>>>>
> >>>>>> Note that vmcs12's VM_ENTRY/EXIT_LOAD_IA32_EFER are emulated by L0, by
> >>>>> using
> >>>>>> vmx_set_efer (which itself sets one of several vmcs02 fields), so we
> >>>>>> always
> >>>>>> support this feature, regardless of whether the host supports it.
> >>>>>>
> >>>>>> Signed-off-by: Nadav Har'El <[email protected]>
> >>>>>> Signed-off-by: Jun Nakajima <[email protected]>
> >>>>>> Signed-off-by: Xinhao Xu <[email protected]>
> >>>>>> ---
> >>>>>> arch/x86/kvm/vmx.c | 23 ++++++++++++++++-------
> >>>>>> 1 file changed, 16 insertions(+), 7 deletions(-)
> >>>>>>
> >>>>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> >>>>>> index 260a919..fb9cae5 100644
> >>>>>> --- a/arch/x86/kvm/vmx.c
> >>>>>> +++ b/arch/x86/kvm/vmx.c
> >>>>>> @@ -2192,7 +2192,8 @@ static __init void
> >>>>>> nested_vmx_setup_ctls_msrs(void)
> >>>>>> #else
> >>>>>> nested_vmx_exit_ctls_high = 0;
> >>>>>> #endif
> >>>>>> - nested_vmx_exit_ctls_high |=
> >>>>> VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR;
> >>>>>> + nested_vmx_exit_ctls_high |=
> >>>>> (VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR |
> >>>>>> + VM_EXIT_LOAD_IA32_EFER);
> >>>>>>
> >>>>>> /* entry controls */
> >>>>>> rdmsr(MSR_IA32_VMX_ENTRY_CTLS,
> >>>>>> @@ -2201,8 +2202,8 @@ static __init void
> >>>>>> nested_vmx_setup_ctls_msrs(void)
> >>>>>> nested_vmx_entry_ctls_low =
> >>>>> VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR;
> >>>>>> nested_vmx_entry_ctls_high &=
> >>>>>> VM_ENTRY_LOAD_IA32_PAT | VM_ENTRY_IA32E_MODE;
> >>>>>> - nested_vmx_entry_ctls_high |=
> >>>>> VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR;
> >>>>>> -
> >>>>>> + nested_vmx_entry_ctls_high |=
> >>>>> (VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR |
> >>>>>> + VM_ENTRY_LOAD_IA32_EFER);
> >>>>>> /* cpu-based controls */
> >>>>>> rdmsr(MSR_IA32_VMX_PROCBASED_CTLS,
> >>>>>> nested_vmx_procbased_ctls_low,
> >>>>>> nested_vmx_procbased_ctls_high);
> >>>>>> @@ -7492,10 +7493,18 @@ static void prepare_vmcs02(struct kvm_vcpu
> >>>>>> *vcpu,
> >>>>> struct vmcs12 *vmcs12)
> >>>>>> vcpu->arch.cr0_guest_owned_bits &= ~vmcs12->cr0_guest_host_mask;
> >>>>>> vmcs_writel(CR0_GUEST_HOST_MASK,
> >>>>> ~vcpu->arch.cr0_guest_owned_bits);
> >>>>>>
> >>>>>> - /* Note: IA32_MODE, LOAD_IA32_EFER are modified by vmx_set_efer
> >>>>> below */
> >>>>>> - vmcs_write32(VM_EXIT_CONTROLS,
> >>>>>> - vmcs12->vm_exit_controls | vmcs_config.vmexit_ctrl);
> >>>>>> - vmcs_write32(VM_ENTRY_CONTROLS, vmcs12->vm_entry_controls |
> >>>>>> + /* L2->L1 exit controls are emulated - the hardware exit is to
> >>>>>> L0 so
> >>>>>> + * we should use its exit controls. Note that IA32_MODE,
> >>>>>> LOAD_IA32_EFER
> >>>>>> + * bits are further modified by vmx_set_efer() below.
> >>>>>> + */
> >>>>>> + vmcs_write32(VM_EXIT_CONTROLS, vmcs_config.vmexit_ctrl);
> >>>> This is wrong. We cannot use L0 exit control directly.
> >>>> LOAD_PERF_GLOBAL_CTRL, LOAD_HOST_EFE, LOAD_HOST_PAT, ACK_INTR_ON_EXIT
> >>>> should use host's exit control. But others, still need use (vmcs12|host).
> >>>>
> >>> I do not see why. We always intercept DR7/PAT/EFER, so save is emulated
> >>> too. Host address space size always come from L0 and preemption timer is
> >>> not supported for nested IIRC and when it will be host will have to save
> >>> it on exit anyway for correct emulation.
> >>
> >> Preemption timer is already supported and works fine as far as I tested.
> >> KVM doesn't use it for L1, so we do not need to save/restore it - IIRC.
> >>
> > So what happens if L1 configures it to value X after X/2 ticks L0 exit
> > happen and L0 gets back to L2 directly. The counter will be X again
> > instead of X/2.
>
> Likely. Yes, we need to improve our emulation by setting "Save
> VMX-preemption timer value" or emulate this in software if the hardware
> lacks support for it (was this flag introduced after the preemption
> timer itself?).
>
Not sure, but my point was that for correct emulation host needs to set
"save preempt timer on vmexit" anyway so all VM_EXIT_CONTROLS are
indeed emulated as far as I see.
--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html