Arthur Chunqi Li wrote on 2013-09-05:
> On Thu, Sep 5, 2013 at 3:45 PM, Zhang, Yang Z <[email protected]>
> wrote:
> > Arthur Chunqi Li wrote on 2013-09-04:
> >> This patch contains the following two changes:
> >> 1. Fix the bug in nested preemption timer support. If vmexit L2->L0
> >> with some reasons not emulated by L1, preemption timer value should
> >> be save in such exits.
> >> 2. Add support of "Save VMX-preemption timer value" VM-Exit controls
> >> to nVMX.
> >>
> >> With this patch, nested VMX preemption timer features are fully supported.
> >>
> >> Signed-off-by: Arthur Chunqi Li <[email protected]>
> >> ---
> >> This series depends on queue.
> >>
> >> arch/x86/include/uapi/asm/msr-index.h | 1 +
> >> arch/x86/kvm/vmx.c | 51
> >> ++++++++++++++++++++++++++++++---
> >> 2 files changed, 48 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/arch/x86/include/uapi/asm/msr-index.h
> >> b/arch/x86/include/uapi/asm/msr-index.h
> >> index bb04650..b93e09a 100644
> >> --- a/arch/x86/include/uapi/asm/msr-index.h
> >> +++ b/arch/x86/include/uapi/asm/msr-index.h
> >> @@ -536,6 +536,7 @@
> >>
> >> /* MSR_IA32_VMX_MISC bits */
> >> #define MSR_IA32_VMX_MISC_VMWRITE_SHADOW_RO_FIELDS (1ULL <<
> 29)
> >> +#define MSR_IA32_VMX_MISC_PREEMPTION_TIMER_SCALE 0x1F
> >> /* AMD-V MSRs */
> >>
> >> #define MSR_VM_CR 0xc0010114
> >> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index
> >> 1f1da43..870caa8
> >> 100644
> >> --- a/arch/x86/kvm/vmx.c
> >> +++ b/arch/x86/kvm/vmx.c
> >> @@ -2204,7 +2204,14 @@ static __init void
> >> nested_vmx_setup_ctls_msrs(void) #ifdef CONFIG_X86_64
> >> VM_EXIT_HOST_ADDR_SPACE_SIZE | #endif
> >> - VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT;
> >> + VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT |
> >> + VM_EXIT_SAVE_VMX_PREEMPTION_TIMER;
> >> + if (!(nested_vmx_pinbased_ctls_high &
> >> PIN_BASED_VMX_PREEMPTION_TIMER))
> >> + nested_vmx_exit_ctls_high &=
> >> + (~VM_EXIT_SAVE_VMX_PREEMPTION_TIMER);
> >> + if (!(nested_vmx_exit_ctls_high &
> >> VM_EXIT_SAVE_VMX_PREEMPTION_TIMER))
> >> + nested_vmx_pinbased_ctls_high &=
> >> + (~PIN_BASED_VMX_PREEMPTION_TIMER);
> > The following logic is more clearly:
> > if(nested_vmx_pinbased_ctls_high &
> PIN_BASED_VMX_PREEMPTION_TIMER)
> > nested_vmx_exit_ctls_high |=
> VM_EXIT_SAVE_VMX_PREEMPTION_TIMER
> Here I have such consideration: this logic is wrong if CPU support
> PIN_BASED_VMX_PREEMPTION_TIMER but doesn't support
> VM_EXIT_SAVE_VMX_PREEMPTION_TIMER, though I don't know if this does
> occurs. So the codes above reads the MSR and reserves the features it
> supports, and here I just check if these two features are supported
> simultaneously.
>
No. Only VM_EXIT_SAVE_VMX_PREEMPTION_TIMER depends on
PIN_BASED_VMX_PREEMPTION_TIMER. PIN_BASED_VMX_PREEMPTION_TIMER is an
independent feature
> You remind that this piece of codes can write like this:
> if (!(nested_vmx_pin_based_ctls_high &
> PIN_BASED_VMX_PREEMPTION_TIMER) ||
> !(nested_vmx_exit_ctls_high &
> VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)) {
> nested_vmx_exit_ctls_high
> &=(~VM_EXIT_SAVE_VMX_PREEMPTION_TIMER);
> nested_vmx_pinbased_ctls_high &=
> (~PIN_BASED_VMX_PREEMPTION_TIMER);
> }
>
> This may reflect the logic I describe above that these two flags should
> support
> simultaneously, and brings less confusion.
> >
> > BTW: I don't see nested_vmx_setup_ctls_msrs() considers the hardware's
> capability when expose those vmx features(not just preemption timer) to L1.
> The codes just above here, when setting pinbased control for nested vmx, it
> firstly "rdmsr MSR_IA32_VMX_PINBASED_CTLS", then use this to mask the
> features hardware not support. So does other control fields.
> >
Yes, I saw it.
> >> nested_vmx_exit_ctls_high |=
> >> (VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR |
> >> VM_EXIT_LOAD_IA32_EFER);
> >>
> >> @@ -6707,6 +6714,23 @@ static void vmx_get_exit_info(struct kvm_vcpu
> >> *vcpu, u64 *info1, u64 *info2)
> >> *info2 = vmcs_read32(VM_EXIT_INTR_INFO); }
> >>
> >> +static void nested_adjust_preemption_timer(struct kvm_vcpu *vcpu) {
> >> + u64 delta_tsc_l1;
> >> + u32 preempt_val_l1, preempt_val_l2, preempt_scale;
> >> +
> >> + preempt_scale = native_read_msr(MSR_IA32_VMX_MISC) &
> >> +
> MSR_IA32_VMX_MISC_PREEMPTION_TIMER_SCALE;
> >> + preempt_val_l2 = vmcs_read32(VMX_PREEMPTION_TIMER_VALUE);
> >> + delta_tsc_l1 = kvm_x86_ops->read_l1_tsc(vcpu,
> >> + native_read_tsc()) - vcpu->arch.last_guest_tsc;
> >> + preempt_val_l1 = delta_tsc_l1 >> preempt_scale;
> >> + if (preempt_val_l2 - preempt_val_l1 < 0)
> >> + preempt_val_l2 = 0;
> >> + else
> >> + preempt_val_l2 -= preempt_val_l1;
> >> + vmcs_write32(VMX_PREEMPTION_TIMER_VALUE,
> preempt_val_l2); }
> >> /*
> >> * The guest has exited. See if we can fix it or if we need userspace
> >> * assistance.
> >> @@ -6716,6 +6740,7 @@ static int vmx_handle_exit(struct kvm_vcpu
> *vcpu)
> >> struct vcpu_vmx *vmx = to_vmx(vcpu);
> >> u32 exit_reason = vmx->exit_reason;
> >> u32 vectoring_info = vmx->idt_vectoring_info;
> >> + int ret;
> >>
> >> /* If guest state is invalid, start emulating */
> >> if (vmx->emulation_required)
> >> @@ -6795,12 +6820,15 @@ static int vmx_handle_exit(struct kvm_vcpu
> >> *vcpu)
> >>
> >> if (exit_reason < kvm_vmx_max_exit_handlers
> >> && kvm_vmx_exit_handlers[exit_reason])
> >> - return kvm_vmx_exit_handlers[exit_reason](vcpu);
> >> + ret = kvm_vmx_exit_handlers[exit_reason](vcpu);
> >> else {
> >> vcpu->run->exit_reason = KVM_EXIT_UNKNOWN;
> >> vcpu->run->hw.hardware_exit_reason = exit_reason;
> >> + ret = 0;
> >> }
> >> - return 0;
> >> + if (is_guest_mode(vcpu))
> >> + nested_adjust_preemption_timer(vcpu);
> > Move this forward to avoid the changes for ret.
> The previous codes simply "return
> kvm_vmx_exit_handlers[exit_reason](vcpu);", which may also consumes CPU
> times. So put "nested_adjust_preemption_timer" ahead may cause the
> statistics inaccuracy.
Then you should put it before vmentry. Here still far from the point of vmentry.
> >> + return ret;
> >> }
> >>
> >> static void update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int
> >> irr) @@
> >> -7518,6 +7546,7 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu,
> >> struct vmcs12 *vmcs12) {
> >> struct vcpu_vmx *vmx = to_vmx(vcpu);
> >> u32 exec_control;
> >> + u32 exit_control;
> >>
> >> vmcs_write16(GUEST_ES_SELECTOR, vmcs12->guest_es_selector);
> >> vmcs_write16(GUEST_CS_SELECTOR, vmcs12->guest_cs_selector);
> @@
> >> -7691,7 +7720,10 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu,
> >> struct vmcs12 *vmcs12)
> >> * we should use its exit controls. Note that
> VM_EXIT_LOAD_IA32_EFER
> >> * bits are further modified by vmx_set_efer() below.
> >> */
> >> - vmcs_write32(VM_EXIT_CONTROLS, vmcs_config.vmexit_ctrl);
> >> + exit_control = vmcs_config.vmexit_ctrl;
> >> + if (vmcs12->pin_based_vm_exec_control &
> >> PIN_BASED_VMX_PREEMPTION_TIMER)
> >> + exit_control |=
> VM_EXIT_SAVE_VMX_PREEMPTION_TIMER;
> >> + vmcs_write32(VM_EXIT_CONTROLS, exit_control);
> > And here should be problem if host doesn't support
> VM_EXIT_SAVE_VMX_PREEMPTION_TIMER.
> Nested vmx does check the hardware support of these features in
> "nested_vmx_setup_ctls_msrs", see my comments above.
But here you don't check it. You just set it unconditionally.
> >
> >>
> >> /* vmcs12's VM_ENTRY_LOAD_IA32_EFER and
> VM_ENTRY_IA32E_MODE are
> >> * emulated by vmx_set_efer(), below.
> >> @@ -8090,6 +8122,17 @@ static void prepare_vmcs12(struct kvm_vcpu
> >> *vcpu, struct vmcs12 *vmcs12)
> >> vmcs12->guest_pending_dbg_exceptions =
> >> vmcs_readl(GUEST_PENDING_DBG_EXCEPTIONS);
> >>
> >> + if (vmcs12->pin_based_vm_exec_control &
> >> + PIN_BASED_VMX_PREEMPTION_TIMER) {
> >> + if (vmcs12->vm_exit_controls &
> >> +
> VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)
> >> + vmcs12->vmx_preemption_timer_value =
> >> +
> vmcs_read32(VMX_PREEMPTION_TIMER_VALUE);
> >> + else
> >> +
> vmcs_write32(VMX_PREEMPTION_TIMER_VALUE,
> >> +
> >> + vmcs12->vmx_preemption_timer_value);
> > Why write it to vmcs directly if VM_EXIT_SAVE_VMX_PREEMPTION_TIMER
> not set?
> Yes, writing is needless here since vmcs02 will be re-prepared via
> prepare_vmcs02() when L1->L2. This function just save information needed,
> vmcs_write is useless.
>
I don't sure I am seeing your point. Per my point, even the
VM_EXIT_SAVE_VMX_PREEMPTION_TIMER is not setting, you still need to save the
value to vmcs12->vmx_preemption_timer_value. Or else, prepare_vmcs02 cannot get
the right value.
> Arthur
> >
> >> + }
> >> +
> >> /*
> >> * In some cases (usually, nested EPT), L2 is allowed to change its
> >> * own CR3 without exiting. If it has changed it, we must keep it.
> >> --
> >> 1.7.9.5
> >>
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe kvm" in the
> >> body of a message to [email protected] More majordomo info at
> >> http://vger.kernel.org/majordomo-info.html
> >
> > Best regards,
> > Yang
> >
> >
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in the body of
> a
> message to [email protected] More majordomo info at
> http://vger.kernel.org/majordomo-info.html
Best regards,
Yang