On Wed, Dec 05, 2012 at 01:55:17AM +0000, Zhang, Yang Z wrote:
> Gleb Natapov wrote on 2012-12-04:
> > On Tue, Dec 04, 2012 at 06:39:50AM +0000, Zhang, Yang Z wrote:
> >> Gleb Natapov wrote on 2012-12-03:
> >>> On Mon, Dec 03, 2012 at 03:01:03PM +0800, Yang Zhang wrote:
> >>>> Virtual interrupt delivery avoids KVM having to inject vAPIC interrupts
> >>>> manually; that is fully taken care of by the hardware. This needs
> >>>> some special awareness in the existing interrupt injection path:
> >>>> 
> >>>> - For a pending interrupt, instead of direct injection, we may need to
> >>>>   update architecture-specific indicators before resuming to the guest.
> >>>> - A pending interrupt that is masked by the ISR should also be
> >>>>   considered in the above update, since the hardware will decide when
> >>>>   to inject it at the right time. The current has_interrupt and
> >>>>   get_interrupt only return a valid vector from the injection p.o.v.
> >>> Most of my previous comments still apply.
> >>> 
> >>>> +void kvm_set_eoi_exitmap(struct kvm_vcpu *vcpu, int vector,
> >>>> +                int trig_mode, int always_set)
> >>>> +{
> >>>> +        if (kvm_x86_ops->set_eoi_exitmap)
> >>>> +                kvm_x86_ops->set_eoi_exitmap(vcpu, vector,
> >>>> +                                        trig_mode, always_set);
> >>>> +}
> >>>> +
> >>>>  /*
> >>>>   * Add a pending IRQ into lapic.
> >>>>   * Return 1 if successfully added and 0 if discarded.
> >>>> @@ -661,6 +669,7 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
> >>>>                  if (unlikely(!apic_enabled(apic)))
> >>>>                          break;
> >>>> +                kvm_set_eoi_exitmap(vcpu, vector, trig_mode, 0);
> >>> As I said in the last review rebuild the bitmap when ioapic or irq
> >>> notifier configuration changes, user request bit to notify vcpus to
> >>> reload the bitmap.
> >> It is too complicated. When programming an ioapic entry, we cannot get
> >> the target vcpu easily. We would need to read the destination format
> >> register and logical destination register to find the target vcpu when
> >> logical mode is used. Also, we would have to trap every modification to
> >> those two registers to update the eoi bitmap.
> > No need to check target vcpu. Enable exit on all vcpus for the vector
> This is wrong. As we know, modern OSes use per-VCPU vectors, so we cannot
> ensure that the same vector has the same trigger mode on all vcpus. And
> what's worse, if that vector on another vcpu is used to handle high
> frequency interrupts (like a 10G NIC), forcing EOI exits there will hurt
> performance.
> 
I have never seen an OS reuse a vector that is used by the ioapic; as far
as I can see this is not how the Linux code works. Furthermore, it would
not work with KVM currently, since an apic EOI is redirected to the ioapic
based on the vector alone, not a vector/vcpu pair, and as far as I am
aware this is how real HW works too.
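
To make the point concrete, here is a minimal stand-alone sketch (the
names and data structure are made up for illustration, not KVM's actual
API) of an EOI ack-notifier table keyed on the vector alone, which is why
two vcpus reusing one vector for different sources cannot be told apart:

```c
#include <assert.h>

#define NR_VECTORS 256

/* one slot per vector: there is no vcpu in the key */
static int ack_notifier_pin[NR_VECTORS];

static void register_ack_notifier(int vector, int pin)
{
	/* a second registration for the same vector overwrites the
	 * first, regardless of which vcpu the interrupt targets */
	ack_notifier_pin[vector] = pin;
}

static int eoi_to_pin(int vector)
{
	/* the lookup at EOI time sees only the vector */
	return ack_notifier_pin[vector];
}
```

With such a table, reusing vector 40 on a second vcpu simply clobbers the
first vcpu's routing, which matches the observation that EOI redirection
cannot distinguish a vector/vcpu pair.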

> > programmed into ioapic. Which two registers? All accesses to ioapic are
> > trapped and reconfiguration is rare.
> In logical mode, the destination VCPU depends on each CPU's destination 
> format register and logical destination register, so we must also trap 
> those two registers.
> And if it uses lowest-priority delivery mode, the PPR needs to be trapped 
> too. Since the PPR changes on each interrupt injection, the cost would be 
> higher than with the current approach.
No need for any of that if the bitmask is global.
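
A rough sketch of the global-bitmap idea (simplified, hypothetical types;
the real KVM IOAPIC entry is `union kvm_ioapic_redirect_entry` and has
many more fields): walk the redirection table once on reconfiguration and
set an EOI-exit bit for every unmasked level-triggered vector, with no
destination decoding at all:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define IOAPIC_NUM_PINS 24

/* simplified redirection-table entry: only the fields used here */
struct redir_entry {
	uint8_t vector;
	uint8_t trig_mode;	/* 1 = level-triggered */
	uint8_t mask;		/* 1 = pin masked */
};

/* Rebuild one global 256-bit EOI-exit bitmap from the ioapic state.
 * Because the same bitmap is applied to all vcpus, no DFR/LDR/PPR
 * decoding of the destination is needed. */
static void rebuild_eoi_exitmap(const struct redir_entry *rte,
				uint64_t bitmap[4])
{
	memset(bitmap, 0, 4 * sizeof(uint64_t));
	for (int pin = 0; pin < IOAPIC_NUM_PINS; pin++) {
		if (rte[pin].mask || !rte[pin].trig_mode)
			continue;
		bitmap[rte[pin].vector / 64] |=
			1ULL << (rte[pin].vector % 64);
	}
}
```

The trade-off Yang raises is visible here too: the bit is set for every
vcpu, so a vcpu that uses the same vector number for something unrelated
also takes EOI exits.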

> 
> >> For the irq notifier, only the PIT is special: it is edge-triggered but
> >> needs an EOI notifier. So just treat it specially. The TMR can cover the
> >> others.
> >> 
> > We shouldn't assume that. If another notifier is added, it will be easy
> > to forget to update the apicv code to exclude that vector too.
> At that point the guest is not running (devices are still initializing), 
> so we cannot know the vector. As you mentioned, the best point would be 
> when the guest programs the ioapic entry, but it is also impossible to get 
> the vector there (see above).
> I can add a comment on the function to remind callers to update the eoi 
> bitmap when an interrupt is edge-triggered but they still want an EOI 
> vmexit.
> 
> >>> 
> >>>>                  if (trig_mode) {
> >>>>                          apic_debug("level trig mode for vector %d", vector);
> >>>>                          apic_set_vector(vector, apic->regs + APIC_TMR);
> >>>> @@ -740,6 +749,19 @@ int kvm_apic_compare_prio(struct kvm_vcpu *vcpu1, struct kvm_vcpu *vcpu2)
> >>>>          return vcpu1->arch.apic_arb_prio - vcpu2->arch.apic_arb_prio;
> >>>>  }
> >>>> +static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector)
> >>>> +{
> >>>> +        if (!(kvm_apic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI) &&
> >>>> +            kvm_ioapic_handles_vector(apic->vcpu->kvm, vector)) {
> >>>> +                int trigger_mode;
> >>>> +                if (apic_test_vector(vector, apic->regs + APIC_TMR))
> >>>> +                        trigger_mode = IOAPIC_LEVEL_TRIG;
> >>>> +                else
> >>>> +                        trigger_mode = IOAPIC_EDGE_TRIG;
> >>>> +                kvm_ioapic_update_eoi(apic->vcpu->kvm, vector, trigger_mode);
> >>>> +        }
> >>>> +}
> >>>> +
> >>>>  static int apic_set_eoi(struct kvm_lapic *apic)
> >>>>  {
> >>>>          int vector = apic_find_highest_isr(apic);
> >>>> @@ -756,19 +778,24 @@ static int apic_set_eoi(struct kvm_lapic *apic)
> >>>>          apic_clear_isr(vector, apic);
> >>>>          apic_update_ppr(apic);
> >>>> -        if (!(kvm_apic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI) &&
> >>>> -            kvm_ioapic_handles_vector(apic->vcpu->kvm, vector)) {
> >>>> -                int trigger_mode;
> >>>> -                if (apic_test_vector(vector, apic->regs + APIC_TMR))
> >>>> -                        trigger_mode = IOAPIC_LEVEL_TRIG;
> >>>> -                else
> >>>> -                        trigger_mode = IOAPIC_EDGE_TRIG;
> >>>> -                kvm_ioapic_update_eoi(apic->vcpu->kvm, vector, trigger_mode);
> >>>> -        }
> >>>> +        kvm_ioapic_send_eoi(apic, vector);
> >>>>          kvm_make_request(KVM_REQ_EVENT, apic->vcpu);
> >>>>          return vector;
> >>>>  }
> >>>> +/*
> >>>> + * this interface assumes a trap-like exit, which has already finished
> >>>> + * desired side effect including vISR and vPPR update.
> >>>> + */
> >>>> +void kvm_apic_set_eoi_accelerated(struct kvm_vcpu *vcpu, int vector)
> >>>> +{
> >>>> +        struct kvm_lapic *apic = vcpu->arch.apic;
> >>>> +
> >>> trace_kvm_eoi()
> >> Ok.
> >> 
> >>>> +        kvm_ioapic_send_eoi(apic, vector);
> >>>> +        kvm_make_request(KVM_REQ_EVENT, apic->vcpu);
> >>>> +}
> >>>> +EXPORT_SYMBOL_GPL(kvm_apic_set_eoi_accelerated);
> >>>> +
> >>>>  static void apic_send_ipi(struct kvm_lapic *apic)
> >>>>  {
> >>>>          u32 icr_low = kvm_apic_get_reg(apic, APIC_ICR);
> >>>> @@ -1533,6 +1560,17 @@ int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu)
> >>>>          return highest_irr;
> >>>>  }
> >>>> +int kvm_apic_get_highest_irr(struct kvm_vcpu *vcpu)
> >>>> +{
> >>>> +        struct kvm_lapic *apic = vcpu->arch.apic;
> >>>> +
> >>>> +        if (!apic || !apic_enabled(apic))
> >>> Use kvm_vcpu_has_lapic() instead of checking arch.apic directly.
> >> Ok.
> >> 
> >>> 
> >>>> +                return -1;
> >>>> +
> >>>> +        return apic_find_highest_irr(apic);
> >>>> +}
> >>>> +EXPORT_SYMBOL_GPL(kvm_apic_get_highest_irr);
> >>>> +
> >>>>  int kvm_apic_accept_pic_intr(struct kvm_vcpu *vcpu)
> >>>>  {
> >>>>          u32 lvt0 = kvm_apic_get_reg(vcpu->arch.apic, APIC_LVT0);
> >>>> diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
> >>>> index c42f111..749661a 100644
> >>>> --- a/arch/x86/kvm/lapic.h
> >>>> +++ b/arch/x86/kvm/lapic.h
> >>>> @@ -39,6 +39,9 @@ void kvm_free_lapic(struct kvm_vcpu *vcpu);
> >>>>  int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu);
> >>>>  int kvm_apic_accept_pic_intr(struct kvm_vcpu *vcpu);
> >>>>  int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu);
> >>>> +int kvm_cpu_has_extint(struct kvm_vcpu *v);
> >>>> +int kvm_cpu_get_extint(struct kvm_vcpu *v);
> >>>> +int kvm_apic_get_highest_irr(struct kvm_vcpu *vcpu);
> >>>>  void kvm_lapic_reset(struct kvm_vcpu *vcpu);
> >>>>  u64 kvm_lapic_get_cr8(struct kvm_vcpu *vcpu);
> >>>>  void kvm_lapic_set_tpr(struct kvm_vcpu *vcpu, unsigned long cr8);
> >>>> @@ -50,6 +53,8 @@ void kvm_apic_set_version(struct kvm_vcpu *vcpu);
> >>>>  int kvm_apic_match_physical_addr(struct kvm_lapic *apic, u16 dest);
> >>>>  int kvm_apic_match_logical_addr(struct kvm_lapic *apic, u8 mda);
> >>>>  int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq);
> >>>> +void kvm_set_eoi_exitmap(struct kvm_vcpu *vcpu, int vector,
> >>>> +                int need_eoi, int global);
> >>>>  int kvm_apic_local_deliver(struct kvm_lapic *apic, int lvt_type);
> >>>>  
> >>>>  bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src,
> >>>> @@ -65,6 +70,7 @@ u64 kvm_get_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu);
> >>>>  void kvm_set_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu, u64 data);
> >>>>  
> >>>>  int kvm_apic_write_nodecode(struct kvm_vcpu *vcpu, u32 offset);
> >>>> +void kvm_apic_set_eoi_accelerated(struct kvm_vcpu *vcpu, int vector);
> >>>> 
> >>>>  void kvm_lapic_set_vapic_addr(struct kvm_vcpu *vcpu, gpa_t
> >>>>  vapic_addr); void kvm_lapic_sync_from_vapic(struct kvm_vcpu *vcpu);
> >>>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> >>>> index dcb7952..8f0903b 100644
> >>>> --- a/arch/x86/kvm/svm.c
> >>>> +++ b/arch/x86/kvm/svm.c
> >>>> @@ -3573,6 +3573,22 @@ static void update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)
> >>>>                  set_cr_intercept(svm, INTERCEPT_CR8_WRITE);
> >>>>  }
> >>>> +static int svm_has_virtual_interrupt_delivery(struct kvm_vcpu *vcpu)
> >>>> +{
> >>>> +        return 0;
> >>>> +}
> >>>> +
> >>>> +static void svm_update_irq(struct kvm_vcpu *vcpu)
> >>>> +{
> >>>> +        return ;
> >>>> +}
> >>>> +
> >>>> +static void svm_set_eoi_exitmap(struct kvm_vcpu *vcpu, int vector,
> >>>> +                                int trig_mode, int always_set)
> >>>> +{
> >>>> +        return ;
> >>>> +}
> >>>> +
> >>>>  static int svm_nmi_allowed(struct kvm_vcpu *vcpu)
> >>>>  {
> >>>>          struct vcpu_svm *svm = to_svm(vcpu);
> >>>> @@ -4292,6 +4308,9 @@ static struct kvm_x86_ops svm_x86_ops = {
> >>>>          .enable_nmi_window = enable_nmi_window,
> >>>>          .enable_irq_window = enable_irq_window,
> >>>>          .update_cr8_intercept = update_cr8_intercept,
> >>>> +        .has_virtual_interrupt_delivery = svm_has_virtual_interrupt_delivery,
> >>>> +        .update_irq = svm_update_irq,
> >>>> +        .set_eoi_exitmap = svm_set_eoi_exitmap,
> >>>> 
> >>>>          .set_tss_addr = svm_set_tss_addr,
> >>>>          .get_tdp_level = get_npt_level,
> >>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> >>>> index 6a5f651..909ce90 100644
> >>>> --- a/arch/x86/kvm/vmx.c
> >>>> +++ b/arch/x86/kvm/vmx.c
> >>>> @@ -86,6 +86,9 @@ module_param(fasteoi, bool, S_IRUGO);
> >>>>  static bool __read_mostly enable_apicv_reg;
> >>>>  module_param(enable_apicv_reg, bool, S_IRUGO);
> >>>> +static bool __read_mostly enable_apicv_vid;
> >>>> +module_param(enable_apicv_vid, bool, S_IRUGO);
> >>>> +
> >>>>  /*
> >>>>   * If nested=1, nested virtualization is supported, i.e., guests may use
> >>>>   * VMX and be a hypervisor for its own guests. If nested=0, guests may
> > not
> >>>> @@ -432,6 +435,9 @@ struct vcpu_vmx {
> >>>> 
> >>>>          bool rdtscp_enabled;
> >>>> +        u8 eoi_exitmap_changed;
> >>>> +        u32 eoi_exit_bitmap[8];
> >>>> +
> >>>>          /* Support for a guest hypervisor (nested VMX) */
> >>>>          struct nested_vmx nested;
> >>>>  };
> >>>> @@ -770,6 +776,12 @@ static inline bool cpu_has_vmx_apic_register_virt(void)
> >>>>                  SECONDARY_EXEC_APIC_REGISTER_VIRT;
> >>>>  }
> >>>> +static inline bool cpu_has_vmx_virtual_intr_delivery(void)
> >>>> +{
> >>>> +        return vmcs_config.cpu_based_2nd_exec_ctrl &
> >>>> +                SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY;
> >>>> +}
> >>>> +
> >>>>  static inline bool cpu_has_vmx_flexpriority(void)
> >>>>  {
> >>>>          return cpu_has_vmx_tpr_shadow() &&
> >>>> @@ -2508,7 +2520,8 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
> >>>>                          SECONDARY_EXEC_PAUSE_LOOP_EXITING |
> >>>>                          SECONDARY_EXEC_RDTSCP |
> >>>>                          SECONDARY_EXEC_ENABLE_INVPCID |
> >>>> -                        SECONDARY_EXEC_APIC_REGISTER_VIRT;
> >>>> +                        SECONDARY_EXEC_APIC_REGISTER_VIRT |
> >>>> +                        SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY;
> >>>>                  if (adjust_vmx_controls(min2, opt2,
> >>>>                                          MSR_IA32_VMX_PROCBASED_CTLS2,
> >>>>                                          &_cpu_based_2nd_exec_control) < 0)
> >>>> @@ -2522,7 +2535,8 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
> >>>> 
> >>>>          if (!(_cpu_based_exec_control & CPU_BASED_TPR_SHADOW))
> >>>>                  _cpu_based_2nd_exec_control &= ~(
> >>>> -                                SECONDARY_EXEC_APIC_REGISTER_VIRT);
> >>>> +                                SECONDARY_EXEC_APIC_REGISTER_VIRT |
> >>>> +                                SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY);
> >>>> 
> >>>>          if (_cpu_based_2nd_exec_control & SECONDARY_EXEC_ENABLE_EPT) {
> >>>>                  /*
> >>>>                   * CR3 accesses and invlpg don't need to cause VM Exits when EPT
> >>>> @@ -2724,6 +2738,14 @@ static __init int hardware_setup(void)
> >>>>          if (!cpu_has_vmx_apic_register_virt())
> >>>>                  enable_apicv_reg = 0;
> >>>> +        if (!cpu_has_vmx_virtual_intr_delivery())
> >>>> +                enable_apicv_vid = 0;
> >>>> +
> >>>> +        if (!enable_apicv_vid) {
> >>>> +                kvm_x86_ops->update_irq = NULL;
> >>> Why setting it to NULL? Either drop this since vmx_update_irq() checks
> >>> enable_apicv_vid or better set it to function that does nothing and
> >>> drop enable_apicv_vid check in vmx_update_irq(). Since
> >>> kvm_x86_ops->update_irq will never be NULL you can drop the check
> >>> before calling it.
> >> Sure.
> >> 
> >>>> +                kvm_x86_ops->update_cr8_intercept = NULL;
> >>> Why? It should be other way around: if apicv is enabled set
> >>> update_cr8_intercept callback to NULL.
> >> Yes, this is wrong.
> > Please test the patches with vid disabled and Windows guests. This bug
> > should have prevented it from working.
> > 
> >> 
> >>>> +        }
> >>>> +
> >>>>          if (nested)
> >>>>                  nested_vmx_setup_ctls_msrs();
> >>>> @@ -3839,6 +3861,8 @@ static u32 vmx_secondary_exec_control(struct vcpu_vmx *vmx)
> >>>>                  exec_control &= ~SECONDARY_EXEC_PAUSE_LOOP_EXITING;
> >>>>          if (!enable_apicv_reg)
> >>>>                  exec_control &= ~SECONDARY_EXEC_APIC_REGISTER_VIRT;
> >>>> +        if (!enable_apicv_vid)
> >>>> +                exec_control &= ~SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY;
> >>>>          return exec_control;
> >>>>  }
> >>>> @@ -3883,6 +3907,15 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx)
> >>>>                                  vmx_secondary_exec_control(vmx));
> >>>>          }
> >>>> +        if (enable_apicv_vid) {
> >>>> +                vmcs_write64(EOI_EXIT_BITMAP0, 0);
> >>>> +                vmcs_write64(EOI_EXIT_BITMAP1, 0);
> >>>> +                vmcs_write64(EOI_EXIT_BITMAP2, 0);
> >>>> +                vmcs_write64(EOI_EXIT_BITMAP3, 0);
> >>>> +
> >>>> +                vmcs_write16(GUEST_INTR_STATUS, 0);
> >>>> +        }
> >>>> +
> >>>>          if (ple_gap) {
> >>>>                  vmcs_write32(PLE_GAP, ple_gap);
> >>>>                  vmcs_write32(PLE_WINDOW, ple_window);
> >>>> @@ -4806,6 +4839,16 @@ static int handle_apic_access(struct kvm_vcpu *vcpu)
> >>>>          return emulate_instruction(vcpu, 0) == EMULATE_DONE;
> >>>>  }
> >>>> +static int handle_apic_eoi_induced(struct kvm_vcpu *vcpu)
> >>>> +{
> >>>> +        unsigned long exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
> >>>> +        int vector = exit_qualification & 0xff;
> >>>> +
> >>>> +        /* EOI-induced VM exit is trap-like and thus no need to adjust IP */
> >>>> +        kvm_apic_set_eoi_accelerated(vcpu, vector);
> >>>> +        return 1;
> >>>> +}
> >>>> +
> >>>>  static int handle_apic_write(struct kvm_vcpu *vcpu)
> >>>>  {
> >>>>          unsigned long exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
> >>>> @@ -5755,6 +5798,7 @@ static int (*const kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = {
> >>>>          [EXIT_REASON_TPR_BELOW_THRESHOLD]     = handle_tpr_below_threshold,
> >>>>          [EXIT_REASON_APIC_ACCESS]             = handle_apic_access,
> >>>>          [EXIT_REASON_APIC_WRITE]              = handle_apic_write,
> >>>>  +       [EXIT_REASON_EOI_INDUCED]             = handle_apic_eoi_induced,
> >>>>          [EXIT_REASON_WBINVD]                  = handle_wbinvd,
> >>>>          [EXIT_REASON_XSETBV]                  = handle_xsetbv,
> >>>>          [EXIT_REASON_TASK_SWITCH]             = handle_task_switch,
> >>>> @@ -6096,6 +6140,11 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu)
> >>>> 
> >>>>  static void update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)
> >>>>  {
> >>>> +        /* no need for tpr_threshold update if APIC virtual
> >>>> +         * interrupt delivery is enabled */
> >>>> +        if (!enable_apicv_vid)
> >>>> +                return ;
> >>>> +
> >>> Since you (will) set ->update_cr8_intercept callback to NULL if vid
> >>> is enabled this function will never be called with !enable_apicv_vid,
> >>> so this check can be dropped.
> >> Ok.
> >> 
> >>>>          if (irr == -1 || tpr < irr) {
> >>>>                  vmcs_write32(TPR_THRESHOLD, 0);
> >>>>                  return;
> >>>> @@ -6104,6 +6153,90 @@ static void update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)
> >>>>          vmcs_write32(TPR_THRESHOLD, irr);
> >>>>  }
> >>>> +static int vmx_has_virtual_interrupt_delivery(struct kvm_vcpu *vcpu)
> >>>> +{
> >>>> +        return irqchip_in_kernel(vcpu->kvm) && enable_apicv_vid;
> >>>> +}
> >>>> +
> >>>> +static void vmx_update_eoi_exitmap(struct vcpu_vmx *vmx, int index)
> >>>> +{
> >>>> +        int tmr;
> >>>> +        tmr = kvm_apic_get_reg(vmx->vcpu.arch.apic,
> >>>> +                        APIC_TMR + 0x10 * index);
> >>>> +        vmcs_write32(EOI_EXIT_BITMAP0 + index,
> >>>> +                        vmx->eoi_exit_bitmap[index] | tmr);
> >>>> +}
> >>>> +
> >>>> +static void vmx_update_rvi(int vector)
> >>>> +{
> >>>> +        u16 status;
> >>>> +        u8 old;
> >>>> +
> >>>> +        status = vmcs_read16(GUEST_INTR_STATUS);
> >>>> +        old = (u8)status & 0xff;
> >>>> +        if ((u8)vector != old) {
> >>>> +                status &= ~0xff;
> >>>> +                status |= (u8)vector;
> >>>> +                vmcs_write16(GUEST_INTR_STATUS, status);
> >>>> +        }
> >>>> +}
> >>>> +
> >>>> +static void vmx_update_irq(struct kvm_vcpu *vcpu)
> >>>> +{
> >>>> +        int vector;
> >>>> +        struct vcpu_vmx *vmx = to_vmx(vcpu);
> >>>> +
> >>>> +        if (!enable_apicv_vid)
> >>>> +                return ;
> >>>> +
> >>>> +        vector = kvm_apic_get_highest_irr(vcpu);
> >>>> +        if (vector == -1)
> >>>> +                return;
> >>>> +
> >>>> +        vmx_update_rvi(vector);
> >>>> +
> >>>> +        if (vmx->eoi_exitmap_changed) {
> >>>> +                int index;
> >>>> +                for_each_set_bit(index,
> >>>> +                                (unsigned long *)(&vmx->eoi_exitmap_changed), 8)
> >>>> +                        vmx_update_eoi_exitmap(vmx, index);
> >>>> +                vmx->eoi_exitmap_changed = 0;
> >>>> +        }
> >>>> +}
> >>>> +
> >>>> +static void vmx_set_eoi_exitmap(struct kvm_vcpu *vcpu,
> >>>> +                                int vector, int trig_mode,
> >>>> +                                int always_set)
> >>>> +{
> >>>> +        struct vcpu_vmx *vmx = to_vmx(vcpu);
> >>>> +        int index, offset, changed;
> >>>> +        struct kvm_lapic *apic;
> >>>> +
> >>>> +        if (!enable_apicv_vid)
> >>>> +                return ;
> >>>> +
> >>>> +        if (WARN_ONCE((vector < 0) || (vector > 255),
> >>>> +                "KVM VMX: vector (%d) out of range\n", vector))
> >>>> +                return;
> >>>> +
> >>>> +        apic = vcpu->arch.apic;
> >>>> +        index = vector >> 5;
> >>>> +        offset = vector & 31;
> >>>> +
> >>>> +        if (always_set)
> >>>> +                changed = !test_and_set_bit(offset,
> >>>> +                                (unsigned long *)&vmx->eoi_exit_bitmap);
> >>>> +        else if (trig_mode)
> >>>> +                changed = !test_bit(offset,
> >>>> +                                apic->regs + APIC_TMR + index * 0x10);
> >>>> +        else
> >>>> +                changed = test_bit(offset,
> >>>> +                                apic->regs + APIC_TMR + index * 0x10);
> >>>> +
> >>>> +        if (changed)
> >>>> +                vmx->eoi_exitmap_changed |= 1 << index;
> >>>> +}
> >>>> +
> >>>>  static void vmx_complete_atomic_exit(struct vcpu_vmx *vmx)
> >>>>  {
> >>>>          u32 exit_intr_info;
> >>>> @@ -7364,6 +7497,9 @@ static struct kvm_x86_ops vmx_x86_ops = {
> >>>>          .enable_nmi_window = enable_nmi_window,
> >>>>          .enable_irq_window = enable_irq_window,
> >>>>          .update_cr8_intercept = update_cr8_intercept,
> >>>> +        .has_virtual_interrupt_delivery = vmx_has_virtual_interrupt_delivery,
> >>>> +        .update_irq = vmx_update_irq,
> >>>> +        .set_eoi_exitmap = vmx_set_eoi_exitmap,
> >>>> 
> >>>>          .set_tss_addr = vmx_set_tss_addr,
> >>>>          .get_tdp_level = get_ept_level,
> >>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> >>>> index b0b8abe..02fe194 100644
> >>>> --- a/arch/x86/kvm/x86.c
> >>>> +++ b/arch/x86/kvm/x86.c
> >>>> @@ -164,6 +164,14 @@ static int emulator_fix_hypercall(struct x86_emulate_ctxt *ctxt);
> >>>> 
> >>>>  static int kvm_vcpu_reset(struct kvm_vcpu *vcpu);
> >>>> +static inline bool kvm_apic_vid_enabled(struct kvm_vcpu *vcpu)
> >>>> +{
> >>>> +        if (kvm_x86_ops->has_virtual_interrupt_delivery)
> >>> This callback is never NULL.
> >> Ok.
> >> 
> >>>> +                return kvm_x86_ops->has_virtual_interrupt_delivery(vcpu);
> >>>> +
> >>>> +        return 0;
> >>>> +}
> >>>> +
> >>>>  static inline void kvm_async_pf_hash_reset(struct kvm_vcpu *vcpu)
> >>>>  {
> >>>>          int i;
> >>>> @@ -5533,12 +5541,20 @@ static void inject_pending_event(struct kvm_vcpu *vcpu)
> >>>>                          vcpu->arch.nmi_injected = true;
> >>>>                          kvm_x86_ops->set_nmi(vcpu);
> >>>>                  }
> >>>> -        } else if (kvm_cpu_has_interrupt(vcpu)) {
> >>>> -                if (kvm_x86_ops->interrupt_allowed(vcpu)) {
> >>>> -                        kvm_queue_interrupt(vcpu, kvm_cpu_get_interrupt(vcpu),
> >>>> -                                            false);
> >>>> +        } else if (kvm_cpu_has_interrupt(vcpu) &&
> >>>> +                        kvm_x86_ops->interrupt_allowed(vcpu)) {
> >>>> +                int vector = -1;
> >>>> +
> >>>> +                if (kvm_apic_vid_enabled(vcpu))
> >>>> +                        vector = kvm_cpu_get_extint(vcpu);
> >>>> +                else
> >>>> +                        vector = kvm_cpu_get_interrupt(vcpu);
> >>>> +
> >>>> +                if (vector != -1) {
> >>>> +                        kvm_queue_interrupt(vcpu, vector, false);
> >>>>                          kvm_x86_ops->set_irq(vcpu);
> >>>>                  }
> >>> If vid is enabled kvm_cpu_has_interrupt() should return true only if there
> >>> is extint interrupt. Similarly kvm_cpu_get_interrupt() will only return
> >>> extint if vid is enabled. This basically moves kvm_apic_vid_enabled()
> >>> logic deeper into kvm_cpu_(has|get)_interrupt() functions instead
> >>> of changing interrupt injection logic here and in vcpu_enter_guest()
> >>> below. We still need a kvm_cpu_has_interrupt() variant that always checks
> >>> both extint and apic for use in kvm_arch_vcpu_runnable() though.
> >> As you mentioned, we still need to check both extint and apic interrupts
> >> in some cases. So how should we do this? Introduce another argument to
> >> indicate whether to check both?
> > Yes, we need to check both in kvm_arch_vcpu_runnable(). Another argument
> > is a good option. We can have two functions:
> > kvm_cpu_has_injectable_interrupt() for use in the irq injection path and
> > kvm_cpu_has_interrupt() for use in kvm_arch_vcpu_runnable(). They will
> > call a common one with an additional argument to avoid code duplication.
> Ok, will follow this way.
> 
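
Something along these lines, in other words (a compilable toy sketch only:
the pending-state checks are stubbed out with globals, and the names are
suggestions, not a final API):

```c
#include <assert.h>

/* stand-ins for the real pending-interrupt state */
static int pending_extint;
static int pending_apic_irq;

/* common helper: the flag says whether APIC IRR should be considered */
static int __kvm_cpu_has_interrupt(int check_apic)
{
	if (pending_extint)
		return 1;
	return check_apic && pending_apic_irq;
}

/* injection path: with vid enabled the hardware delivers APIC
 * interrupts itself, so only extint matters to KVM */
static int kvm_cpu_has_injectable_intr(int vid_enabled)
{
	return __kvm_cpu_has_interrupt(!vid_enabled);
}

/* kvm_arch_vcpu_runnable(): always check both sources */
static int kvm_cpu_has_interrupt(void)
{
	return __kvm_cpu_has_interrupt(1);
}
```

This keeps the vid special-casing out of inject_pending_event() and
vcpu_enter_guest() entirely: they just call the injectable variant.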
> >> 
> >>>> +
> >>>>          }
> >>>>  }
> >>>> @@ -5657,12 +5673,20 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
> >>>>          }
> >>>>  
> >>>>          if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
> >>>> +                /* update architecture specific hints for APIC
> >>>> +                 * virtual interrupt delivery */
> >>>> +                if (kvm_x86_ops->update_irq)
> >>>> +                        kvm_x86_ops->update_irq(vcpu);
> >>>> +
> >>> 
> >>> I do not see why this has to be here instead of inside the
> >>> if (kvm_lapic_enabled(vcpu)) {} block near update_cr8_intercept() a
> >>> couple of lines below. If you move it there you can drop the apic enabled check in
> >>> kvm_apic_get_highest_irr().
> >> Yes, it seems ok to move it.
> >> 
> >>>>                  inject_pending_event(vcpu);
> >>>>  
> >>>>                  /* enable NMI/IRQ window open exits if needed */
> >>>>                  if (vcpu->arch.nmi_pending)
> >>>>                          kvm_x86_ops->enable_nmi_window(vcpu);
> >>>> -                else if (kvm_cpu_has_interrupt(vcpu) || req_int_win)
> >>>> +                else if (kvm_apic_vid_enabled(vcpu)) {
> >>>> +                        if (kvm_cpu_has_extint(vcpu))
> >>>> +                                kvm_x86_ops->enable_irq_window(vcpu);
> >>>> +                } else if (kvm_cpu_has_interrupt(vcpu) || req_int_win)
> >>>>                          kvm_x86_ops->enable_irq_window(vcpu);
> >>>>  
> >>>>                  if (kvm_lapic_enabled(vcpu)) {
> >>>> diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
> >>>> index 166c450..898aa62 100644
> >>>> --- a/virt/kvm/ioapic.c
> >>>> +++ b/virt/kvm/ioapic.c
> >>>> @@ -186,6 +186,7 @@ static int ioapic_deliver(struct kvm_ioapic *ioapic, int irq)
> >>>>                  /* need to read apic_id from apic register since
> >>>>                   * it can be rewritten */
> >>>>                  irqe.dest_id = ioapic->kvm->bsp_vcpu_id;
> >>>> +                kvm_set_eoi_exitmap(ioapic->kvm->vcpus[0], irqe.vector, 1, 1);
> >>>>          }
> >>>>  #endif
> >>>>          return kvm_irq_delivery_to_apic(ioapic->kvm, NULL, &irqe);
> >>>> --
> >>>> 1.7.1
> >>> 
> >>> --
> >>>                   Gleb.
> >> 
> >> 
> >> Best regards,
> >> Yang
> >> 
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe kvm" in
> >> the body of a message to [email protected]
> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > 
> > --
> >                     Gleb.
> 
> 
> Best regards,
> Yang
> 

--
                        Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
