Gleb Natapov wrote on 2012-12-05:
> On Wed, Dec 05, 2012 at 01:55:17AM +0000, Zhang, Yang Z wrote:
>> Gleb Natapov wrote on 2012-12-04:
>>> On Tue, Dec 04, 2012 at 06:39:50AM +0000, Zhang, Yang Z wrote:
>>>> Gleb Natapov wrote on 2012-12-03:
>>>>> On Mon, Dec 03, 2012 at 03:01:03PM +0800, Yang Zhang wrote:
>>>>>> Virtual interrupt delivery frees KVM from injecting vAPIC interrupts
>>>>>> manually; the hardware takes care of it entirely. This requires some
>>>>>> special awareness in the existing interrupt injection path:
>>>>>>
>>>>>> - For a pending interrupt, instead of injecting it directly, we may
>>>>>>   need to update architecture-specific indicators before resuming
>>>>>>   the guest.
>>>>>> - A pending interrupt that is masked by the ISR should also be
>>>>>>   considered in the update above, since the hardware decides when to
>>>>>>   inject it at the right time. The current has_interrupt and
>>>>>>   get_interrupt only return a valid vector from the injection p.o.v.
>>>>> Most of my previous comments still apply.
>>>>>
>>>>>> +void kvm_set_eoi_exitmap(struct kvm_vcpu *vcpu, int vector,
>>>>>> + int trig_mode, int always_set)
>>>>>> +{
>>>>>> + if (kvm_x86_ops->set_eoi_exitmap)
>>>>>> + kvm_x86_ops->set_eoi_exitmap(vcpu, vector,
>>>>>> + trig_mode, always_set);
>>>>>> +}
>>>>>> +
>>>>>> /*
>>>>>> * Add a pending IRQ into lapic.
>>>>>> * Return 1 if successfully added and 0 if discarded.
>>>>>> @@ -661,6 +669,7 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
>>>>>> if (unlikely(!apic_enabled(apic)))
>>>>>> break;
>>>>>> + kvm_set_eoi_exitmap(vcpu, vector, trig_mode, 0);
>>>>> As I said in the last review: rebuild the bitmap when the ioapic or
>>>>> irq notifier configuration changes, and use a user-request bit to
>>>>> notify vcpus to reload the bitmap.
>>>> It is too complicated. When the guest programs an ioapic entry, we
>>>> cannot easily get the target vcpu. In logical mode we need to read the
>>>> destination format register and the logical destination register to
>>>> find the target vcpu. Also, we would have to trap every modification
>>>> to those two registers to update the eoi bitmap.
>>> No need to check target vcpu. Enable exit on all vcpus for the vector
>> This is wrong. As we know, a modern OS uses per-VCPU vectors. We cannot
>> ensure that all vectors have the same trigger mode. Worse, if the same
>> vector on another vcpu handles high-frequency interrupts (like a 10G
>> NIC), it will hurt performance.
>>
> I never saw OSes reuse a vector used by the ioapic; as far as I can see
> this is not how the Linux code works.
Could you point out which code does this check in the Linux kernel? I don't
see any special check when the Linux kernel allocates a vector.
> Furthermore it would not work with KVM currently, since an apic EOI is
> redirected to the ioapic based on the vector alone, not on a vector/vcpu
> pair, and as far as I am aware this is how real HW works too.
Yes, real HW works this way. But why is that helpful in this case?
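For reference, the point that EOI completion is keyed on the vector alone can be sketched with a toy model (names and layout are made up for illustration; the real code is kvm_ioapic_update_eoi() in virt/kvm/ioapic.c, where the redirection entry is a packed 64-bit union):

```c
#include <assert.h>

#define IOAPIC_NUM_PINS   24
#define IOAPIC_EDGE_TRIG   0
#define IOAPIC_LEVEL_TRIG  1

/* toy redirection-table entry, not the real layout */
struct toy_redir_entry {
	unsigned char vector;
	unsigned char trig_mode;
	unsigned char remote_irr;	/* set while a level IRQ is in service */
};

/*
 * On EOI, every pin whose entry carries this vector is completed; the
 * ioapic never learns which vcpu the EOI came from, so a vector/vcpu
 * pair cannot be distinguished here.
 */
static int toy_ioapic_eoi(struct toy_redir_entry *pins, int vector)
{
	int i, cleared = 0;

	for (i = 0; i < IOAPIC_NUM_PINS; i++) {
		if (pins[i].vector == vector &&
		    pins[i].trig_mode == IOAPIC_LEVEL_TRIG &&
		    pins[i].remote_irr) {
			pins[i].remote_irr = 0;
			cleared++;
		}
	}
	return cleared;
}
```

Two pins programmed with the same vector are both completed by a single EOI, regardless of which vcpu executed it.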
>>> programmed into ioapic. Which two registers? All accesses to ioapic are
>>> trapped and reconfiguration is rare.
>> In logical mode, the destination VCPU depends on each CPU's destination
>> format register and logical destination register, so we would also have
>> to trap those two registers.
>> And if lowest-priority delivery mode is used, the PPR would need to be
>> trapped too. Since the PPR changes on every interrupt injection, the
>> cost should be higher than with the current approach.
> No need for all of that if the bitmask is global.
No, the bitmask is per VCPU. Also, why would it work if the bitmask were
global?
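To illustrate why two per-CPU registers are involved in finding the target: a simplified sketch of xAPIC logical-mode destination matching (the helper name is mine; the flat vs. cluster semantics follow the SDM, with the DFR model nibble and the LDR logical ID byte passed in directly):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_DFR_FLAT     0xfu	/* model field of the destination format reg */
#define TOY_DFR_CLUSTER  0x0u

/*
 * Whether a logical-mode message with destination byte 'mda' targets a
 * CPU whose DFR model and LDR logical ID are given.  Resolving the
 * target vcpu therefore needs both registers of every CPU.
 */
static bool toy_logical_match(uint8_t dfr_model, uint8_t ldr_id, uint8_t mda)
{
	if (dfr_model == TOY_DFR_FLAT)
		return (ldr_id & mda) != 0;	/* flat: bitwise overlap */
	/* cluster: high nibble selects the cluster, low nibble is a mask */
	return (ldr_id >> 4) == (mda >> 4) && (ldr_id & mda & 0xf) != 0;
}
```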
>>
>>>> For the irq notifier, only the PIT is special: it is edge-triggered
>>>> but still needs an EOI notifier. So just treat it specially; the TMR
>>>> can cover the others.
>>>>
>>> We shouldn't assume that. If another notifier is added it will be easy
>>> to forget to update the apicv code to exclude its vector too.
>> At that point the guest is not running (the device is still being
>> initialized), so we cannot know the vector. As you mentioned, the best
>> point would be when the guest programs the ioapic entry, but it is also
>> impossible to get the vector there (see above).
>> I can add a comment on the function to remind callers to update the eoi
>> bitmap when the interrupt is edge-triggered and they still want an EOI
>> vmexit.
>>
>>>>>
>>>>>> if (trig_mode) {
>>>>>> apic_debug("level trig mode for vector %d",
>>>>>> vector);
>>>>>> apic_set_vector(vector, apic->regs + APIC_TMR);
>>>>>> @@ -740,6 +749,19 @@ int kvm_apic_compare_prio(struct kvm_vcpu *vcpu1, struct kvm_vcpu *vcpu2)
>>>>>> return vcpu1->arch.apic_arb_prio - vcpu2->arch.apic_arb_prio;
>>>>>> }
>>>>>> +static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector)
>>>>>> +{
>>>>>> +	if (!(kvm_apic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI) &&
>>>>>> +	    kvm_ioapic_handles_vector(apic->vcpu->kvm, vector)) {
>>>>>> +		int trigger_mode;
>>>>>> +		if (apic_test_vector(vector, apic->regs + APIC_TMR))
>>>>>> +			trigger_mode = IOAPIC_LEVEL_TRIG;
>>>>>> +		else
>>>>>> +			trigger_mode = IOAPIC_EDGE_TRIG;
>>>>>> +		kvm_ioapic_update_eoi(apic->vcpu->kvm, vector, trigger_mode);
>>>>>> +	}
>>>>>> +}
>>>>>> +
>>>>>> static int apic_set_eoi(struct kvm_lapic *apic)
>>>>>> {
>>>>>> 	int vector = apic_find_highest_isr(apic);
>>>>>> @@ -756,19 +778,24 @@ static int apic_set_eoi(struct kvm_lapic *apic)
>>>>>> 	apic_clear_isr(vector, apic);
>>>>>> 	apic_update_ppr(apic);
>>>>>>
>>>>>> -	if (!(kvm_apic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI) &&
>>>>>> -	    kvm_ioapic_handles_vector(apic->vcpu->kvm, vector)) {
>>>>>> -		int trigger_mode;
>>>>>> -		if (apic_test_vector(vector, apic->regs + APIC_TMR))
>>>>>> -			trigger_mode = IOAPIC_LEVEL_TRIG;
>>>>>> -		else
>>>>>> -			trigger_mode = IOAPIC_EDGE_TRIG;
>>>>>> -		kvm_ioapic_update_eoi(apic->vcpu->kvm, vector, trigger_mode);
>>>>>> -	}
>>>>>> +	kvm_ioapic_send_eoi(apic, vector);
>>>>>> 	kvm_make_request(KVM_REQ_EVENT, apic->vcpu);
>>>>>> 	return vector;
>>>>>> }
>>>>>> +/*
>>>>>> + * this interface assumes a trap-like exit, which has already finished
>>>>>> + * desired side effect including vISR and vPPR update.
>>>>>> + */
>>>>>> +void kvm_apic_set_eoi_accelerated(struct kvm_vcpu *vcpu, int vector)
>>>>>> +{
>>>>>> + struct kvm_lapic *apic = vcpu->arch.apic;
>>>>>> +
>>>>> trace_kvm_eoi()
>>>> Ok.
>>>>
>>>>>> + kvm_ioapic_send_eoi(apic, vector);
>>>>>> + kvm_make_request(KVM_REQ_EVENT, apic->vcpu);
>>>>>> +}
>>>>>> +EXPORT_SYMBOL_GPL(kvm_apic_set_eoi_accelerated);
>>>>>> +
>>>>>> static void apic_send_ipi(struct kvm_lapic *apic)
>>>>>> {
>>>>>> 	u32 icr_low = kvm_apic_get_reg(apic, APIC_ICR);
>>>>>> @@ -1533,6 +1560,17 @@ int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu)
>>>>>> 	return highest_irr;
>>>>>> }
>>>>>> +int kvm_apic_get_highest_irr(struct kvm_vcpu *vcpu)
>>>>>> +{
>>>>>> + struct kvm_lapic *apic = vcpu->arch.apic;
>>>>>> +
>>>>>> + if (!apic || !apic_enabled(apic))
>>>>> Use kvm_vcpu_has_lapic() instead of checking arch.apic directly.
>>>> Ok.
>>>>
>>>>>
>>>>>> + return -1;
>>>>>> +
>>>>>> + return apic_find_highest_irr(apic);
>>>>>> +}
>>>>>> +EXPORT_SYMBOL_GPL(kvm_apic_get_highest_irr);
>>>>>> +
>>>>>> int kvm_apic_accept_pic_intr(struct kvm_vcpu *vcpu)
>>>>>> {
>>>>>> u32 lvt0 = kvm_apic_get_reg(vcpu->arch.apic, APIC_LVT0);
>>>>>> diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
>>>>>> index c42f111..749661a 100644
>>>>>> --- a/arch/x86/kvm/lapic.h
>>>>>> +++ b/arch/x86/kvm/lapic.h
>>>>>> @@ -39,6 +39,9 @@ void kvm_free_lapic(struct kvm_vcpu *vcpu);
>>>>>> int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu);
>>>>>> int kvm_apic_accept_pic_intr(struct kvm_vcpu *vcpu);
>>>>>> int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu);
>>>>>> +int kvm_cpu_has_extint(struct kvm_vcpu *v);
>>>>>> +int kvm_cpu_get_extint(struct kvm_vcpu *v);
>>>>>> +int kvm_apic_get_highest_irr(struct kvm_vcpu *vcpu);
>>>>>> void kvm_lapic_reset(struct kvm_vcpu *vcpu);
>>>>>> u64 kvm_lapic_get_cr8(struct kvm_vcpu *vcpu);
>>>>>> void kvm_lapic_set_tpr(struct kvm_vcpu *vcpu, unsigned long cr8);
>>>>>> @@ -50,6 +53,8 @@ void kvm_apic_set_version(struct kvm_vcpu *vcpu);
>>>>>> int kvm_apic_match_physical_addr(struct kvm_lapic *apic, u16 dest);
>>>>>> int kvm_apic_match_logical_addr(struct kvm_lapic *apic, u8 mda);
>>>>>> int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq);
>>>>>> +void kvm_set_eoi_exitmap(struct kvm_vcpu *vcpu, int vector,
>>>>>> + int need_eoi, int global);
>>>>>> int kvm_apic_local_deliver(struct kvm_lapic *apic, int lvt_type);
>>>>>>
>>>>>> bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src,
>>>>>> @@ -65,6 +70,7 @@ u64 kvm_get_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu);
>>>>>> void kvm_set_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu, u64 data);
>>>>>>
>>>>>> int kvm_apic_write_nodecode(struct kvm_vcpu *vcpu, u32 offset);
>>>>>> +void kvm_apic_set_eoi_accelerated(struct kvm_vcpu *vcpu, int vector);
>>>>>>
>>>>>> void kvm_lapic_set_vapic_addr(struct kvm_vcpu *vcpu, gpa_t vapic_addr);
>>>>>> void kvm_lapic_sync_from_vapic(struct kvm_vcpu *vcpu);
>>>>>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
>>>>>> index dcb7952..8f0903b 100644
>>>>>> --- a/arch/x86/kvm/svm.c
>>>>>> +++ b/arch/x86/kvm/svm.c
>>>>>> @@ -3573,6 +3573,22 @@ static void update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)
>>>>>> set_cr_intercept(svm, INTERCEPT_CR8_WRITE);
>>>>>> }
>>>>>> +static int svm_has_virtual_interrupt_delivery(struct kvm_vcpu *vcpu)
>>>>>> +{
>>>>>> + return 0;
>>>>>> +}
>>>>>> +
>>>>>> +static void svm_update_irq(struct kvm_vcpu *vcpu)
>>>>>> +{
>>>>>> + return ;
>>>>>> +}
>>>>>> +
>>>>>> +static void svm_set_eoi_exitmap(struct kvm_vcpu *vcpu, int vector,
>>>>>> + int trig_mode, int always_set)
>>>>>> +{
>>>>>> + return ;
>>>>>> +}
>>>>>> +
>>>>>> static int svm_nmi_allowed(struct kvm_vcpu *vcpu)
>>>>>> {
>>>>>> 	struct vcpu_svm *svm = to_svm(vcpu);
>>>>>> @@ -4292,6 +4308,9 @@ static struct kvm_x86_ops svm_x86_ops = {
>>>>>> 	.enable_nmi_window = enable_nmi_window,
>>>>>> 	.enable_irq_window = enable_irq_window,
>>>>>> 	.update_cr8_intercept = update_cr8_intercept,
>>>>>> +	.has_virtual_interrupt_delivery = svm_has_virtual_interrupt_delivery,
>>>>>> +	.update_irq = svm_update_irq,
>>>>>> +	.set_eoi_exitmap = svm_set_eoi_exitmap,
>>>>>>
>>>>>> .set_tss_addr = svm_set_tss_addr,
>>>>>> .get_tdp_level = get_npt_level,
>>>>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>>>>>> index 6a5f651..909ce90 100644
>>>>>> --- a/arch/x86/kvm/vmx.c
>>>>>> +++ b/arch/x86/kvm/vmx.c
>>>>>> @@ -86,6 +86,9 @@ module_param(fasteoi, bool, S_IRUGO);
>>>>>> static bool __read_mostly enable_apicv_reg;
>>>>>> module_param(enable_apicv_reg, bool, S_IRUGO);
>>>>>> +static bool __read_mostly enable_apicv_vid;
>>>>>> +module_param(enable_apicv_vid, bool, S_IRUGO);
>>>>>> +
>>>>>> /*
>>>>>> * If nested=1, nested virtualization is supported, i.e., guests may use
>>>>>> * VMX and be a hypervisor for its own guests. If nested=0, guests may not
>>>>>> @@ -432,6 +435,9 @@ struct vcpu_vmx {
>>>>>>
>>>>>> bool rdtscp_enabled;
>>>>>> + u8 eoi_exitmap_changed;
>>>>>> + u32 eoi_exit_bitmap[8];
>>>>>> +
>>>>>> /* Support for a guest hypervisor (nested VMX) */
>>>>>> struct nested_vmx nested;
>>>>>> };
>>>>>> @@ -770,6 +776,12 @@ static inline bool cpu_has_vmx_apic_register_virt(void)
>>>>>> 		SECONDARY_EXEC_APIC_REGISTER_VIRT;
>>>>>> }
>>>>>> +static inline bool cpu_has_vmx_virtual_intr_delivery(void)
>>>>>> +{
>>>>>> + return vmcs_config.cpu_based_2nd_exec_ctrl &
>>>>>> + SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY;
>>>>>> +}
>>>>>> +
>>>>>> static inline bool cpu_has_vmx_flexpriority(void)
>>>>>> {
>>>>>> return cpu_has_vmx_tpr_shadow() &&
>>>>>> @@ -2508,7 +2520,8 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
>>>>>> SECONDARY_EXEC_PAUSE_LOOP_EXITING |
>>>>>> SECONDARY_EXEC_RDTSCP |
>>>>>> SECONDARY_EXEC_ENABLE_INVPCID |
>>>>>> - SECONDARY_EXEC_APIC_REGISTER_VIRT;
>>>>>> + SECONDARY_EXEC_APIC_REGISTER_VIRT |
>>>>>> + SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY;
>>>>>> 	if (adjust_vmx_controls(min2, opt2,
>>>>>> 			MSR_IA32_VMX_PROCBASED_CTLS2,
>>>>>> 			&_cpu_based_2nd_exec_control) < 0)
>>>>>> @@ -2522,7 +2535,8 @@ static __init int setup_vmcs_config(struct
>>>>>> vmcs_config *vmcs_conf)
>>>>>>
>>>>>> if (!(_cpu_based_exec_control & CPU_BASED_TPR_SHADOW))
>>>>>> _cpu_based_2nd_exec_control &= ~(
>>>>>> - SECONDARY_EXEC_APIC_REGISTER_VIRT);
>>>>>> + SECONDARY_EXEC_APIC_REGISTER_VIRT |
>>>>>> + SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY);
>>>>>>
>>>>>> if (_cpu_based_2nd_exec_control & SECONDARY_EXEC_ENABLE_EPT) {
>>>>>> /* CR3 accesses and invlpg don't need to cause VM Exits
>>>>>> when EPT
>>>>>> @@ -2724,6 +2738,14 @@ static __init int hardware_setup(void)
>>>>>> 	if (!cpu_has_vmx_apic_register_virt())
>>>>>> 		enable_apicv_reg = 0;
>>>>>> + if (!cpu_has_vmx_virtual_intr_delivery())
>>>>>> + enable_apicv_vid = 0;
>>>>>> +
>>>>>> + if (!enable_apicv_vid) {
>>>>>> + kvm_x86_ops->update_irq = NULL;
>>>>> Why setting it to NULL? Either drop this since vmx_update_irq() checks
>>>>> enable_apicv_vid or better set it to function that does nothing and
>>>>> drop enable_apicv_vid check in vmx_update_irq(). Since
>>>>> kvm_x86_ops->update_irq will never be NULL you can drop the check
>>>>> before calling it.
>>>> Sure.
>>>>
>>>>>> + kvm_x86_ops->update_cr8_intercept = NULL;
>>>>> Why? It should be other way around: if apicv is enabled set
>>>>> update_cr8_intercept callback to NULL.
>>>> Yes, this is wrong.
>>> Please test the patches with vid disabled and Windows guests. This bug
>>> should have prevented it from working.
>>>
>>>>
>>>>>> + }
>>>>>> +
>>>>>> if (nested)
>>>>>> nested_vmx_setup_ctls_msrs();
>>>>>> @@ -3839,6 +3861,8 @@ static u32 vmx_secondary_exec_control(struct vcpu_vmx *vmx)
>>>>>> exec_control &= ~SECONDARY_EXEC_PAUSE_LOOP_EXITING;
>>>>>> if (!enable_apicv_reg)
>>>>>> exec_control &= ~SECONDARY_EXEC_APIC_REGISTER_VIRT;
>>>>>> + if (!enable_apicv_vid)
>>>>>> + exec_control &= ~SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY;
>>>>>> return exec_control;
>>>>>> }
>>>>>> @@ -3883,6 +3907,15 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx)
>>>>>> 			vmx_secondary_exec_control(vmx));
>>>>>> 	}
>>>>>> + if (enable_apicv_vid) {
>>>>>> + vmcs_write64(EOI_EXIT_BITMAP0, 0);
>>>>>> + vmcs_write64(EOI_EXIT_BITMAP1, 0);
>>>>>> + vmcs_write64(EOI_EXIT_BITMAP2, 0);
>>>>>> + vmcs_write64(EOI_EXIT_BITMAP3, 0);
>>>>>> +
>>>>>> + vmcs_write16(GUEST_INTR_STATUS, 0);
>>>>>> + }
>>>>>> +
>>>>>> if (ple_gap) {
>>>>>> vmcs_write32(PLE_GAP, ple_gap);
>>>>>> vmcs_write32(PLE_WINDOW, ple_window);
>>>>>> @@ -4806,6 +4839,16 @@ static int handle_apic_access(struct kvm_vcpu *vcpu)
>>>>>> 	return emulate_instruction(vcpu, 0) == EMULATE_DONE;
>>>>>> }
>>>>>> +static int handle_apic_eoi_induced(struct kvm_vcpu *vcpu)
>>>>>> +{
>>>>>> +	unsigned long exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
>>>>>> +	int vector = exit_qualification & 0xff;
>>>>>> +
>>>>>> +	/* EOI-induced VM exit is trap-like and thus no need to adjust IP */
>>>>>> + kvm_apic_set_eoi_accelerated(vcpu, vector);
>>>>>> + return 1;
>>>>>> +}
>>>>>> +
>>>>>> static int handle_apic_write(struct kvm_vcpu *vcpu)
>>>>>> {
>>>>>> 	unsigned long exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
>>>>>> @@ -5755,6 +5798,7 @@ static int (*const kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = {
>>>>>> 	[EXIT_REASON_TPR_BELOW_THRESHOLD] = handle_tpr_below_threshold,
>>>>>> 	[EXIT_REASON_APIC_ACCESS]         = handle_apic_access,
>>>>>> 	[EXIT_REASON_APIC_WRITE]          = handle_apic_write,
>>>>>> +	[EXIT_REASON_EOI_INDUCED]         = handle_apic_eoi_induced,
>>>>>> 	[EXIT_REASON_WBINVD]              = handle_wbinvd,
>>>>>> 	[EXIT_REASON_XSETBV]              = handle_xsetbv,
>>>>>> 	[EXIT_REASON_TASK_SWITCH]         = handle_task_switch,
>>>>>> @@ -6096,6 +6140,11 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu)
>>>>>>
>>>>>> static void update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)
>>>>>> {
>>>>>> +	/* no need for tpr_threshold update if APIC virtual
>>>>>> +	 * interrupt delivery is enabled */
>>>>>> +	if (!enable_apicv_vid)
>>>>>> +		return;
>>>>>> +
>>>>> Since you (will) set ->update_cr8_intercept callback to NULL if vid
>>>>> is enabled this function will never be called with !enable_apicv_vid,
>>>>> so this check can be dropped.
>>>> Ok.
>>>>
>>>>>> if (irr == -1 || tpr < irr) {
>>>>>> vmcs_write32(TPR_THRESHOLD, 0);
>>>>>> return;
>>>>>> @@ -6104,6 +6153,90 @@ static void update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)
>>>>>> vmcs_write32(TPR_THRESHOLD, irr);
>>>>>> }
>>>>>> +static int vmx_has_virtual_interrupt_delivery(struct kvm_vcpu *vcpu)
>>>>>> +{
>>>>>> + return irqchip_in_kernel(vcpu->kvm) && enable_apicv_vid;
>>>>>> +}
>>>>>> +
>>>>>> +static void vmx_update_eoi_exitmap(struct vcpu_vmx *vmx, int index)
>>>>>> +{
>>>>>> + int tmr;
>>>>>> + tmr = kvm_apic_get_reg(vmx->vcpu.arch.apic,
>>>>>> + APIC_TMR + 0x10 * index);
>>>>>> + vmcs_write32(EOI_EXIT_BITMAP0 + index,
>>>>>> + vmx->eoi_exit_bitmap[index] | tmr);
>>>>>> +}
>>>>>> +
>>>>>> +static void vmx_update_rvi(int vector)
>>>>>> +{
>>>>>> + u16 status;
>>>>>> + u8 old;
>>>>>> +
>>>>>> + status = vmcs_read16(GUEST_INTR_STATUS);
>>>>>> + old = (u8)status & 0xff;
>>>>>> + if ((u8)vector != old) {
>>>>>> + status &= ~0xff;
>>>>>> + status |= (u8)vector;
>>>>>> + vmcs_write16(GUEST_INTR_STATUS, status);
>>>>>> + }
>>>>>> +}
>>>>>> +
>>>>>> +static void vmx_update_irq(struct kvm_vcpu *vcpu)
>>>>>> +{
>>>>>> + int vector;
>>>>>> + struct vcpu_vmx *vmx = to_vmx(vcpu);
>>>>>> +
>>>>>> + if (!enable_apicv_vid)
>>>>>> + return ;
>>>>>> +
>>>>>> + vector = kvm_apic_get_highest_irr(vcpu);
>>>>>> + if (vector == -1)
>>>>>> + return;
>>>>>> +
>>>>>> + vmx_update_rvi(vector);
>>>>>> +
>>>>>> + if (vmx->eoi_exitmap_changed) {
>>>>>> + int index;
>>>>>> +		for_each_set_bit(index,
>>>>>> +				(unsigned long *)(&vmx->eoi_exitmap_changed), 8)
>>>>>> +			vmx_update_eoi_exitmap(vmx, index);
>>>>>> + vmx->eoi_exitmap_changed = 0;
>>>>>> + }
>>>>>> +}
>>>>>> +
>>>>>> +static void vmx_set_eoi_exitmap(struct kvm_vcpu *vcpu,
>>>>>> + int vector, int trig_mode,
>>>>>> + int always_set)
>>>>>> +{
>>>>>> + struct vcpu_vmx *vmx = to_vmx(vcpu);
>>>>>> + int index, offset, changed;
>>>>>> + struct kvm_lapic *apic;
>>>>>> +
>>>>>> + if (!enable_apicv_vid)
>>>>>> + return ;
>>>>>> +
>>>>>> + if (WARN_ONCE((vector < 0) || (vector > 255),
>>>>>> + "KVM VMX: vector (%d) out of range\n", vector))
>>>>>> + return;
>>>>>> +
>>>>>> + apic = vcpu->arch.apic;
>>>>>> + index = vector >> 5;
>>>>>> + offset = vector & 31;
>>>>>> +
>>>>>> + if (always_set)
>>>>>> + changed = !test_and_set_bit(offset,
>>>>>> + (unsigned long *)&vmx->eoi_exit_bitmap);
>>>>>> + else if (trig_mode)
>>>>>> + changed = !test_bit(offset,
>>>>>> + apic->regs + APIC_TMR + index * 0x10);
>>>>>> + else
>>>>>> + changed = test_bit(offset,
>>>>>> + apic->regs + APIC_TMR + index * 0x10);
>>>>>> +
>>>>>> + if (changed)
>>>>>> + vmx->eoi_exitmap_changed |= 1 << index;
>>>>>> +}
>>>>>> +
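As an aside, the index/offset arithmetic in vmx_set_eoi_exitmap() packs the 256 possible vectors into eight 32-bit words. A standalone sketch of that layout (helper name is mine, mirroring the eoi_exit_bitmap[8] field in the patch):

```c
#include <assert.h>
#include <stdint.h>

/* 256 vectors / 32 bits per word = 8 words, as in eoi_exit_bitmap[8] */
static void toy_eoi_bitmap_set(uint32_t bitmap[8], int vector)
{
	int index  = vector >> 5;	/* which 32-bit word */
	int offset = vector & 31;	/* which bit inside it */

	bitmap[index] |= 1u << offset;
}
```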
>>>>>> static void vmx_complete_atomic_exit(struct vcpu_vmx *vmx)
>>>>>> {
>>>>>> 	u32 exit_intr_info;
>>>>>> @@ -7364,6 +7497,9 @@ static struct kvm_x86_ops vmx_x86_ops = {
>>>>>> 	.enable_nmi_window = enable_nmi_window,
>>>>>> 	.enable_irq_window = enable_irq_window,
>>>>>> 	.update_cr8_intercept = update_cr8_intercept,
>>>>>> +	.has_virtual_interrupt_delivery = vmx_has_virtual_interrupt_delivery,
>>>>>> +	.update_irq = vmx_update_irq,
>>>>>> +	.set_eoi_exitmap = vmx_set_eoi_exitmap,
>>>>>>
>>>>>> .set_tss_addr = vmx_set_tss_addr,
>>>>>> .get_tdp_level = get_ept_level,
>>>>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>>>>>> index b0b8abe..02fe194 100644
>>>>>> --- a/arch/x86/kvm/x86.c
>>>>>> +++ b/arch/x86/kvm/x86.c
>>>>>> @@ -164,6 +164,14 @@ static int emulator_fix_hypercall(struct x86_emulate_ctxt *ctxt);
>>>>>>
>>>>>> static int kvm_vcpu_reset(struct kvm_vcpu *vcpu);
>>>>>> +static inline bool kvm_apic_vid_enabled(struct kvm_vcpu *vcpu)
>>>>>> +{
>>>>>> + if (kvm_x86_ops->has_virtual_interrupt_delivery)
>>>>> This callback is never NULL.
>>>> Ok.
>>>>
>>>>>> +		return kvm_x86_ops->has_virtual_interrupt_delivery(vcpu);
>>>>>> +
>>>>>> + return 0;
>>>>>> +}
>>>>>> +
>>>>>> static inline void kvm_async_pf_hash_reset(struct kvm_vcpu *vcpu)
>>>>>> {
>>>>>> int i;
>>>>>> @@ -5533,12 +5541,20 @@ static void inject_pending_event(struct kvm_vcpu *vcpu)
>>>>>> vcpu->arch.nmi_injected = true;
>>>>>> kvm_x86_ops->set_nmi(vcpu);
>>>>>> }
>>>>>> -	} else if (kvm_cpu_has_interrupt(vcpu)) {
>>>>>> -		if (kvm_x86_ops->interrupt_allowed(vcpu)) {
>>>>>> -			kvm_queue_interrupt(vcpu, kvm_cpu_get_interrupt(vcpu),
>>>>>> -					    false);
>>>>>> +	} else if (kvm_cpu_has_interrupt(vcpu) &&
>>>>>> +		   kvm_x86_ops->interrupt_allowed(vcpu)) {
>>>>>> + int vector = -1;
>>>>>> +
>>>>>> + if (kvm_apic_vid_enabled(vcpu))
>>>>>> + vector = kvm_cpu_get_extint(vcpu);
>>>>>> + else
>>>>>> + vector = kvm_cpu_get_interrupt(vcpu);
>>>>>> +
>>>>>> + if (vector != -1) {
>>>>>> + kvm_queue_interrupt(vcpu, vector, false);
>>>>>> kvm_x86_ops->set_irq(vcpu);
>>>>>> }
>>>>> If vid is enabled, kvm_cpu_has_interrupt() should return true only if
>>>>> there is an extint interrupt. Similarly, kvm_cpu_get_interrupt() will
>>>>> only return an extint if vid is enabled. This basically moves the
>>>>> kvm_apic_vid_enabled() logic deeper into the
>>>>> kvm_cpu_(has|get)_interrupt() functions instead of changing the
>>>>> interrupt injection logic here and in vcpu_enter_guest() below. We
>>>>> still need a kvm_cpu_has_interrupt() variant that always checks both
>>>>> extint and apic for use in kvm_arch_vcpu_runnable() though.
>>>> As you mentioned, we still need to check both extint and apic
>>>> interrupts in some cases. So how should this be done? Introduce
>>>> another argument to indicate whether to check both?
>>> Yes, we need to check both in kvm_arch_vcpu_runnable(). Another
>>> argument is a good option. We can have two functions:
>>> kvm_cpu_has_injectable_interrupt() for use in the irq injection path
>>> and kvm_cpu_has_interrupt() for use in kvm_arch_vcpu_runnable(). They
>>> will call a common one with an additional argument to avoid code
>>> duplication.
>> Ok. will follow this way.
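The agreed split might look like the following sketch (toy stand-in struct and names, not the real kvm_vcpu; the point is one common helper behind two public entry points):

```c
#include <assert.h>
#include <stdbool.h>

/* toy model of the interrupt state the real helpers would consult */
struct toy_vcpu {
	bool has_extint;	/* PIC has a pending ExtINT */
	bool has_apic_irq;	/* lapic IRR is non-empty */
	bool vid_enabled;	/* virtual interrupt delivery active */
};

static bool toy_has_interrupt_common(struct toy_vcpu *v, bool injectable_only)
{
	if (v->has_extint)
		return true;
	/* With vid, apic interrupts are delivered by hardware, so they
	 * are not injectable by KVM -- but they still make the vcpu
	 * runnable. */
	if (injectable_only && v->vid_enabled)
		return false;
	return v->has_apic_irq;
}

/* for the irq injection path */
static bool toy_has_injectable_interrupt(struct toy_vcpu *v)
{
	return toy_has_interrupt_common(v, true);
}

/* for kvm_arch_vcpu_runnable(): always checks both sources */
static bool toy_has_interrupt(struct toy_vcpu *v)
{
	return toy_has_interrupt_common(v, false);
}
```

With vid enabled and only an apic interrupt pending, the injection-path check returns false while the runnable check still returns true.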
>>
>>>>
>>>>>> +
>>>>>> }
>>>>>> }
>>>>>> @@ -5657,12 +5673,20 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>>>>>> }
>>>>>>
>>>>>> if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
>>>>>> +		/* update architecture-specific hints for APIC
>>>>>> +		 * virtual interrupt delivery */
>>>>>> + if (kvm_x86_ops->update_irq)
>>>>>> + kvm_x86_ops->update_irq(vcpu);
>>>>>> +
>>>>>
>>>>> I do not see why this has to be here instead of inside the
>>>>> if (kvm_lapic_enabled(vcpu)) {} block near update_cr8_intercept() a
>>>>> couple of lines below. If you move it there you can drop the apic
>>>>> enable check in kvm_apic_get_highest_irr().
>>>> Yes, it seems ok to move it.
>>>>
>>>>>> inject_pending_event(vcpu);
>>>>>>
>>>>>> /* enable NMI/IRQ window open exits if needed */
>>>>>> if (vcpu->arch.nmi_pending)
>>>>>> kvm_x86_ops->enable_nmi_window(vcpu);
>>>>>> - else if (kvm_cpu_has_interrupt(vcpu) || req_int_win)
>>>>>> + else if (kvm_apic_vid_enabled(vcpu)) {
>>>>>> + if (kvm_cpu_has_extint(vcpu))
>>>>>> + kvm_x86_ops->enable_irq_window(vcpu);
>>>>>> + } else if (kvm_cpu_has_interrupt(vcpu) || req_int_win)
>>>>>> kvm_x86_ops->enable_irq_window(vcpu);
>>>>>>
>>>>>> if (kvm_lapic_enabled(vcpu)) {
>>>>>> diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
>>>>>> index 166c450..898aa62 100644
>>>>>> --- a/virt/kvm/ioapic.c
>>>>>> +++ b/virt/kvm/ioapic.c
>>>>>> @@ -186,6 +186,7 @@ static int ioapic_deliver(struct kvm_ioapic *ioapic, int irq)
>>>>>> 		/* need to read apic_id from apic register since
>>>>>> 		 * it can be rewritten */
>>>>>> 		irqe.dest_id = ioapic->kvm->bsp_vcpu_id;
>>>>>> +		kvm_set_eoi_exitmap(ioapic->kvm->vcpus[0], irqe.vector, 1, 1);
>>>>>> 	}
>>>>>> #endif
>>>>>> 	return kvm_irq_delivery_to_apic(ioapic->kvm, NULL, &irqe);
>>>>>> --
>>>>>> 1.7.1
>>>>>
>>>>> --
>>>>> Gleb.
>>>>
>>>>
>>>> Best regards,
>>>> Yang
>>>>
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe kvm" in
>>>> the body of a message to [email protected]
>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>
>>> --
>>> Gleb.
>>
>>
>> Best regards,
>> Yang
>>
>
> --
> Gleb.
Best regards,
Yang