On Thu, Dec 27, 2012 at 02:24:04AM +0000, Zhang, Yang Z wrote:
> Gleb Natapov wrote on 2012-12-21:
> > On Fri, Dec 21, 2012 at 09:39:20AM -0200, Marcelo Tosatti wrote:
> >> On Fri, Dec 21, 2012 at 09:51:40AM +0200, Gleb Natapov wrote:
> >>> On Thu, Dec 20, 2012 at 08:59:11PM -0200, Marcelo Tosatti wrote:
> >>>> On Mon, Dec 17, 2012 at 01:30:49PM +0800, Yang Zhang wrote:
> >>>>> From: Yang Zhang <[email protected]>
> >>>>>
> >>>>> Virtual interrupt delivery avoids KVM to inject vAPIC interrupts
> >>>>> manually, which is fully taken care of by the hardware. This needs
> >>>>> some special awareness into existing interrupr injection path:
> >>>>>
> >>>>> - for pending interrupt, instead of direct injection, we may need
> >>>>> update architecture specific indicators before resuming to guest.
> >>>>> - A pending interrupt, which is masked by ISR, should be also
> >>>>> considered in above update action, since hardware will decide
> >>>>> when to inject it at right time. Current has_interrupt and
> >>>>> get_interrupt only returns a valid vector from injection p.o.v.
> >>>>> Signed-off-by: Kevin Tian <[email protected]>
> >>>>> Signed-off-by: Yang Zhang <[email protected]>
> >>>>> ---
> >>>>> arch/ia64/kvm/lapic.h | 6 ++
> >>>>> arch/x86/include/asm/kvm_host.h | 6 ++
> >>>>> arch/x86/include/asm/vmx.h | 11 +++ arch/x86/kvm/irq.c
> >>>>> | 56 +++++++++++++- arch/x86/kvm/lapic.c | 65
> >>>>> ++++++++++------- arch/x86/kvm/lapic.h | 28 ++++++-
> >>>>> arch/x86/kvm/svm.c | 24 ++++++ arch/x86/kvm/vmx.c
> >>>>> | 154 ++++++++++++++++++++++++++++++++++++++-
> >>>>> arch/x86/kvm/x86.c | 11 ++- include/linux/kvm_host.h
> >>>>> | 2 + virt/kvm/ioapic.c | 36 +++++++++
> >>>>> virt/kvm/ioapic.h | 1 + virt/kvm/irq_comm.c
> >>>>> | 20 +++++ 13 files changed, 379 insertions(+), 41
> >>>>> deletions(-)
> >>>>> diff --git a/arch/ia64/kvm/lapic.h b/arch/ia64/kvm/lapic.h
> >>>>> index c5f92a9..cb59eb4 100644
> >>>>> --- a/arch/ia64/kvm/lapic.h
> >>>>> +++ b/arch/ia64/kvm/lapic.h
> >>>>> @@ -27,4 +27,10 @@ int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct
> > kvm_lapic_irq *irq);
> >>>>> #define kvm_apic_present(x) (true)
> >>>>> #define kvm_lapic_enabled(x) (true)
> >>>>> +static inline void kvm_update_eoi_exitmap(struct kvm *kvm,
> >>>>> + struct kvm_lapic_irq *irq)
> >>>>> +{
> >>>>> + /* IA64 has no apicv supporting, do nothing here */
> >>>>> +}
> >>>>> +
> >>>>> #endif
> >>>>> diff --git a/arch/x86/include/asm/kvm_host.h
> >>>>> b/arch/x86/include/asm/kvm_host.h index c431b33..b63a144 100644 ---
> >>>>> a/arch/x86/include/asm/kvm_host.h +++
> >>>>> b/arch/x86/include/asm/kvm_host.h @@ -697,6 +697,11 @@ struct
> >>>>> kvm_x86_ops {
> >>>>> void (*enable_nmi_window)(struct kvm_vcpu *vcpu);
> >>>>> void (*enable_irq_window)(struct kvm_vcpu *vcpu);
> >>>>> void (*update_cr8_intercept)(struct kvm_vcpu *vcpu, int tpr,
> >>>>> int irr);
> >>>>> + int (*has_virtual_interrupt_delivery)(struct kvm_vcpu *vcpu);
> >>>>> + void (*update_apic_irq)(struct kvm_vcpu *vcpu, int max_irr);
> >>>>> + void (*update_eoi_exitmap)(struct kvm *kvm, struct kvm_lapic_irq
> >>>>> *irq); + void (*reset_eoi_exitmap)(struct kvm_vcpu *vcpu); +
> >>>>> void
> >>>>> (*load_eoi_exitmap)(struct kvm_vcpu *vcpu);
> >>>>> int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
> >>>>> int (*get_tdp_level)(void);
> >>>>> u64 (*get_mt_mask)(struct kvm_vcpu *vcpu, gfn_t gfn, bool
> > is_mmio);
> >>>>
> >>>> EOI exit bitmap is problematic (its racy). Please do this:
> >>>>
> >>>> 1. Make a synchronous (1) KVM_REQ_EOIBITMAP request on IOAPIC
> >>>> register modifications which require EOI exit bitmap updates. 2. On
> >>>> VM-entry, during KVM_REQ_EOIBITMAP processing, each checks IOAPIC map
> >>>> and adjusts its own EOI exit bitmap VMCS registers.
> >>>>
> >>>> 1) that waits until remote executing VCPUs have acknowledge the request,
> >>>> using make_all_cpus_request (see virt/kvm/kvm_main.c), similarly to
> >>>> remote TLB flushes.
> >>>>
> >>>> What is the problem now: there is no control over _when_ a VCPU
> >>>> updates its EOI exit bitmap VMCS register from the (remotely updated)
> >>>> master EOI exit bitmap. The VCPU can be processing a
> >>>> KVM_REQ_EOIBITMAP relative to a precedence IOAPIC register write
> >>>> while the current IOAPIC register write is updating the EOI exit
> >>>> bitmap. There is no way to fix that without locking (which can be
> >>>> avoided if the IOAPIC->EOI exit bitmap synchronization is vcpu local).
> >>>>
> >>> The race is benign. We have similar one for interrupt injection and
> >>> the same race exists on a real HW. The two cases that can happen due
> >>> to the race are:
> >>>
> >>> 1. exitbitmap bit X is changed from 1 to 0
> >>> No problem. It is harmless to do an exit, on the next entry
> >>> exitbitmap will be fixed. 2. exitbitmap bit X is changed from 0 to 1
> >>> If vcpu serves X at the time this happens it was delivered as edge,
> >>> so no need to exit. The exitbitmap will be updated after the next
> >>> vmexit which will happen due to KVM_REQ_EOIBITMAP processing.
> >>
> >> 1. Missed the case where bitmap is being reset (where EOI exit bitmaps
> >> are zeroed). In this case vcpu enters guest with incorrect EOI exit
> >> bitmap.
> >>
> > Right, the bitmap reset us problematic indeed since it does not
> > represent real vector configuration change.
> >
> >> 2. Missed the case where current code allows vcpu to enter guest
> >> with EOI exit bitmap unsynchronized relative to IOAPIC registers
> >> (see one KVM_REQ made at a time, no IPI sent). In that case interrupt
> >> can be delivered.
> >>
> > Ugh, I was sure there is a kick there. Missing kick is just a bug of
> > course.
> Do you mean add a kick when making KVM_REQ_EOIBITMAP request?
>
Of course, otherwise vcpu is running with stale bitmap.
> >> Thus the suggestions to update bitmap locally, on entry. Do you
> >> see any disadvantage?
> >>
> > Only one. The recalculation logic is such that given a vector it
> > calculates set of vcpus, so each vcpu will do this calculation for each
> > vector and see if it is in the set instead of recalculating once. But
> > this should be rare enough for us to not care.
> >
> >> Other than that, there is a window between IOAPIC map update and
> >> EOI bitmap request, where an interrupt can be delivered without
> >> EOI bitmap being updated (which i think local updates don't cover,
> >> either).
> > Interrupt cannot be delivered through IOAPIC while bitmap is updated
> > since IOAPIC has a lock.
> >
> >>
> >>> But software really should take care of not changing interrupt vector
> >>> configuration while there is an interrupt in flight with the same vector.
> >>
> >> None of these are guest faults. As soon as interrupts are allowed they
> >> must be handled properly (including synchronized EOI bitmap etc).
> >>
> > All my cases are guest faults and the guest will get in trouble on real
> > HW too with such behaviour. The case where we clear bitmap before
> > recalculation is not a guest fault though and have to be dealt with
> > somehow either with locks or your suggestion.
> How about to set all bits in eoi bitmap before recalculation. As you said
> it's harmless to do an vmexit. And eoibitmap will be updated on next vmentry.
>
If you set all bits before recalculation they will remain all set after
recalculation too. I do not get what you propose here.
--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html