Re: [kvm-devel] [PATCH 0/9] in-kernel APIC v9 (kernel side)

Gregory Haskins Tue, 12 Jun 2007 16:56:26 -0700

On Mon, 2007-06-11 at 22:41 +0800, Dong, Eddie wrote:
> Gregory Haskins wrote:
> > Hi Eddie,
> >  Back from vacation. Just catching up on email now....
> > 
> > On Thu, 2007-06-07 at 16:20 +0800, Dong, Eddie wrote:
> >> 
> >> Greg:
> >>    Here are some detail comments towarding the LAPIC device model.
> 1:
> >>    irqabstraction layer vcpu->irq.pending holds the abstract
> processor
> >> interrupt request (called localint, extint, nmi etc in V09).  But
> >> API kvm_vcpu_intr only set the interrupt request, no clear.
> > 
> > By design
> > 
> >>   So far I only noticed when an
> >> interrupt request is pop or injected,  vcpu->irq.pending bit may be
> >> cleared. But there is case when guest raise TPR bar, an localint
> >> could be cleared.
> > 
> > Yep, by design.  And it will be set again when the TPR bar is lowered.
> 
> Mmm, since the vcpu->irq.pending is not cleared, now KVM will think
> there 
> is an irq to inject before the TPR is lowered.


This is incorrect.  If an interrupt is pending but masked by PPR,
irq.pending is cleared by the !(ack.flags & VECTOR_PENDING) check and no
event is injected to the guest. Later when PPR is modified to unmask the
vector, we re-raise the irq.pending bit.  Worst case scenario is we
perform a single superfluous ack cycle which reveals there are no
eligible vectors to inject.  No harm, no foul.  We clear the irq.pending
and move on. As far as the emulation is concerned, proper behavior is
obtained. 

This methodology for clearing irq.pending was implemented early on in
the review process to make the clearing atomic with the acknowledgment.

> 
> 
> > 
> >>  Same for extint when PIC IMR is masked.
> > 
> > Yeah, we have a hole here.  If IMR is changed to mask a pending
> > vector, it is conceivable that it could remain pending in the kernel.
> > Note that
> > this hole was always in KVM, even before my patch.  Because of
> 
> No, in original KVM, this kind of hole doesn't exist since all the PIC 
> logic are in Qemu. Each time if Qemu want to do KVM_INTERRUPT,
> Qemu will check the eflag.if, interrupt window etc. I.e. each
> KVM_INTERRUPT
> injected IRQ will be injected to guest immediately.

I agree that the current code *masks* the issue such that it is not an
actual problem today, yes.  However, I disagree that there isn't a hole
there.  ;)

The userspace code is predicated on the lock-step nature of the
user/kernel interaction.  It assumed that the vector would be injected
on the next KVM_RUN and therefore there was nothing that would change
its eligibility for injecting (and thus it was not possible for IMR to
change out from under it.  As evident by my patch, immediate injection
may not always be true in the future. ;) This is one of the issues which
should be fixed, IMHO.

In addition, I think we need to go back to the synchronous "try_to_push"
model that I had originally (before v9) so that the state of the vcpu is
properly considered before we ack the 8259.

Between these two things, I think it will work as expected.

> 
> > this I am
> > of the opinion that it doesn't need to be fixed as part of the LAPIC
> > work per se before merging.  However, that being said, we can lay the
> > groundwork to support this now by adding an "int level" to the
> > structure that is passed in the KVM_ISA_INTERRUPT ioctl.  Going
> > forward, someone can patch the QEMU::i8259 to send clear events when
> > IMR changes. 
> 
> No matter a new API or extend previous API, anyway the kernel interrupt
> IRR should
> be able to be set and cleared  by Qemu :-)
> 
> > 
> >> 
> >>    In Xen, since there is no notion of abstract processor interrupt
> >> request, VMM check interrupt directly from PIC and APIC, so Xen
> >> doesn't have problem. I guess Windows will fail here.
> > 
> > No, this is incorrect (and Windows boots fine even w/ACPI
> > enabled, BTW).
> > For the LAPIC, we do check the LAPIC model directly (and therefore
> > consider TPR, etc).  The abstract interrupt mechanism simply tells the
> > vcpu that it needs to check the LAPIC.  It is not authoritative in
> > deciding if something actually gets injected.
> > 
> > For the PIC, we don't check directly (the PICs vectors are
> > cached in the
> > kernel), but again note that this is how KVM has always
> > worked.  I think
> > fixing the KVM_ISA_INTERRUPT::level mechanism + adding
> > IMR-clear support
> > shores up any issue there, however.
> > 
> >> 
> >>    2: APIC timer
> >>    a: V09 uses hrtimer for LAPIC timer, apic->timer.last_update is
> >> updated every time when __apic_timer_fn is invoked at time of the
> >> APIC timer fired. This impose an accumulated difference since the
> >> fire time is already some ns later after expected time.
> >>    Xen solve this issue by increase apic->timer.last_update with
> >> the PERIOD, i.e. APIC_BUS_CYCLE_NS * apic->timer.divide_count *
> >>    APIC_TMICT. b: Seems current approach starts hrtimer whenever
> >> APIC_TMICT is updated. Should we check APIC_LVT to see if it is
> >> masked here? (instead of doing in its callback
> >> function:__apic_timer_fn). Also why APIC_TMCCT is updated here? I
> >>    think TMCCT is reloaded only when it reaches 0 and LVTT works in
> >> periodic mode. c: I didn't see LVTT mask status refelect the hrtimer 
> >> cancel/start, do I miss something?
> > 
> > I inherited most of lapic.c from Dor, and I believe he
> > inherited most of
> > it from an older version of Xen.  While I have come to understand much
> > of the inner workings of the LAPIC during the course of developing
> > this patch, the timer is still a relative enigma to me.  Therefore, I
> > do not have any comment as to the reasons why something was done here
> > the way it was, nor to the validity of the problems you are
> > highlighting. Perhaps Dor will know. 
> > 
> > But that being said, patches against v09 to fix problems you see are
> > always welcome. 
> > 
> > 
> >> 
> >>    3:  Assume a senario there is an valid IDT_VECTORING_INFO_FIELD,
> >> following code(after patch) in handle_exception push back the failed
> >> interrupt vector, i.e. vcpu->irq.deferred.
> >> 
> >> ........
> >>         if (is_external_interrupt(vect_info)) {
> >>                 /*
> >>                  * An exception was taken while we were trying to
> >> inject an 
> >>                  * IRQ.  We must defer the injection of the vector
> >> until 
> >>                  * the next window.
> >>                  */
> >>                 int irq = vect_info & VECTORING_INFO_VECTOR_MASK;
> >>                 kvm_vcpu_irq_push(vcpu, irq);
> >>         }
> >> 
> >> 
> >>    a:  Now, the abstracted processor interrupt, i.e.
> >> vcpu->irq.pending, could be 0, __kvm_vcpu_irq_pending invoked at
> >> beginning of do_interrupt_requests will set kvm_irqpin_localint in
> >> the abstracted processor interrupt (vcpu->irq.pending). But if at
> >> same time, we get an external IRQ, i.e. vcpu->irq.pending is set
> >>    with both localint & extint. From following code ,
> >> kvm_irqpin_extint has higher priority than kvm_irqpin_localint.
> > 
> > Yes, I understand this scenario.  It is the same problem I was
> > trying to
> > describe earlier when I said extint/nmi can inadvertently get
> > prioritized over deferred.  I will fix this.
> > 
> >> 
> >>         while (pending) {
> >>                 kvm_irqpin_t pin = __fls(pending);
> >> 
> >>                 switch (pin) {
> >>                 case kvm_irqpin_localint:
> >>                 case kvm_irqpin_extint:
> >>                 case kvm_irqpin_nmi:
> >>                         do_intr_requests(vcpu, kvm_run, pin);
> >>                         break;
> >>            ..............
> >> 
> >>    Now in do_intr_requests, we get an extirq vector instead of
> >> vcpu->irq.deferred from following code since pin=kvm_irqpin_extint.
> >> That means we inject an new irq instead of original failed irq in
> >> IDT_VECTORING_INFO_FIELD. 
> >> 
> >>            ........
> >>                 switch (pin) {
> >>                 case kvm_irqpin_localint:
> >>                         r = kvm_vcpu_irq_pop(vcpu, &ack);
> >>                         break;
> >>                 case kvm_irqpin_extint:
> >>                         r = kvm_irqdevice_ack(&vcpu->kvm->isa_irq,
> >>                         0, &ack); if (!(ack.flags &
> >> KVM_IRQACKDATA_VECTOR_PENDING))
> >>                                 __clear_bit(pin, &vcpu->irq.pending);
> >>                         break;
> >>                 case kvm_irqpin_nmi:
> >> 
> >> 
> >>    Anyway due to SMP & in-kernel APIC, I'd like to suggest we move
> >> IDT_VECTORING_INFO_FIELD to do_interrupt_requests like
> >> vmx_intr_assist in Xen where physical IRQ is disabled.
> > 
> > I haven't looked at the new Xen code, but I will try to take a peek.
> > Its probably moot since I am confident that the fix I am suggesting
> > allows the deferred mechanism to work the way I intended, but I will
> > keep an open mind to alternative solutions.
> 
> The Xen side doesn't support NMI yet though the patch for NMI is in hand
> now :-(
> Hopefully it will be out this week :-)
> 
> > 
> >> 
> >> 
> >> 
> >> thx, eddie


-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
_______________________________________________
kvm-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/kvm-devel

Re: [kvm-devel] [PATCH 0/9] in-kernel APIC v9 (kernel side)

Reply via email to