On Fri, Oct 17, 2008 at 08:12:00PM +0200, Jan Kiszka wrote:
> Jan Kiszka wrote:
> > Sheng Yang wrote:
> >> On Fri, Oct 17, 2008 at 07:35:01PM +0200, Jan Kiszka wrote:
> >>> Sheng Yang wrote:
> >>>> On Fri, Oct 17, 2008 at 10:10:54AM +0200, Jan Kiszka wrote:
> >>>>> Sheng Yang wrote:
> >>>>>> On Wednesday 15 October 2008 22:27:49 Jan Kiszka wrote:
> >>>>>>> Aligning in-kernel kvm_apic_accept_pic_intr with its user space mate,
> >>>>>>> this patch relaxes the conditions under which PIC IRQs are accepted
> >>>>>>> by LVT0. This reflects reality and allows to reuse the service for the
> >>>>>>> NMI watchdog use case.
> >>>>>>>
> >>>>>>> Signed-off-by: Jan Kiszka <[EMAIL PROTECTED]>
> >>>>>>> ---
> >>>>>>>  arch/x86/kvm/lapic.c |   13 ++++---------
> >>>>>>>  1 file changed, 4 insertions(+), 9 deletions(-)
> >>>>>>>
> >>>>>>> Index: b/arch/x86/kvm/lapic.c
> >>>>>>> ===================================================================
> >>>>>>> --- a/arch/x86/kvm/lapic.c
> >>>>>>> +++ b/arch/x86/kvm/lapic.c
> >>>>>>> @@ -1072,16 +1072,11 @@ int kvm_apic_has_interrupt(struct kvm_vc
> >>>>>>>  int kvm_apic_accept_pic_intr(struct kvm_vcpu *vcpu)
> >>>>>>>  {
> >>>>>>>       u32 lvt0 = apic_get_reg(vcpu->arch.apic, APIC_LVT0);
> >>>>>>> -     int r = 0;
> >>>>>>>
> >>>>>>> -     if (vcpu->vcpu_id == 0) {
> >>>>>>> -             if (!apic_hw_enabled(vcpu->arch.apic))
> >>>>>>> -                     r = 1;
> >>>>>>> -             if ((lvt0 & APIC_LVT_MASKED) == 0 &&
> >>>>>>> -                 GET_APIC_DELIVERY_MODE(lvt0) == APIC_MODE_EXTINT)
> >>>>>>> -                     r = 1;
> >>>>>>> -     }
> >>>>>>> -     return r;
> >>>>>>> +     if (!apic_hw_enabled(vcpu->arch.apic) ||
> >>>>>>> +         (lvt0 & APIC_LVT_MASKED) == 0)
> >>>>>>> +             return 1;
> >>>>>>> +     return 0;
> >>>>>>>  }
> >>>>>>>
> >>>>>>>  void kvm_inject_apic_timer_irqs(struct kvm_vcpu *vcpu)
> >>>>>>>
> >>>>>> (sorry for late review...)
> >>>>>>
> >>>>>> Thanks to find out the root cause of BSOD!
> >>>>>>
> >>>>>> But I am a little concern about this change. As you know, PIC only 
> >>>>>> connect to 
> >>>>>> cpu0. So I think it's not proper to make it generic. 
> >>>>> I don't think so - and if it were true, qemu would have a bug then, see
> >>>>> its corresponding code.
> >>>> You can refer to Intel MP spec, virtual wire mode. Google
> >>>> "MP spec" can find it.
> >>> Ah, good reference.
> >>>
> >>>> Normally PIC is only used in BSP boot up for SMP guest(PIC can't afford 
> >>>> SMP,
> >>>> otherwise we won't need IOAPIC/LAPIC). After that, it should be disabled.
> >>>> And virtual wire mode works with APIC_MODE_EXTINT on LVT0 of BSP lapic, 
> >>>> so
> >>>> that's why you see
> >>>>
> >>>>     GET_APIC_DELIVERY_MODE(lvt0) == APIC_MODE_EXTINT
> >>>>
> >>>> KVM follow virtual wire mode exactly.
> >>> According to my understanding of the spec, the virtual wire mode means
> >>> that the PIC signal is delivered via LVT0, and thus can be received by
> >>> _all_ CPUs in the system. However, only the BSP usually enables LVT0,
> >>> thus is receiving the IRQ. When Linux switches to NMI watchdog mode 1,
> >>> it also unmasks the other CPUs (and reprograms all to deliver NMIs
> >>> instead of EXTINTs).
> >>>
> >>> Then there is also the "PIC Mode", ie. direct delivery to the BSP, and
> >>> only the latter. That mode is obviously target by the current
> >>> kvm_apic_accept_pic_intr implementation. But I find no indication in the
> >>> spec yet that both modes cannot exists in the same system. But I also
> >>> fail to understand how one could switch between both modes (via software).
> >> No. If so, we don't need to check LVT0.
> > 
> > OK, there is that IMCR register to enable/disable the PIC Mode - but
> > neither KVM nor QEMU implement it (which may indicate they both want to
> > provide Virtual Wire Mode?). When enabled, Virtual Wire Mode is
> > effectively disabled (for the BSP at least) as the LAPIC is disconnected
> > (from the BSP).
> > 
> > Still, I find not trace in the spec that says only the BSP can receive
> > PIC interrupts in Virtual Wire Mode (the LVT0 line is connected to _all_
> > CPUs).
> 
> Now I checked also the BIOS KVM is shipping, and the MP Feature byte 2,
> bit 7 (IMCRP) is cleared, thus KVM is providing the Virtual Wire mode.
> Looking at Figure 3-3 of the MP spec, one can see that the PIC's output
> is connected to the LVT0 line in this mode, and that this line is
> connected to all CPUs in the system. So I can't help concluding that a)
> QEMU's implementation is correct and b) my patch is correct as well. Or
> please tell me where I'm wrong now...

Frankly speaking, here are two apporoaches. Both are OK to work. You
insisted the QEmu method, emulate that line connect all lapic's LVT0. And I
insisted to follow the current solution, the dot-line of virtual wire mode
in the spec, then make NMI watchdog as a separate thing, impact others as
small as possible.

When I wrote NMI watchdog, I don't want to involve PIC, for it's special
case of PIC usage. So I think it's OK to not emulate the path here, then use
apic_local_deliver() to send the interrupt directly, not through the PIC
path. If PIC involved, that's another path. Current QEmu covered this,
pic_request_irq() send to every vcpu, emulate that whole LVT0 line. Our KVM
choose a different way, we just assume PIC only connect to LVT0 of BSP, for
others should be disabled. That's save a lot when you have a lot of vcpus,
as you said.

So currently, QEmu emulate virtual wire mode well, and KVM do some
simplification, only connect to BSP. Both of them follow this in each's
code. And for KVM, the change to kvm_apic_accept_pic_intr() broke this
assumption. Now we only work PIC with BSP, but check all the vcpus. I don't
think that's a good combination. I think we are not likely do more to
improve our PIC connection method, so NMI watchdog in KVM was designed as a
separate thing, as a special case, and should be the only special case.

kvm_cpu_has_interrupt() called every time before VM entry to check if
there are any intr can be injected. If lapic got none,
it would check kvm_apic_accept_pic_intr(). Check every vcpu or only check
vcpu0, would bring about (vcpu_nr - 1) * ((vm_exit_nr - lapic_has_intr_nr) /
vcpu_nr)(if we assume vmexit on every vcpu is the mostly compatiable) more
times to do the judgment on other vcpus here. And normally, the latter
number would tens of thousand to hundreds of thousands. If you care about
1000 per vcpu's touch in pit, why you don't care about them here?

--
regards
Yang, Sheng
> 
> Jan
> 


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to