Sheng Yang wrote:
> On Fri, Oct 17, 2008 at 08:12:00PM +0200, Jan Kiszka wrote:
>> Now I checked also the BIOS KVM is shipping, and the MP Feature byte 2,
>> bit 7 (IMCRP) is cleared, thus KVM is providing the Virtual Wire mode.
>> Looking at Figure 3-3 of the MP spec, one can see that the PIC's output
>> is connected to the LVT0 line in this mode, and that this line is
>> connected to all CPUs in the system. So I can't help concluding that a)
>> QEMU's implementation is correct and b) my patch is correct as well. Or
>> please tell me where I'm wrong now...
>
> Frankly speaking, here are two apporoaches. Both are OK to work. You
> insisted the QEmu method, emulate that line connect all lapic's LVT0. And I
> insisted to follow the current solution, the dot-line of virtual wire mode
> in the spec, then make NMI watchdog as a separate thing, impact others as
> small as possible.
Ack.
>
> When I wrote NMI watchdog, I don't want to involve PIC, for it's special
> case of PIC usage. So I think it's OK to not emulate the path here, then use
> apic_local_deliver() to send the interrupt directly, not through the PIC
> path. If PIC involved, that's another path. Current QEmu covered this,
> pic_request_irq() send to every vcpu, emulate that whole LVT0 line. Our KVM
> choose a different way, we just assume PIC only connect to LVT0 of BSP, for
> others should be disabled. That's save a lot when you have a lot of vcpus,
> as you said.
Yes, in the meantime I also came across this assumption that only the
BSP can receive PIC interrupts. I first tried to enhance the accuracy of
KVM's virtual wire mode and then to optimize it the way proposed for the
NMI watchdog. However, I had to give up as I realized that this
assumption is too deeply hooked into the KVM design.
Nevertheless, one minor inaccuracy can and should be fixed (will repost
as a true patch after more testing): If the APIC is disabled, there will
be no PIC interrupt forwarding. This should also be fixed in QEMU.
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1071,17 +1071,15 @@ int kvm_apic_has_interrupt(struct kvm_vc

 int kvm_apic_accept_pic_intr(struct kvm_vcpu *vcpu)
 {
+	struct kvm_lapic *apic = vcpu->arch.apic;
 	u32 lvt0 = apic_get_reg(vcpu->arch.apic, APIC_LVT0);
-	int r = 0;

-	if (vcpu->vcpu_id == 0) {
-		if (!apic_hw_enabled(vcpu->arch.apic))
-			r = 1;
-		if ((lvt0 & APIC_LVT_MASKED) == 0 &&
-		    GET_APIC_DELIVERY_MODE(lvt0) == APIC_MODE_EXTINT)
-			r = 1;
-	}
-	return r;
+	/* Virtual Wire mode, but we only deliver to the BSP. */
+	if (vcpu->vcpu_id == 0 && apic_hw_enabled(apic)
+	    && !(lvt0 & APIC_LVT_MASKED)
+	    && GET_APIC_DELIVERY_MODE(lvt0) == APIC_MODE_EXTINT)
+		return 1;
+	return 0;
 }

 void kvm_inject_apic_timer_irqs(struct kvm_vcpu *vcpu)
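To make the behavioral change of the hunk above easier to see outside the kernel tree, here is a minimal user-space sketch of both predicates. The `struct model_vcpu` type and the `accept_pic_intr_old()`/`accept_pic_intr_new()` names are hypothetical stand-ins, not kernel code; only the LVT bit layout (mask bit 16, delivery mode in bits 8-10, ExtINT = 7) is taken from the kernel's APIC definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* LVT bit layout, matching the kernel's APIC definitions. */
#define APIC_LVT_MASKED            (1u << 16)
#define APIC_MODE_EXTINT           0x7u
#define GET_APIC_DELIVERY_MODE(x)  (((x) >> 8) & 0x7u)

/* Hypothetical stand-in for the vcpu/lapic state the real function reads. */
struct model_vcpu {
	int      vcpu_id;          /* 0 is the BSP */
	bool     apic_hw_enabled;  /* models apic_hw_enabled() */
	uint32_t lvt0;             /* models the APIC_LVT0 register */
};

/* Before the patch: a hardware-disabled APIC on the BSP always accepts. */
int accept_pic_intr_old(const struct model_vcpu *v)
{
	int r = 0;

	if (v->vcpu_id == 0) {
		if (!v->apic_hw_enabled)
			r = 1;
		if ((v->lvt0 & APIC_LVT_MASKED) == 0 &&
		    GET_APIC_DELIVERY_MODE(v->lvt0) == APIC_MODE_EXTINT)
			r = 1;
	}
	return r;
}

/* After the patch: BSP only, APIC enabled, LVT0 unmasked and in ExtINT mode. */
int accept_pic_intr_new(const struct model_vcpu *v)
{
	if (v->vcpu_id == 0 && v->apic_hw_enabled
	    && !(v->lvt0 & APIC_LVT_MASKED)
	    && GET_APIC_DELIVERY_MODE(v->lvt0) == APIC_MODE_EXTINT)
		return 1;
	return 0;
}

/* The two versions differ only when the BSP's APIC is hardware-disabled. */
void check_model(void)
{
	struct model_vcpu bsp_off = { 0, false, APIC_MODE_EXTINT << 8 };
	struct model_vcpu bsp_on  = { 0, true,  APIC_MODE_EXTINT << 8 };

	assert(accept_pic_intr_old(&bsp_off) == 1);
	assert(accept_pic_intr_new(&bsp_off) == 0);
	assert(accept_pic_intr_old(&bsp_on) == accept_pic_intr_new(&bsp_on));
}
```

In this model, a non-BSP vcpu is rejected by both versions, so the BSP-only assumption is untouched.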
>
> So currently, QEmu emulate virtual wire mode well, and KVM do some
> simplification, only connect to BSP. Both of them follow this in each's
> code. And for KVM, the change to kvm_apic_accept_pic_intr() broke this
> assumption. Now we only work PIC with BSP, but check all the vcpus. I don't
> think that's a good combination. I think we are not likely do more to
> improve our PIC connection method, so NMI watchdog in KVM was designed as a
> separate thing, as a special case, and should be the only special case.
Agreed. I'm preparing patches to take this into account while fixing the
current NMI watchdog implementation.
>
> kvm_cpu_has_interrupt() called every time before VM entry to check if
> there are any intr can be injected. If lapic got none,
> it would check kvm_apic_accept_pic_intr(). Check every vcpu or only check
> vcpu0, would bring about (vcpu_nr - 1) * ((vm_exit_nr - lapic_has_intr_nr) /
> vcpu_nr)(if we assume vmexit on every vcpu is the mostly compatiable) more
> times to do the judgment on other vcpus here. And normally, the latter
> number would tens of thousand to hundreds of thousands. If you care about
> 1000 per vcpu's touch in pit, why you don't care about them here?
As I said, that case would only have mattered in an improved version if
any VCPU > 1 had its LVT0 unmasked - a similar optimization as for the
NMI WD. But things are trickier, as the PIC code and its users are not
prepared to dispatch the PIC vector to multiple sinks. That finally
convinced me to stop my rework. The effort became too high compared to
the accuracy gain, which hardly any OS may need.
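For a feeling of the numbers, the estimate quoted above, (vcpu_nr - 1) * ((vm_exit_nr - lapic_has_intr_nr) / vcpu_nr), can be written down as a small helper. This is only an illustrative sketch: the function name `extra_pic_checks()` is hypothetical, and it keeps the quoted assumption that VM exits are spread evenly over the vcpus.

```c
#include <stdint.h>

/*
 * Extra kvm_apic_accept_pic_intr() evaluations on non-BSP vcpus when every
 * vcpu is checked instead of only vcpu0, per the estimate in the thread:
 *   (vcpu_nr - 1) * ((vm_exit_nr - lapic_has_intr_nr) / vcpu_nr)
 * Integer division is used, so the result is rounded down.
 */
uint64_t extra_pic_checks(uint64_t vcpu_nr, uint64_t vm_exit_nr,
			  uint64_t lapic_has_intr_nr)
{
	/* Guard against division by zero and unsigned underflow. */
	if (vcpu_nr == 0 || lapic_has_intr_nr > vm_exit_nr)
		return 0;
	return (vcpu_nr - 1) * ((vm_exit_nr - lapic_has_intr_nr) / vcpu_nr);
}
```

With made-up inputs of 4 vcpus, 100000 VM exits and 20000 exits where the LAPIC already had an interrupt pending, this gives 3 * 20000 = 60000 additional checks; with a single vcpu it is always 0.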
Jan
