On Sat, Oct 18, 2008 at 10:44 AM, Sheng Yang <[EMAIL PROTECTED]> wrote:
> On Fri, Oct 17, 2008 at 08:12:00PM +0200, Jan Kiszka wrote:
>> Jan Kiszka wrote:
>> > Sheng Yang wrote:
>> >> On Fri, Oct 17, 2008 at 07:35:01PM +0200, Jan Kiszka wrote:
>> >>> Sheng Yang wrote:
>> >>>> On Fri, Oct 17, 2008 at 10:10:54AM +0200, Jan Kiszka wrote:
>> >>>>> Sheng Yang wrote:
>> >>>>>> On Wednesday 15 October 2008 22:27:49 Jan Kiszka wrote:
>> >>>>>>> Aligning in-kernel kvm_apic_accept_pic_intr with its user space mate,
>> >>>>>>> this patch relaxes the conditions under which PIC IRQs are accepted
>> >>>>>>> by LVT0. This reflects reality and allows to reuse the service for
>> >>>>>>> the
>> >>>>>>> NMI watchdog use case.
>> >>>>>>>
>> >>>>>>> Signed-off-by: Jan Kiszka <[EMAIL PROTECTED]>
>> >>>>>>> ---
>> >>>>>>> arch/x86/kvm/lapic.c | 13 ++++---------
>> >>>>>>> 1 file changed, 4 insertions(+), 9 deletions(-)
>> >>>>>>>
>> >>>>>>> Index: b/arch/x86/kvm/lapic.c
>> >>>>>>> ===================================================================
>> >>>>>>> --- a/arch/x86/kvm/lapic.c
>> >>>>>>> +++ b/arch/x86/kvm/lapic.c
>> >>>>>>> @@ -1072,16 +1072,11 @@ int kvm_apic_has_interrupt(struct kvm_vc
>> >>>>>>> int kvm_apic_accept_pic_intr(struct kvm_vcpu *vcpu)
>> >>>>>>> {
>> >>>>>>> u32 lvt0 = apic_get_reg(vcpu->arch.apic, APIC_LVT0);
>> >>>>>>> - int r = 0;
>> >>>>>>>
>> >>>>>>> - if (vcpu->vcpu_id == 0) {
>> >>>>>>> - if (!apic_hw_enabled(vcpu->arch.apic))
>> >>>>>>> - r = 1;
>> >>>>>>> - if ((lvt0 & APIC_LVT_MASKED) == 0 &&
>> >>>>>>> - GET_APIC_DELIVERY_MODE(lvt0) == APIC_MODE_EXTINT)
>> >>>>>>> - r = 1;
>> >>>>>>> - }
>> >>>>>>> - return r;
>> >>>>>>> + if (!apic_hw_enabled(vcpu->arch.apic) ||
>> >>>>>>> + (lvt0 & APIC_LVT_MASKED) == 0)
>> >>>>>>> + return 1;
>> >>>>>>> + return 0;
>> >>>>>>> }
>> >>>>>>>
>> >>>>>>> void kvm_inject_apic_timer_irqs(struct kvm_vcpu *vcpu)
>> >>>>>>>
>> >>>>>> (sorry for late review...)
>> >>>>>>
>> >>>>>> Thanks to find out the root cause of BSOD!
>> >>>>>>
>> >>>>>> But I am a little concern about this change. As you know, PIC only
>> >>>>>> connect to
>> >>>>>> cpu0. So I think it's not proper to make it generic.
>> >>>>> I don't think so - and if it were true, qemu would have a bug then, see
>> >>>>> its corresponding code.
>> >>>> You can refer to Intel MP spec, virtual wire mode. Google
>> >>>> "MP spec" can find it.
>> >>> Ah, good reference.
>> >>>
>> >>>> Normally PIC is only used in BSP boot up for SMP guest(PIC can't afford
>> >>>> SMP,
>> >>>> otherwise we won't need IOAPIC/LAPIC). After that, it should be
>> >>>> disabled.
>> >>>> And virtual wire mode works with APIC_MODE_EXTINT on LVT0 of BSP lapic,
>> >>>> so
>> >>>> that's why you see
>> >>>>
>> >>>> GET_APIC_DELIVERY_MODE(lvt0) == APIC_MODE_EXTINT
>> >>>>
>> >>>> KVM follow virtual wire mode exactly.
>> >>> According to my understanding of the spec, the virtual wire mode means
>> >>> that the PIC signal is delivered via LVT0, and thus can be received by
>> >>> _all_ CPUs in the system. However, only the BSP usually enables LVT0,
>> >>> thus is receiving the IRQ. When Linux switches to NMI watchdog mode 1,
>> >>> it also unmasks the other CPUs (and reprograms all to deliver NMIs
>> >>> instead of EXTINTs).
>> >>>
>> >>> Then there is also the "PIC Mode", ie. direct delivery to the BSP, and
>> >>> only the latter. That mode is obviously target by the current
>> >>> kvm_apic_accept_pic_intr implementation. But I find no indication in the
>> >>> spec yet that both modes cannot exists in the same system. But I also
>> >>> fail to understand how one could switch between both modes (via
>> >>> software).
>> >> No. If so, we don't need to check LVT0.
>> >
>> > OK, there is that IMCR register to enable/disable the PIC Mode - but
>> > neither KVM nor QEMU implement it (which may indicate they both want to
>> > provide Virtual Wire Mode?). When enabled, Virtual Wire Mode is
>> > effectively disabled (for the BSP at least) as the LAPIC is disconnected
>> > (from the BSP).
>> >
>> > Still, I find not trace in the spec that says only the BSP can receive
>> > PIC interrupts in Virtual Wire Mode (the LVT0 line is connected to _all_
>> > CPUs).
>>
>> Now I checked also the BIOS KVM is shipping, and the MP Feature byte 2,
>> bit 7 (IMCRP) is cleared, thus KVM is providing the Virtual Wire mode.
>> Looking at Figure 3-3 of the MP spec, one can see that the PIC's output
>> is connected to the LVT0 line in this mode, and that this line is
>> connected to all CPUs in the system. So I can't help concluding that a)
>> QEMU's implementation is correct and b) my patch is correct as well. Or
>> please tell me where I'm wrong now...
>
> Frankly speaking, here are two apporoaches. Both are OK to work. You
> insisted the QEmu method, emulate that line connect all lapic's LVT0. And I
> insisted to follow the current solution, the dot-line of virtual wire mode
> in the spec, then make NMI watchdog as a separate thing, impact others as
> small as possible.
>
> When I wrote NMI watchdog, I don't want to involve PIC, for it's special
> case of PIC usage. So I think it's OK to not emulate the path here, then use
> apic_local_deliver() to send the interrupt directly, not through the PIC
> path. If PIC involved, that's another path. Current QEmu covered this,
> pic_request_irq() send to every vcpu, emulate that whole LVT0 line. Our KVM
> choose a different way, we just assume PIC only connect to LVT0 of BSP, for
> others should be disabled. That's save a lot when you have a lot of vcpus,
> as you said.
>
> So currently, QEmu emulate virtual wire mode well, and KVM do some
> simplification, only connect to BSP. Both of them follow this in each's
> code. And for KVM, the change to kvm_apic_accept_pic_intr() broke this
> assumption. Now we only work PIC with BSP, but check all the vcpus. I don't
> think that's a good combination. I think we are not likely do more to
> improve our PIC connection method, so NMI watchdog in KVM was designed as a
> separate thing, as a special case, and should be the only special case.
>
> kvm_cpu_has_interrupt() called every time before VM entry to check if
> there are any intr can be injected. If lapic got none,
> it would check kvm_apic_accept_pic_intr(). Check every vcpu or only check
> vcpu0, would bring about (vcpu_nr - 1) * ((vm_exit_nr - lapic_has_intr_nr) /
> vcpu_nr)(if we assume vmexit on every vcpu is the mostly compatiable) more
comparable... not compatiable...
> times to do the judgment on other vcpus here. And normally, the latter
> number would tens of thousand to hundreds of thousands. If you care about
> 1000 per vcpu's touch in pit, why you don't care about them here?
>
Well, I just hope you can understand my thought.
Thanks.
--
regards,
Yang, Sheng
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html