On Tuesday 20 January 2009 18:41:21 Alexander Graf wrote:
> Marcelo Tosatti wrote:
> > On Thu, Jan 15, 2009 at 03:20:06PM +0800, Sheng Yang wrote:
> >>> +  * Since reinjection is not rate-limited, use the delay
> >>> +          * to inject the last interrupt as an estimate.
> >>> +          */
> >>> + if (unlikely(atomic_read(&apic->timer.pending) > 0)) {
> >>> +         remaining = apic->timer.injection_delay;
> >>> +         if (ktime_to_ns(remaining) > apic->timer.period)
> >>> +                 remaining = ns_to_ktime(apic->timer.period);
> >>> +        } else
> >>> +         remaining = hrtimer_expires_remaining(&apic->timer.dev);
> >>
> >> A little doubt...
> >>
> >> A: time_fire
> >> B: intr_post
> >> C: read TMCCT
> >>
> >> The sequence can be ABC or ACB.
> >>
> >> injection_delay = time(B) - time(A)
> >>
> >> So it didn't count time from read TMCCT... And guest get interrupt at
> >> time(B), not quite understand why time(B) - time(A) matters here...
> >
> > Its an estimate of the delay it takes to inject an interrupt to the
> > guest once fired. Its only used if there have been accumulated ones, so
> > ACB sequence with pending=0 will use hrtimer_expires_remaining().
> >
> >> I think the reasonable here means, this interval is usable later after
> >> the accumulated interrupts are injected. From this point of view, I
> >> think current solution is reasonable. It just assume the delayed
> >> interrupts have been injected.
> >>
> >> However, seriously, any value here is wrong, no elegant one.
> >
> > I agree.
> >
> >> But I still prefer to the current solution...
> >
> > Why? The proposed version is smaller and simpler than the current
> > one IMO, and also not vulnerable to whatever bug is causing now <
> > last_update.
> >
> > And more precise, since the current scheme uses interrupt injection time
> > as if it was "count-down restart" time, which is not true.
> >
> >> And here is not the really problem for now I think. The current
> >> mechanism is mostly OK, but where is the bug... We have either have a
> >> simple fix(e.g. if now < last_update, then return 0) or dig into it. Did
> >> it worth a try? Anyway, it would return a buggy result if we have
> >> pending interrupts...
> >
> > The overflow calculation is not necessary as discussed. Alexander, can
> > you please try the following? Sheng, do you have any other suggestion to
> > test?
> >
> > /proc/timer_list output of the host when ESX is running too.
>
> Sorry for the late reply. Here's the dmesg output and /proc/timer_list
> after the issue occured. I left the warning message in that I put there
> in case the value is too high:
>
>         counter_passed = div64_u64(ktime_to_ns(passed),
>                                    (APIC_BUS_CYCLE_NS *
> apic->timer.divide_count));
>
> +       if (counter_passed > 0x7f00000000000000) {
> +               /* If we're in here we probably encountered a bug! */
> +               printk(KERN_INFO "LAPIC: Too high counter_passed value:
> 0x%lx | 0x%lx (0x%lx) | 0x%lx (0x%lx)\n", counter_passed,
> ktime_to_ns(passed), passed, now.tv64, apic
> ->timer.last_update.tv64);
> +       }
> +
>         if (counter_passed > tmcct) {
>                 if (unlikely(!apic_lvtt_period(apic))) {
>                         /* one-shot timers stick at 0 until reset */
>
>
>
> start_apic_timer: bus cycle is 1ns, now 0x00039a6f377cfbc4, timer
> initial count 0x186a0, period 100000ns, expire @ 0x00039a6f377e8264.
> start_apic_timer: bus cycle is 1ns, now 0x00039a6f3a7b5aef, timer
> initial count 0x203a0, period 132000ns, expire @ 0x00039a6f3a7d5e8f.
> last_update = 1014860818546063 now = 1014860818426085
> rec[47] when=1014860811948512 last_update=1014860812078063 pend=0
> rec[46] when=1014860818417773 last_update=1014860818546063 pend=0
> rec[45] when=1014860818285454 last_update=1014860818414063 pend=0
> rec[44] when=1014860818153119 last_update=1014860818282063 pend=0
> rec[43] when=1014860818021746 last_update=1014860818150063 pend=0
> rec[42] when=1014860817889326 last_update=1014860818018063 pend=0
> rec[41] when=1014860817757297 last_update=1014860817886063 pend=0
> rec[40] when=1014860817625682 last_update=1014860817754063 pend=0
> rec[39] when=1014860817492484 last_update=1014860817622063 pend=0
> rec[38] when=1014860817360531 last_update=1014860817490063 pend=0
> rec[37] when=1014860817228489 last_update=1014860817358063 pend=0
> rec[36] when=1014860817101250 last_update=1014860817226063 pend=0
> rec[35] when=1014860816964378 last_update=1014860817094063 pend=0
> rec[34] when=1014860816832459 last_update=1014860816962063 pend=0
> rec[33] when=1014860816700514 last_update=1014860816830063 pend=0
> rec[32] when=1014860816568558 last_update=1014860816698063 pend=0
> rec[31] when=1014860816436510 last_update=1014860816566063 pend=0
> rec[30] when=1014860816305079 last_update=1014860816434063 pend=0
> rec[29] when=1014860816172560 last_update=1014860816302063 pend=0
> rec[28] when=1014860816040449 last_update=1014860816170063 pend=0
> rec[27] when=1014860815908500 last_update=1014860816038063 pend=0
> rec[26] when=1014860815776451 last_update=1014860815906063 pend=0
> rec[25] when=1014860815644499 last_update=1014860815774063 pend=0
> rec[24] when=1014860815513100 last_update=1014860815642063 pend=0
> rec[23] when=1014860815380476 last_update=1014860815510063 pend=0
> rec[22] when=1014860815248609 last_update=1014860815378063 pend=0
> rec[21] when=1014860815116494 last_update=1014860815246063 pend=0
> rec[20] when=1014860814984441 last_update=1014860815114063 pend=0
> rec[19] when=1014860814852498 last_update=1014860814982063 pend=0
> rec[18] when=1014860814721082 last_update=1014860814850063 pend=0
> rec[17] when=1014860814588467 last_update=1014860814718063 pend=0
> rec[16] when=1014860814456530 last_update=1014860814586063 pend=0
> rec[15] when=1014860814324452 last_update=1014860814454063 pend=0
> rec[14] when=1014860814192601 last_update=1014860814322063 pend=0
> rec[13] when=1014860814060489 last_update=1014860814190063 pend=0
> rec[12] when=1014860813928546 last_update=1014860814058063 pend=0
> rec[11] when=1014860813797571 last_update=1014860813926063 pend=0
> rec[10] when=1014860813664475 last_update=1014860813794063 pend=0
> rec[9] when=1014860813532520 last_update=1014860813662063 pend=0
> rec[8] when=1014860813400463 last_update=1014860813530063 pend=0
> rec[7] when=1014860813268519 last_update=1014860813398063 pend=0
> rec[6] when=1014860813136452 last_update=1014860813266063 pend=0
> rec[5] when=1014860813007312 last_update=1014860813134063 pend=0
> rec[4] when=1014860812872314 last_update=1014860813002063 pend=0
> rec[3] when=1014860812740791 last_update=1014860812870063 pend=0
> rec[2] when=1014860812652998 last_update=1014860812738063 pend=0
> rec[1] when=1014860812476526 last_update=1014860812606063 pend=0
> rec[0] when=1014860812344635 last_update=1014860812474063 pend=0
> rec[49] when=1014860812213098 last_update=1014860812342063 pend=0
> rec[48] when=1014860812080466 last_update=1014860812210063 pend=0
> LAPIC: Too high counter_passed value: 0x7ffffffffffe2b55 |
> 0x7ffffffffffe2b55 (0x7ffffffffffe2b55) | 0x39b02b29428e5 (0x39b02b295fd8f)
> start_apic_timer: bus cycle is 1ns, now 0x00039b02b2998ada, timer
> initial count 0x8400, period 33792ns, expire @ 0x00039b02b29a0eda.
>
Marcelo, I realize some interesting things.

In fact, on my machine, when I measured the delta of now() and last_update at 
kvm_apic_timer_intr_post(), the delta was bigger and bigger...(and now() is 
always bigger for me, so still no clue for why data can be like above, 
last_update always ahead of when for about one period...).

Then I found something not good in original design - it ignored the interval 
between time fire and injection, so we got:

last_update = now() + n * period;

And the time we update last_update:

time = now() + n * period + n * interval.

So last_update time is more and more inaccurate... Though it was revised by 
tmcct function, it's still not a good way to go.

Then I understand your purpose more.

+       if (unlikely(atomic_read(&apic->timer.pending) > 0)) {
+               remaining = apic->timer.injection_delay;
+               if (ktime_to_ns(remaining) > apic->timer.period)
+                       remaining = ns_to_ktime(apic->timer.period);
+        } else
+               remaining = hrtimer_expires_remaining(&apic->timer.dev);

And about your patch, how about take interval between intr_post() and read 
tmcct in to account as well? That can keep the consistent with 
hrtimer_get_remaining() in the read tmcct. And I think if remaining > period, 
remaining = remain % period maybe more reasonable here.

How do you think?

-- 
regards
Yang, Sheng



--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to