> From: Andrew Davis <[email protected]>
> Date: Wed, 27 Dec 2017 11:39:54 -0500
> 
> Hello again,
> 
> I tested with each of the "acpihpet0", "acpitimer0", and "i8254" timers. 
> The timing problem manifested when using all 3 timers. I ran the date 
> loop with "acpihpet0" and "acpitimer0" until the issue manifested, and 
> let "i8254" run overnight.
> 
> Here are some snippets from the date logs from where I started logging 
> the date loop, and where the timing issue became present.
> 
> acpitimer0:
> 
>      Tue Dec 26 23:57:57 UTC 2017
>      Tue Dec 26 23:57:58 UTC 2017
>      ...
>      Wed Dec 27 00:10:10 UTC 2017
>      Wed Dec 27 00:10:12 UTC 2017
>      Wed Dec 27 00:10:14 UTC 2017
> 
> i8254:
> 
>      Wed Dec 27 00:14:23 UTC 2017
>      Wed Dec 27 00:14:24 UTC 2017
>      ...
>      Wed Dec 27 00:59:30 UTC 2017
>      Wed Dec 27 00:59:31 UTC 2017
>      Wed Dec 27 00:59:33 UTC 2017
> 
> acpihpet0:
> 
>      Wed Dec 27 16:20:54 UTC 2017
>      Wed Dec 27 16:20:55 UTC 2017
>      ...
>      Wed Dec 27 16:32:44 UTC 2017
>      Wed Dec 27 16:32:45 UTC 2017
>      Wed Dec 27 16:32:47 UTC 2017
>      Wed Dec 27 16:32:49 UTC 2017
> 
> The i8254 timer hit a point where the system stopped reporting the 
> proper time altogether. I ran these commands this morning after my 
> OpenBSD VM ran with i8254 overnight, and this is what the "date" command 
> displayed. The proper time is shown below.
> 
>      # sysctl | grep -i timecounter
>      kern.timecounter.tick=1
>      kern.timecounter.timestepwarnings=0
>      kern.timecounter.hardware=i8254
>      kern.timecounter.choice=i8254(0) acpihpet0(1000) acpitimer0(1000) dummy(-1000000)
> 
>      # date
>      Wed Dec 27 01:35:51 UTC 2017
> 
>      [root@local-linux ~]# date
>      Wed Dec 27 16:11:05 UTC 2017

Your test results are consistent with the local APIC emulation being
broken in Linux/KVM.  Regardless of what hardware is used for the
timecounter, the clock interrupts use the local APIC timer in OpenBSD.

OpenBSD programs the local APIC to interrupt every 10ms in so-called
repeated mode.  The clock interrupt is then responsible for reading
the timecounter to update the current wall clock time and for running
things like timeouts that wake up tasks that are sleeping.  If we get
no clock interrupts those wakeups don't happen, and your sleeps take
longer than what you intended.  But as long as the timecounter doesn't
wrap the wall clock time will be correctly updated once another clock
interrupt comes in.  And that's what happens with the i8254
timecounter.  It wraps fairly quickly, so if the clock interrupts
don't come in for a while, OpenBSD's idea of wall clock time starts to
get out of sync with reality.
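
To make the two effects concrete, here is a rough Python sketch.  It is
not OpenBSD code.  It assumes a 16-bit counter running at the nominal
i8254 input frequency of 1193182 Hz (the real driver may differ), and
the names perceived_elapsed and sleep_wall_time are made up for this
illustration.

```python
PIT_HZ = 1_193_182  # nominal i8254 input clock; counter width below is an assumption
CLOCK_HZ = 100      # clock interrupts per second (one every 10 ms, as described above)


def perceived_elapsed(gaps_s, counter_bits=16, freq_hz=PIT_HZ):
    """Return (true_seconds, perceived_seconds) for a wrapping counter.

    gaps_s lists the real time between successive clock interrupts.
    At each interrupt the counter is read and the masked difference
    from the previous read is added to the perceived time.
    """
    mask = (1 << counter_bits) - 1
    true_ticks = 0
    last_read = 0
    seen_ticks = 0
    for gap in gaps_s:
        true_ticks += round(gap * freq_hz)
        now = true_ticks & mask                  # what the hardware counter shows
        seen_ticks += (now - last_read) & mask   # full wraps between reads are lost
        last_read = now
    return true_ticks / freq_hz, seen_ticks / freq_hz


def sleep_wall_time(sleep_s, interrupt_gap_s):
    """Wall-clock length of a sleep if timeouts only advance on clock interrupts."""
    ticks_needed = round(sleep_s * CLOCK_HZ)
    return ticks_needed * interrupt_gap_s
```

With interrupts every 10ms, perceived_elapsed([0.01] * 100) returns two
equal values.  With interrupts every 100ms the 16-bit counter wraps
between reads and the perceived time falls behind, while passing
counter_bits=32 keeps the two values equal.  Likewise
sleep_wall_time(1, 0.02) shows a one-second sleep stretching to about
two seconds when the interrupts arrive at half the expected rate.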

So why do other systems not suffer from this problem?  I'm fairly
certain they also use the local APIC for clock interrupts.  But the
systems you tested (Linux, FreeBSD) probably don't run it in repeated
mode.  Some people consider running the local APIC in repeated mode a
bad idea.  And they might even be right.  Waking a system up at
regular intervals even if there is no real work to do is a bit silly
and wastes power.  Although one could argue that 10ms between wakeups
is long enough for this not to matter much on modern systems.

Maybe we'll change the way we do clock interrupts at some point in the
future.  It would probably help vmm(4).  But this is not a trivial
task and won't happen overnight.  Working around bugs in someone
else's software certainly isn't enough motivation for me to implement
it.  

Cheers,

Mark


> On 12/26/2017 5:44 PM, Mike Larkin wrote:
> > On Tue, Dec 26, 2017 at 03:24:03PM -0500, Andrew Davis wrote:
> >> Hello,
> >>
> >> No, I didn't change the kern.timecounter selection directly. I had tried
> >> disabling the HPET on qemu/kvm (which may have affected this selection?).
> >>
> >> Two of my boxes, both OpenBSD 6.1 report this:
> >>
> >> # sysctl kern.timecounter
> >> kern.timecounter.tick=1
> >> kern.timecounter.timestepwarnings=0
> >> kern.timecounter.hardware=acpihpet0
> >> kern.timecounter.choice=i8254(0) acpihpet0(1000) acpitimer0(1000)
> >> dummy(-1000000)
> >>
> >> Best,
> >> Andrew
> >>
> > Could you try one of the others and let us know if it helps, please?
> >
> > -ml
> >
> >> On 12/26/2017 2:36 PM, Mike Larkin wrote:
> >>> On Tue, Dec 26, 2017 at 12:27:31PM -0500, Andrew Davis wrote:
> >>>> Hello,
> >>>>
> >>>> I'm experiencing some odd timing issues on OpenBSD 6.2 (and 6.1) on the
> >>>> system listed below. This is preventing me from running OpenBSD on my
> >>>> servers. Can you determine if this is a bug in the OpenBSD operating
> >>>> system? I can provide more information if needed.
> >>>>
> >>>> Virtualized environment.
> >>>>
> >>>> Host CPU: 2 x Intel E5-2630 v3 2.4 GHz
> >>>> Host OS: Fedora 27
> >>>> Virtualization software: QEMU + KVM (2.10.0-1.fc27)
> >>>> Guest Machine: default (pc-i440fx-2.10)
> >>>> Guest OS: OpenBSD 6.2 (and 6.1).
> >>>>
> >>>> Basically, OpenBSD processes degrade over time to the point where they're
> >>>> completely unresponsive. This simple date printout script is a good
> >>>> example. It should print out the date once per second, but after roughly
> >>>> 20 mins on this hardware configuration, it takes 2 seconds to print each
> >>>> line, then 4 seconds to print each line, and so on. After running for
> >>>> about 24 hours, the delay is about 1 minute between line printouts.
> >>>>
> >>>>       while sleep 1; do date; done
> >>>>
> >>>> I've tried tweaking some different settings on the guest and host, such
> >>>> as disabling the HPET timer and x2apic, neither of which has proven
> >>>> effective.
> >>>>
> >>>> I saw mention of adding "kvm-intel.preemption_timer=0" in another recent
> >>>> thread. This seems to resolve the slowdown issue.
> >>>>
> >>>> However, I have run other guest operating systems on this hardware
> >>>> configuration (CentOS, Ubuntu, FreeBSD), none of which required any of
> >>>> these tweaks or experienced timing issues. This leads me to believe that
> >>>> it could be related to a bug in OpenBSD.
> >>>>
> >>>> I have access to several machines with this hardware configuration and
> >>>> tested on multiple machines, to rule out a possible one-off hardware
> >>>> issue. Each host displayed the same behavior.
> >>>>
> >>>> Best regards,
> >>>> Andrew
> >>>>
> >>> What timecounter source did the OpenBSD guests pick? Did you try selecting
> >>> one of the other choices to see if this helps?
> >>>
> >>> sysctl kern.timecounter    if you're not sure what I'm talking about.
> >>>
> >>> -ml
> 
> 
