> From: Andrew Davis <[email protected]>
> Date: Wed, 27 Dec 2017 11:39:54 -0500
>
> Hello again,
>
> I tested with each of the "acpihpet0", "acpitimer0", and "i8254" timers.
> The timing problem manifested when using all 3 timers. I ran the date
> loop with "acpihpet0" and "acpitimer0" until the issue manifested, and
> let "i8254" run overnight.
>
> Here are some snippets from the date logs from where I started logging
> the date loop, and where the timing issue became present.
>
> acpitimer0:
>
>     Tue Dec 26 23:57:57 UTC 2017
>     Tue Dec 26 23:57:58 UTC 2017
>     ...
>     Wed Dec 27 00:10:10 UTC 2017
>     Wed Dec 27 00:10:12 UTC 2017
>     Wed Dec 27 00:10:14 UTC 2017
>
> i8254:
>
>     Wed Dec 27 00:14:23 UTC 2017
>     Wed Dec 27 00:14:24 UTC 2017
>     ...
>     Wed Dec 27 00:59:30 UTC 2017
>     Wed Dec 27 00:59:31 UTC 2017
>     Wed Dec 27 00:59:33 UTC 2017
>
> acpihpet0:
>
>     Wed Dec 27 16:20:54 UTC 2017
>     Wed Dec 27 16:20:55 UTC 2017
>     ...
>     Wed Dec 27 16:32:44 UTC 2017
>     Wed Dec 27 16:32:45 UTC 2017
>     Wed Dec 27 16:32:47 UTC 2017
>     Wed Dec 27 16:32:49 UTC 2017
>
> The i8254 timer hit a point where the system stopped reporting the
> proper time altogether. I ran these commands this morning after my
> OpenBSD VM ran with i8254 overnight, and this is what the "date" command
> displayed. The proper time is shown below.
>
>     # sysctl | grep -i timecounter
>     kern.timecounter.tick=1
>     kern.timecounter.timestepwarnings=0
>     kern.timecounter.hardware=i8254
>     kern.timecounter.choice=i8254(0) acpihpet0(1000) acpitimer0(1000) dummy(-1000000)
>
>     # date
>     Wed Dec 27 01:35:51 UTC 2017
>
>     [root@local-linux ~]# date
>     Wed Dec 27 16:11:05 UTC 2017

Your test results are consistent with the local APIC emulation being
broken in Linux/KVM. Regardless of what hardware is used for the
timecounter, the clock interrupts use the local APIC timer in OpenBSD.
OpenBSD programs the local APIC to interrupt every 10ms in so-called
repeated mode.

The clock interrupt is then responsible for reading the timecounter to
update the current wall clock time and for running things like timeouts
that wake up tasks that are sleeping. If we get no clock interrupts,
those wakeups don't happen, and your sleeps take longer than you
intended. But as long as the timecounter doesn't wrap, the wall clock
time will be correctly updated once another clock interrupt comes in.
Wrapping is exactly what happens with the i8254 timecounter. It wraps
fairly quickly, so if the clock interrupts don't come in for a while,
OpenBSD's idea of wall clock time starts to get out of sync with
reality.

So why do other systems not suffer from this problem? I'm fairly
certain they also use the local APIC for clock interrupts. But the
systems you tested (Linux, FreeBSD) probably don't run it in repeated
mode.

Some people consider running the local APIC in repeated mode a bad
idea. And they might even be right. Waking a system up at regular
intervals even if there is no real work to do is a bit silly and wastes
power. Although one could argue that 10ms between wakeups is long
enough for this not to matter much on modern systems.

Maybe we'll change the way we do clock interrupts at some point in the
future. It would probably help vmm(4). But this is not a trivial task
and won't happen overnight. Working around bugs in someone else's
software certainly isn't enough motivation for me to implement it.

Cheers,

Mark

> On 12/26/2017 5:44 PM, Mike Larkin wrote:
> > On Tue, Dec 26, 2017 at 03:24:03PM -0500, Andrew Davis wrote:
> >> Hello,
> >>
> >> No, I didn't changing the kern.timecounter selection directly. I had tried
> >> disabling the HPET on qemu/kvm (which may have affected this selection?).
> >>
> >> Two of my boxes, both OpenBSD 6.1 report this:
> >>
> >> # sysctl kern.timecounter
> >> kern.timecounter.tick=1
> >> kern.timecounter.timestepwarnings=0
> >> kern.timecounter.hardware=acpihpet0
> >> kern.timecounter.choice=i8254(0) acpihpet0(1000) acpitimer0(1000) dummy(-1000000)
> >>
> >> Best,
> >> Andrew
> >>
> > Could you try one of the others and let us know if it helps, please?
> >
> > -ml
> >
> >> On 12/26/2017 2:36 PM, Mike Larkin wrote:
> >>> On Tue, Dec 26, 2017 at 12:27:31PM -0500, Andrew Davis wrote:
> >>>> Hello,
> >>>>
> >>>> I'm experiencing some odd timing issues on OpenBSD 6.2 (and 6.1) on the
> >>>> system listed below. This is preventing me from running OpenBSD on my
> >>>> servers. Can you determine if this is a bug in the OpenBSD operating
> >>>> system? I can provide more information if needed.
> >>>>
> >>>> Virtualized environment.
> >>>>
> >>>> Host CPU: 2 x Intel E5-2630 v3 2.4 Ghz
> >>>> Host OS: Fedora 27
> >>>> Virtualization software: QEMU + KVM (2.10.0-1.fc27)
> >>>> Guest Machine: default (pc-i440fx-2.10)
> >>>> Guest OS: OpenBSD 6.2 (and 6.1).
> >>>>
> >>>> Basically, OpenBSD processes degrade over time to the point where they're
> >>>> completely unresponsive. This simple date printout script is a good
> >>>> example. It should print out the date once per second, but after roughly
> >>>> ~20 mins on this hardware configuration, it takes 2 seconds to print each
> >>>> line, then 4 seconds to print each line, and so on. After running for
> >>>> about 24 hours, the delay is about 1 minute between line printouts.
> >>>>
> >>>>     while sleep 1; do date; done
> >>>>
> >>>> I've tried tweaking some different settings on the guest and host, such as
> >>>> disabling the HPET timer and x2apic, neither of which has proven
> >>>> effective.
> >>>>
> >>>> I saw mention of adding "kvm-intel.preemption_timer=0" in another recent
> >>>> thread. This seems to resolve the slowdown issue.
> >>>>
> >>>> However, I have run other guest operating systems on this hardware
> >>>> configuration (CentOS, Ubuntu, FreeBSD) - neither of which required any of
> >>>> these tweaks, or experienced timing issues. This leads me to believe that
> >>>> it could be related to a bug in OpenBSD.
> >>>>
> >>>> I have access to several machines with this hardware configuration and
> >>>> tested on multiple machines, to rule out a possible one-off hardware
> >>>> issue. Each host displayed the same behavior.
> >>>>
> >>>> Best regards,
> >>>> Andrew
> >>>>
> >>> What timecounter source did the OpenBSD guests pick? Did you try selecting
> >>> one of the other choices to see if this helps?
> >>>
> >>> sysctl kern.timecounter if you're not sure what I'm talking about.
> >>>
> >>> -ml
> >
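
For anyone who wants to see the wrap effect from the reply in numbers,
here is a rough sketch. It is not OpenBSD code: the function name
simulate() and the counter width and frequency passed to it are
made-up, illustrative values, not the real i8254 parameters. It models
a kernel that, on every clock interrupt, adds the masked difference
between the current and the previous counter reading to its idea of
elapsed time.

```python
def simulate(counter_bits, freq_hz, tick_gaps_s):
    """Hypothetical model of a wrapping timecounter (not OpenBSD code).

    tick_gaps_s lists the real time, in seconds, between successive
    clock interrupts. Returns (real_elapsed_s, kernel_elapsed_s).
    """
    mask = (1 << counter_bits) - 1
    real_counts = 0    # true number of counter increments since start
    last_read = 0      # counter value seen at the previous interrupt
    kernel_counts = 0  # increments the kernel believes have elapsed

    for gap in tick_gaps_s:
        real_counts += round(gap * freq_hz)
        now = real_counts & mask            # hardware only exposes low bits
        delta = (now - last_read) & mask    # correct only if < one full wrap
        kernel_counts += delta
        last_read = now

    return real_counts / freq_hz, kernel_counts / freq_hz


if __name__ == "__main__":
    # Assumed values: a 16-bit counter at 1 MHz wraps every 65.536 ms.
    print(simulate(16, 1_000_000, [0.01] * 100))  # interrupts on time
    print(simulate(16, 1_000_000, [0.1]))         # one interrupt 100 ms late
    print(simulate(32, 1_000_000, [0.1]))         # wide counter, same delay
```

With interrupts arriving every 10ms the two values agree. With a single
100ms gap the narrow counter wraps once in between, so one full wrap
period (65.536 ms with the assumed values) silently disappears from the
kernel's elapsed time, while the wide counter catches up without loss.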
