To circle back: I can reproduce the VM lock-up 100% of the time by typing
too quickly into the VM's virtual serial console, e.g. my password or
longer command strings that I know by muscle memory.

I tried a few things such as slowly typing several kilobytes of text into
the console, one character at a time.

If I mash the keyboard inside cu, the VM locks up. I went to the text
console of the VM host (my daily-driver laptop) and stepped down the
keyboard repeat delay with:

wsconsctl keyboard.repeat.deln=<n>

I then attached to the VM's virtual console using "doas vmctl console 1".

I proceeded to hold down a key and let a few lines of text show up before
exiting the console, decreasing the deln delay further, and repeating the
experiment.

100 is the default deln value, so holding a key down (for longer than the
400msec default of del1) results in a 100msec delay between repeated
keystrokes.
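
For reference, you can read the current values back the same way before
changing anything (output below is from memory, so treat the exact format
as approximate; 400 and 100 are the defaults mentioned above):

$ wsconsctl keyboard.repeat.del1
keyboard.repeat.del1=400
$ wsconsctl keyboard.repeat.deln
keyboard.repeat.deln=100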

I reduced this first to 75, then to 50, 25, 15, 10, and 5.
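
In shell terms, the procedure amounts to roughly this (a sketch of what I
did by hand, not a script I actually ran; ~. is cu's escape to drop the
console):

for n in 75 50 25 15 10 5; do
        wsconsctl keyboard.repeat.deln=$n
        doas vmctl console 1    # hold a key down, then ~. to detach
done
wsconsctl keyboard.repeat.deln=100    # restore the default afterward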

With a repeat delay of 5msec on the virtual console, I was able to reliably
lock up VMs within a few dozen "keystrokes" (a second or two of holding a
key down).

I was able to get three different VMs to lock up: one running the October
22 snapshot, and two others running OpenBSD 6.0-release, one i386, the
other amd64.

I cannot reproduce this, even with a high keyboard repeat rate, through an
SSH session to any of the VMs.

Mike and I have been in touch off-list (Thanks again!), but I thought the
results of my testing were relevant to misc@.
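
For anyone else who wants to catch this in the act, the vmd debug capture
discussed down-thread boils down to something like the following (the log
file name is just my choice):

doas pkill vmd
doas vmd -dvvv 2>&1 | tee vmd-debug.log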



On Wed, Oct 26, 2016 at 7:15 PM, Mike Larkin <mlar...@azathoth.net> wrote:

> On Wed, Oct 26, 2016 at 06:36:25PM -0500, Ax0n wrote:
> > I'm running vmd with the options you specified, and using tee(1) to
> > peel it off to a file while I can still watch what happens in the
> > foreground. It hasn't happened again yet, but I haven't been messing
> > with the VMs as much this week as I was over the weekend.
> >
> > One thing of interest: inside the VM running the Oct 22 snapshot,
> > top(1) reports the CPU utilization hovering over 1.0 load, with nearly
> > 100% in interrupt state, which seems pretty odd to me. I am also
> > running an i386 and amd64 vm at the same time, both on 6.0-Release and
> > neither of them are exhibiting this high load. I'll probably update
> > the snapshot of the -CURRENT(ish) VM tonight, and the snapshot of my
> > host system (which is also my daily driver) this weekend.
> >
>
> I've seen that (and have seen it reported) from time to time as well.
> This is unlikely time being spent in interrupt, it's more likely a time
> accounting error that's making the guest think it's spending more in
> interrupt servicing than it actually is. This is due to the fact that
> both the statclock and hardclock are running at 100Hz (or close to it)
> because the host is unable to inject more frequent interrupts.
>
> You might try running the host at 1000Hz and see if that fixes the
> problem. It did, for me. Note that such an adjustment is really a hack
> and should just be viewed as a temporary workaround. Of course, don't
> run your guests at 1000Hz as well (that would defeat the purpose of
> cranking the host). That parameter can be adjusted in param.c.
>
> -ml
>
> > load averages:  1.07,  1.09,  0.94         vmmbsd.labs.h-i-r.net 05:05:27
> > 26 processes: 1 running, 24 idle, 1 on processor                 up  0:28
> > CPU states:  0.0% user,  0.0% nice,  0.4% system, 99.6% interrupt,  0.0% idle
> > Memory: Real: 21M/130M act/tot Free: 355M Cache: 74M Swap: 0K/63M
> >
> >   PID USERNAME PRI NICE  SIZE   RES STATE     WAIT      TIME    CPU COMMAND
> >     1 root      10    0  420K  496K idle      wait      0:01  0.00% init
> > 13415 _ntp       2  -20  888K 2428K sleep     poll      0:00  0.00% ntpd
> > 15850 axon       3    0  724K  760K sleep     ttyin     0:00  0.00% ksh
> > 42990 _syslogd   2    0  972K 1468K sleep     kqread    0:00  0.00% syslogd
> > 89057 _pflogd    4    0  672K  424K sleep     bpf       0:00  0.00% pflogd
> >  2894 root       2    0  948K 3160K sleep     poll      0:00  0.00% sshd
> > 85054 _ntp       2    0  668K 2316K idle      poll      0:00  0.00% ntpd
> >
> >
> >
> > On Tue, Oct 25, 2016 at 2:09 AM, Mike Larkin <mlar...@azathoth.net> wrote:
> >
> > > On Mon, Oct 24, 2016 at 11:07:32PM -0500, Ax0n wrote:
> > > > Thanks for the update, ml.
> > > >
> > > > The VM just did it again in the middle of backspacing over uname -a...
> > > >
> > > > $ uname -a
> > > > OpenBSD vmmbsd.labs.h-i-r.net 6.0 GENERIC.MP#0 amd64
> > > > $ un   <-- frozen
> > > >
> > > > Spinning like mad.
> > > >
> > >
> > > Bizarre. If it were I, I'd next try killing all vmd processes and
> > > running vmd -dvvv from a root console window and look for what it dumps
> > > out when it hangs like this (if anything).
> > >
> > > You'll see a fair number of "vmd: unknown exit code 1" (and 48), those
> > > are harmless and can be ignored, as can anything that vmd dumps out
> > > before the vm gets stuck like this.
> > >
> > > If you capture this and post somewhere I can take a look. You may
> > > need to extract the content out of /var/log/messages if a bunch gets
> > > printed.
> > >
> > > If this fails to diagnose what happens, I can work with you off-list on
> > > how to debug further.
> > >
> > > -ml
> > >
> > > > [axon@transient ~]$ vmctl status
> > > >    ID   PID VCPUS    MAXMEM    CURMEM              TTY NAME
> > > >     2  2769     1     512MB     149MB       /dev/ttyp3 -c
> > > >     1 48245     1     512MB     211MB       /dev/ttyp0 obsdvmm.vm
> > > > [axon@transient ~]$ ps aux | grep 48245
> > > > _vmd     48245 98.5  2.3 526880 136956 ??  Rp     1:54PM   47:08.30 vmd: obsdvmm.vm (vmd)
> > > >
> > > > load averages:  2.43,  2.36,  2.26       transient.my.domain 18:29:10
> > > > 56 processes: 53 idle, 3 on processor                        up  4:35
> > > > CPU0 states:  3.8% user,  0.0% nice, 15.4% system,  0.6% interrupt, 80.2% idle
> > > > CPU1 states: 15.3% user,  0.0% nice, 49.3% system,  0.0% interrupt, 35.4% idle
> > > > CPU2 states:  6.6% user,  0.0% nice, 24.3% system,  0.0% interrupt, 69.1% idle
> > > > CPU3 states:  4.7% user,  0.0% nice, 18.1% system,  0.0% interrupt, 77.2% idle
> > > > Memory: Real: 1401M/2183M act/tot Free: 3443M Cache: 536M Swap: 0K/4007M
> > > >
> > > >   PID USERNAME PRI NICE  SIZE   RES STATE     WAIT      TIME    CPU COMMAND
> > > > 48245 _vmd      43    0  515M  134M onproc    thrslee  47:37 98.00% vmd
> > > >  7234 axon       2    0  737M  715M sleep     poll     33:18 19.14% firefox
> > > > 42481 _x11      55    0   16M   42M onproc    -         2:53  9.96% Xorg
> > > >  2769 _vmd      29    0  514M   62M idle      thrslee   2:29  9.62% vmd
> > > > 13503 axon      10    0  512K 2496K sleep     nanosle   0:52  1.12% wmapm
> > > > 76008 axon      10    0  524K 2588K sleep     nanosle   0:10  0.73% wmmon
> > > > 57059 axon      10    0  248M  258M sleep     nanosle   0:08  0.34% wmnet
> > > > 23088 axon       2    0  580K 2532K sleep     select    0:10  0.00% wmclockmon
> > > > 64041 axon       2    0 3752K   10M sleep     poll      0:05  0.00% wmaker
> > > > 16919 axon       2    0 7484K   20M sleep     poll      0:04  0.00% xfce4-terminal
> > > >     1 root      10    0  408K  460K idle      wait      0:01  0.00% init
> > > > 80619 _ntp       2  -20  880K 2480K sleep     poll      0:01  0.00% ntpd
> > > >  9014 _pflogd    4    0  672K  408K sleep     bpf       0:01  0.00% pflogd
> > > > 58764 root      10    0 2052K 7524K idle      wait      0:01  0.00% slim
> > > >
> > > >
> > > >
> > > > On Mon, Oct 24, 2016 at 10:47 PM, Mike Larkin <mlar...@azathoth.net> wrote:
> > > >
> > > > > On Mon, Oct 24, 2016 at 07:36:48PM -0500, Ax0n wrote:
> > > > > > I suppose I'll ask here since it seems on-topic for this
> > > > > > thread. Let me know if I shouldn't do this in the future.
> > > > > > I've been testing vmm for exactly a week on two different
> > > > > > snapshots. I have two VMs: One running the same snapshot
> > > > > > (amd64, Oct 22) I'm running on the host vm, the other
> > > > > > running amd64 6.0-RELEASE with no patches of any kind.
> > > > > >
> > > > > > For some reason, the vm running a recent snapshot locks up
> > > > > > occasionally while I'm interacting with it via cu or
> > > > > > occasionally ssh. Should I expect a ddb prompt and/or kernel
> > > > > > panic messages via the virtualized serial console? Is there
> > > > > > some kind of "break" command on the console to get into ddb
> > > > > > when it appears to hang? A "No" or "Not yet" on those two
> > > > > > questions would suffice if not possible. I know this isn't
> > > > > > supported, and appreciate the hard work.
> > > > > >
> > > > > > Host dmesg:
> > > > > > http://stuff.h-i-r.net/2016-10-22.Aspire5733Z.dmesg.txt
> > > > > >
> > > > > > VM (Oct 22 Snapshot) dmesg:
> > > > > > http://stuff.h-i-r.net/2016-10-22.vmm.dmesg.txt
> > > > > >
> > > > >
> > > > > These look fine. Not sure why it would have locked up. Is the
> > > > > associated vmd process idle, or spinning like mad?
> > > > >
> > > > > -ml
> > > > >
> > > > > > Second:
> > > > > > I'm using vm.conf (contents below) to start the
> > > > > > aforementioned snapshot vm at boot. There's a "disable" line
> > > > > > inside vm.conf to keep one VM from spinning up with vmd. Is
> > > > > > there a way to start this one with vmctl aside from passing
> > > > > > all the options to vmctl as below?
> > > > > >
> > > > > > doas vmctl start -c -d OBSD-RELa -i 1 -k /home/axon/obsd/amd64/bsd -m 512M
> > > > > >
> > > > > > I've tried stuff along the lines of:
> > > > > > doas vmctl start OBSD-RELa.vm
> > > > > >
> > > > > > vm "obsdvmm.vm" {
> > > > > >         memory 512M
> > > > > >         kernel "bsd"
> > > > > >         disk "/home/axon/vmm/OBSD6"
> > > > > >         interface tap
> > > > > > }
> > > > > > vm "OBSD-RELa.vm" {
> > > > > >         memory 512M
> > > > > >         kernel "/home/axon/obsd/amd64/bsd"
> > > > > >         disk "/home/axon/vmm/OBSD-RELa"
> > > > > >         interface tap
> > > > > >         disable
> > > > > > }
> > > > > >
> > > > >
> > > > > I think this is being worked on, but not done yet.
> > > > >
> > > > > -ml
