Re: vmd: fix rebooting a received vm

2022-05-07 Thread Mike Larkin
On Sat, May 07, 2022 at 07:58:15AM -0400, Dave Voutila wrote:
> tech@:
>
> Now that vmd only accounts for memory in bytes [1], this fix is a lot
> simpler!
>
> If you use the send/receive functionality and "receive" a sent vm, it
> functions as expected. However, if that vm tries to reboot, it causes
> vmd to exit. (An ipc socket is closed in some error handling and
> triggers a code path ending vmd's event loop.)
>
> The problem was two-fold (and describing it is probably longer than the
> diff itself):
>
> 1. Not un-toggling the VM_RECEIVE_STATE bit on the vm after initial
>launch, triggering "received vm" code paths upon vm reboot.
>
>vmd's "parent" and "vmm" processes *both* track known vm's. The "vmm"
>process removes the vm from its list upon a loss of the child process
>(vm reboot), but the "parent" process keeps it in the tailq and
>reuses it, knowing the vm just requires a restart. (It has to resend
>the vm to the "vmm" process, which sees it as a "new" vm, creating a
>new child process.)
>
> 2. A "received vm" comes with pre-defined memory ranges created when it
>initially booted and these are restored before the vm is resumed. The
>problem is vmd overloads the use of these memory ranges, setting the
>number of ranges to 0 and using the first range's size as a way to
>communicate "max memory" for the vm. Since a clean reboot of a vm
>results in the "parent" process triggering the "vm start" paths, it
>assumes it can use that logic to determine the max memory.
>
>Depending on if you only fix (1) above, the vm results in either
>using the default vm memory (512MB) _or_ the size of the first
>range...which is always 640KB.
>
>Contrary to popular belief, 640KB is not enough for everyone,
>especially our vm.
>
> The diff below resolves (1) in vmd.c's vm_stop() and (2) in config.c's
> config_setvm().
>
> The fact this issue has been present for awhile indicates few people use
> or care about the send/receive functionality. I want to keep the
> functionality in place for awhile longer because I've begun to
> experiment with it *and* it's helping me find other bugs in vmd(8) as
> well as vmm(4). (Expect a vmm diff shortly.)
>
> For anyone looking to test [2], the simplest approach is to create a vm
> without a disk just boot the bsd.rd ramdisk while using a memory value
> that's *not* the default 512m:
>
>   # vmctl start -c -b /bsd.rd -m 1g test
>
> Wait for it to give you the installer prompt and then send it to a file:
>
>   # vmctl send test > test.vm
>
> You should have a 1g test.vm file. Restore it:
>
>   # vmctl receive test < test.vm
>
> Connect to the console and reboot:
>
>   # vmctl console test
>   (in vm)# reboot
>
> With the diff: the vm reboots and you end up back at the installer
> prompt. `vmctl stat` shows the correct 1g max mem. Reboot at least one
> more time and confirm the same result.
>
> Without the diff: the vmd parent process will exit taking its children
> with it.
>
> ok?
>

reads ok to me, thanks for the explanation. ok mlarkin

> -dv
>
> [1] https://marc.info/?l=openbsd-tech=165151507323339=2
>
> [2] note that the vmm issue I found means this will work reliably on AMD
> hosts, but may not on Intel hosts. fix coming soon.
>
> diff refs/heads/master refs/heads/vmd-memrange
> blob - 2750be4f580896325e5a3971667c64d61231db06
> blob + cf076cdc27ceaee6e2cbb9cce5825452f0a6
> --- usr.sbin/vmd/config.c
> +++ usr.sbin/vmd/config.c
> @@ -231,6 +231,7 @@ config_setvm(struct privsep *ps, struct vmd_vm *vm, ui
>   unsigned int unit;
>   struct timeval   tv, rate, since_last;
>   struct vmop_addr_req var;
> + size_t   bytes = 0;
>
>   if (vm->vm_state & VM_STATE_RUNNING) {
>   log_warnx("%s: vm is already running", __func__);
> @@ -518,6 +519,14 @@ config_setvm(struct privsep *ps, struct vmd_vm *vm, ui
>
>   free(tapfds);
>
> + /* Collapse any memranges after the vm was sent to PROC_VMM */
> + if (vcp->vcp_nmemranges > 0) {
> + for (i = 0; i < vcp->vcp_nmemranges; i++)
> + bytes += vcp->vcp_memranges[i].vmr_size;
> + memset(>vcp_memranges, 0, sizeof(vcp->vcp_memranges));
> + vcp->vcp_nmemranges = 0;
> + vcp->vcp_memranges[0].vmr_size = bytes;
> + }
>   vm->vm_state |= VM_STATE_RUNNING;
>   return (0);
>
> blob - 4d7e7b5e613723c2166077523dd6e8b9177d6718
> blob + d5d841fd20d9f82e852e3b844ec81d9383713923
> --- usr.sbin/vmd/vmd.c
> +++ usr.sbin/vmd/vmd.c
> @@ -1162,7 +1162,8 @@ vm_stop(struct vmd_vm *vm, int keeptty, const char *ca
>   __func__, ps->ps_title[privsep_process], caller,
>   vm->vm_vmid, keeptty ? ", keeping tty open" : "");
>
> - vm->vm_state &= ~(VM_STATE_RUNNING | VM_STATE_SHUTDOWN);
> + vm->vm_state &= ~(VM_STATE_RECEIVED | VM_STATE_RUNNING
> + | VM_STATE_SHUTDOWN);
>
>   

vmd: fix rebooting a received vm

2022-05-07 Thread Dave Voutila
tech@:

Now that vmd only accounts for memory in bytes [1], this fix is a lot
simpler!

If you use the send/receive functionality and "receive" a sent vm, it
functions as expected. However, if that vm tries to reboot, it causes
vmd to exit. (An ipc socket is closed in some error handling and
triggers a code path ending vmd's event loop.)

The problem was two-fold (and describing it is probably longer than the
diff itself):

1. Not un-toggling the VM_RECEIVE_STATE bit on the vm after initial
   launch, triggering "received vm" code paths upon vm reboot.

   vmd's "parent" and "vmm" processes *both* track known vm's. The "vmm"
   process removes the vm from its list upon a loss of the child process
   (vm reboot), but the "parent" process keeps it in the tailq and
   reuses it, knowing the vm just requires a restart. (It has to resend
   the vm to the "vmm" process, which sees it as a "new" vm, creating a
   new child process.)

2. A "received vm" comes with pre-defined memory ranges created when it
   initially booted and these are restored before the vm is resumed. The
   problem is vmd overloads the use of these memory ranges, setting the
   number of ranges to 0 and using the first range's size as a way to
   communicate "max memory" for the vm. Since a clean reboot of a vm
   results in the "parent" process triggering the "vm start" paths, it
   assumes it can use that logic to determine the max memory.

   Depending on if you only fix (1) above, the vm results in either
   using the default vm memory (512MB) _or_ the size of the first
   range...which is always 640KB.

   Contrary to popular belief, 640KB is not enough for everyone,
   especially our vm.

The diff below resolves (1) in vmd.c's vm_stop() and (2) in config.c's
config_setvm().

The fact this issue has been present for awhile indicates few people use
or care about the send/receive functionality. I want to keep the
functionality in place for awhile longer because I've begun to
experiment with it *and* it's helping me find other bugs in vmd(8) as
well as vmm(4). (Expect a vmm diff shortly.)

For anyone looking to test [2], the simplest approach is to create a vm
without a disk just boot the bsd.rd ramdisk while using a memory value
that's *not* the default 512m:

  # vmctl start -c -b /bsd.rd -m 1g test

Wait for it to give you the installer prompt and then send it to a file:

  # vmctl send test > test.vm

You should have a 1g test.vm file. Restore it:

  # vmctl receive test < test.vm

Connect to the console and reboot:

  # vmctl console test
  (in vm)# reboot

With the diff: the vm reboots and you end up back at the installer
prompt. `vmctl stat` shows the correct 1g max mem. Reboot at least one
more time and confirm the same result.

Without the diff: the vmd parent process will exit taking its children
with it.

ok?

-dv

[1] https://marc.info/?l=openbsd-tech=165151507323339=2

[2] note that the vmm issue I found means this will work reliably on AMD
hosts, but may not on Intel hosts. fix coming soon.

diff refs/heads/master refs/heads/vmd-memrange
blob - 2750be4f580896325e5a3971667c64d61231db06
blob + cf076cdc27ceaee6e2cbb9cce5825452f0a6
--- usr.sbin/vmd/config.c
+++ usr.sbin/vmd/config.c
@@ -231,6 +231,7 @@ config_setvm(struct privsep *ps, struct vmd_vm *vm, ui
unsigned int unit;
struct timeval   tv, rate, since_last;
struct vmop_addr_req var;
+   size_t   bytes = 0;

if (vm->vm_state & VM_STATE_RUNNING) {
log_warnx("%s: vm is already running", __func__);
@@ -518,6 +519,14 @@ config_setvm(struct privsep *ps, struct vmd_vm *vm, ui

free(tapfds);

+   /* Collapse any memranges after the vm was sent to PROC_VMM */
+   if (vcp->vcp_nmemranges > 0) {
+   for (i = 0; i < vcp->vcp_nmemranges; i++)
+   bytes += vcp->vcp_memranges[i].vmr_size;
+   memset(>vcp_memranges, 0, sizeof(vcp->vcp_memranges));
+   vcp->vcp_nmemranges = 0;
+   vcp->vcp_memranges[0].vmr_size = bytes;
+   }
vm->vm_state |= VM_STATE_RUNNING;
return (0);

blob - 4d7e7b5e613723c2166077523dd6e8b9177d6718
blob + d5d841fd20d9f82e852e3b844ec81d9383713923
--- usr.sbin/vmd/vmd.c
+++ usr.sbin/vmd/vmd.c
@@ -1162,7 +1162,8 @@ vm_stop(struct vmd_vm *vm, int keeptty, const char *ca
__func__, ps->ps_title[privsep_process], caller,
vm->vm_vmid, keeptty ? ", keeping tty open" : "");

-   vm->vm_state &= ~(VM_STATE_RUNNING | VM_STATE_SHUTDOWN);
+   vm->vm_state &= ~(VM_STATE_RECEIVED | VM_STATE_RUNNING
+   | VM_STATE_SHUTDOWN);

user_inc(>vm_params.vmc_params, vm->vm_user, 0);
user_put(vm->vm_user);