Re: vmd: fix i8259 race condition, vioblk hang

2023-08-29 Thread Florian Riehm
Hi,

I tested your patch and it is a great improvement for me.
My vms are hanging reproducible without your fix.

Thank you

Florian





Am Di., 29. Aug. 2023 um 18:01 Uhr schrieb Dave Voutila :

>
> Dave Voutila  writes:
>
> > mbuhl@ found an issue where the emulated virtio block device can
> > hang. The tl;dr: the emulated pic never injects an interrupt and the
> > vioblk(4) driver in the guest starves, waiting to be told to check the
> > virtqueue.
> >
> > Flipping the bit in the i8259 using gdb causes the spice to flow once
> > again.
> >
> > This diff fixes two related things (so could be committed in 2 parts):
> >
> > 1. the irq deassert on a virtio pci interrupt status register read
> >should occur synchronously by the vcpu thread in the vm and not
> >async.
> >
> > 2. always raise the irr bit in the pic, regardless of mask. The mask is
> >used when finding an irq to ack and shouldn't be used to determine if
> >the irr bit is raised. AFAICT from the old i8259 data sheets, masking
> >should have no effect on if the irr bit is set.
> >
> > This is holding up another diff I want to share that reduces latency in
> > interrupts, but it's shown to make this i8259 race condition worse, so
> > I'd like to give folks a few days to check if the below diff causes
> > regressions. Once this is committed, I'll feel safe sharing the latency
> > diff with tech@.
> >
> > Any ok's pending a few days for folks to test?
>
> Still looking for ok's or test reports (other than from mbuhl@, of
> course.)
>
> >
> > -dv
> >
> >
> > diff refs/heads/master cdba90a89ee6e77b52c2b6955d36a260a23a3b85
> > commit - ad1cd1152fddbf55189657a2df9f2468409698ab
> > commit + cdba90a89ee6e77b52c2b6955d36a260a23a3b85
> > blob - f7862f5e9d17f96f5260358271fab8f253b26505
> > blob + b6891c52c824d7b8c69e67e5323772152b1ed844
> > --- usr.sbin/vmd/i8259.c
> > +++ usr.sbin/vmd/i8259.c
> > @@ -222,20 +222,16 @@ i8259_assert_irq(uint8_t irq)
> >  {
> >   mutex_lock(_mtx);
> >   if (irq <= 7) {
> > - if (!ISSET(pics[MASTER].imr, 1 << irq)) {
> > - SET(pics[MASTER].irr, 1 << irq);
> > - pics[MASTER].asserted = 1;
> > - }
> > + SET(pics[MASTER].irr, 1 << irq);
> > + pics[MASTER].asserted = 1;
> >   } else {
> >   irq -= 8;
> > - if (!ISSET(pics[SLAVE].imr, 1 << irq)) {
> > - SET(pics[SLAVE].irr, 1 << irq);
> > - pics[SLAVE].asserted = 1;
> > + SET(pics[SLAVE].irr, 1 << irq);
> > + pics[SLAVE].asserted = 1;
> >
> > - /* Assert cascade IRQ on master PIC */
> > - SET(pics[MASTER].irr, 1 << 2);
> > - pics[MASTER].asserted = 1;
> > - }
> > + /* Assert cascade IRQ on master PIC */
> > + SET(pics[MASTER].irr, 1 << 2);
> > + pics[MASTER].asserted = 1;
> >   }
> >   mutex_unlock(_mtx);
> >  }
> > blob - 7bc76c4daed427dae82688452ec21985be679bc4
> > blob + 4d321fd4ff23514ffa7a4bee0eeb7a9324020d80
> > --- usr.sbin/vmd/vioblk.c
> > +++ usr.sbin/vmd/vioblk.c
> > @@ -39,7 +39,8 @@ static uint32_t handle_io_read(struct viodev_msg *, st
> >  extern struct vmd_vm *current_vm;
> >
> >  static const char *disk_type(int);
> > -static uint32_t handle_io_read(struct viodev_msg *, struct virtio_dev
> *);
> > +static uint32_t handle_io_read(struct viodev_msg *, struct virtio_dev *,
> > +int8_t *);
> >  static int handle_io_write(struct viodev_msg *, struct virtio_dev *);
> >  void vioblk_notify_rx(struct vioblk_dev *);
> >  int vioblk_notifyq(struct vioblk_dev *);
> > @@ -723,6 +724,7 @@ handle_sync_io(int fd, short event, void *arg)
> >   struct viodev_msg msg;
> >   struct imsg imsg;
> >   ssize_t n;
> > + int8_t intr = INTR_STATE_NOOP;
> >
> >   if (event & EV_READ) {
> >   if ((n = imsg_read(ibuf)) == -1 && errno != EAGAIN)
> > @@ -770,8 +772,9 @@ handle_sync_io(int fd, short event, void *arg)
> >   }
> >   case VIODEV_MSG_IO_READ:
> >   /* Read IO: make sure to send a reply */
> > - msg.data = handle_io_read(, dev);
> > + msg.data = handle_io_read(, dev, );
> >   msg.data_valid = 1;
> > + msg.state = intr;
> >   imsg_compose_event(iev, IMSG_DEVOP_MSG, 0, 0, -1,
> ,
> >   sizeof(msg));
> >   break;
> > @@ -844,7 +847,7 @@ handle_io_read(struct viodev_msg *msg, struct
> virtio_d
> >  }
> >
> >  static uint32_t
> > -handle_io_read(struct viodev_msg *msg, struct virtio_dev *dev)
> > +handle_io_read(struct viodev_msg *msg, struct virtio_dev *dev, int8_t
> *intr)
> >  {
> >   struct vioblk_dev *vioblk = >vioblk;
> >   uint8_t sz = msg->io_sz;
> > @@ -1001,7 +1004,8 @@ handle_io_read(struct viodev_msg *msg, struct
> virtio_d

Re: vmd: fix i8259 race condition, vioblk hang

2023-08-29 Thread Dave Voutila


Dave Voutila  writes:

> mbuhl@ found an issue where the emulated virtio block device can
> hang. The tl;dr: the emulated pic never injects an interrupt and the
> vioblk(4) driver in the guest starves, waiting to be told to check the
> virtqueue.
>
> Flipping the bit in the i8259 using gdb causes the spice to flow once
> again.
>
> This diff fixes two related things (so could be committed in 2 parts):
>
> 1. the irq deassert on a virtio pci interrupt status register read
>should occur synchronously by the vcpu thread in the vm and not
>async.
>
> 2. always raise the irr bit in the pic, regardless of mask. The mask is
>used when finding an irq to ack and shouldn't be used to determine if
>the irr bit is raised. AFAICT from the old i8259 data sheets, masking
>should have no effect on if the irr bit is set.
>
> This is holding up another diff I want to share that reduces latency in
> interrupts, but it's shown to make this i8259 race condition worse, so
> I'd like to give folks a few days to check if the below diff causes
> regressions. Once this is committed, I'll feel safe sharing the latency
> diff with tech@.
>
> Any ok's pending a few days for folks to test?

Still looking for ok's or test reports (other than from mbuhl@, of
course.)

>
> -dv
>
>
> diff refs/heads/master cdba90a89ee6e77b52c2b6955d36a260a23a3b85
> commit - ad1cd1152fddbf55189657a2df9f2468409698ab
> commit + cdba90a89ee6e77b52c2b6955d36a260a23a3b85
> blob - f7862f5e9d17f96f5260358271fab8f253b26505
> blob + b6891c52c824d7b8c69e67e5323772152b1ed844
> --- usr.sbin/vmd/i8259.c
> +++ usr.sbin/vmd/i8259.c
> @@ -222,20 +222,16 @@ i8259_assert_irq(uint8_t irq)
>  {
>   mutex_lock(_mtx);
>   if (irq <= 7) {
> - if (!ISSET(pics[MASTER].imr, 1 << irq)) {
> - SET(pics[MASTER].irr, 1 << irq);
> - pics[MASTER].asserted = 1;
> - }
> + SET(pics[MASTER].irr, 1 << irq);
> + pics[MASTER].asserted = 1;
>   } else {
>   irq -= 8;
> - if (!ISSET(pics[SLAVE].imr, 1 << irq)) {
> - SET(pics[SLAVE].irr, 1 << irq);
> - pics[SLAVE].asserted = 1;
> + SET(pics[SLAVE].irr, 1 << irq);
> + pics[SLAVE].asserted = 1;
>
> - /* Assert cascade IRQ on master PIC */
> - SET(pics[MASTER].irr, 1 << 2);
> - pics[MASTER].asserted = 1;
> - }
> + /* Assert cascade IRQ on master PIC */
> + SET(pics[MASTER].irr, 1 << 2);
> + pics[MASTER].asserted = 1;
>   }
>   mutex_unlock(_mtx);
>  }
> blob - 7bc76c4daed427dae82688452ec21985be679bc4
> blob + 4d321fd4ff23514ffa7a4bee0eeb7a9324020d80
> --- usr.sbin/vmd/vioblk.c
> +++ usr.sbin/vmd/vioblk.c
> @@ -39,7 +39,8 @@ static uint32_t handle_io_read(struct viodev_msg *, st
>  extern struct vmd_vm *current_vm;
>
>  static const char *disk_type(int);
> -static uint32_t handle_io_read(struct viodev_msg *, struct virtio_dev *);
> +static uint32_t handle_io_read(struct viodev_msg *, struct virtio_dev *,
> +int8_t *);
>  static int handle_io_write(struct viodev_msg *, struct virtio_dev *);
>  void vioblk_notify_rx(struct vioblk_dev *);
>  int vioblk_notifyq(struct vioblk_dev *);
> @@ -723,6 +724,7 @@ handle_sync_io(int fd, short event, void *arg)
>   struct viodev_msg msg;
>   struct imsg imsg;
>   ssize_t n;
> + int8_t intr = INTR_STATE_NOOP;
>
>   if (event & EV_READ) {
>   if ((n = imsg_read(ibuf)) == -1 && errno != EAGAIN)
> @@ -770,8 +772,9 @@ handle_sync_io(int fd, short event, void *arg)
>   }
>   case VIODEV_MSG_IO_READ:
>   /* Read IO: make sure to send a reply */
> - msg.data = handle_io_read(, dev);
> + msg.data = handle_io_read(, dev, );
>   msg.data_valid = 1;
> + msg.state = intr;
>   imsg_compose_event(iev, IMSG_DEVOP_MSG, 0, 0, -1, ,
>   sizeof(msg));
>   break;
> @@ -844,7 +847,7 @@ handle_io_read(struct viodev_msg *msg, struct virtio_d
>  }
>
>  static uint32_t
> -handle_io_read(struct viodev_msg *msg, struct virtio_dev *dev)
> +handle_io_read(struct viodev_msg *msg, struct virtio_dev *dev, int8_t *intr)
>  {
>   struct vioblk_dev *vioblk = >vioblk;
>   uint8_t sz = msg->io_sz;
> @@ -1001,7 +1004,8 @@ handle_io_read(struct viodev_msg *msg, struct virtio_d
>   case VIRTIO_CONFIG_ISR_STATUS:
>   data = vioblk->cfg.isr_status;
>   vioblk->cfg.isr_status = 0;
> - virtio_deassert_pic_irq(dev, 0);
> + if (intr != NULL)
> + *intr = INTR_STATE_DEASSERT;
>   break;
>   default:
>   return (0x);
> blob - c16ad2635ea622fd7fce48b5145e2199dd451346
> blob + 

vmd: fix i8259 race condition, vioblk hang

2023-08-24 Thread Dave Voutila
mbuhl@ found an issue where the emulated virtio block device can
hang. The tl;dr: the emulated pic never injects an interrupt and the
vioblk(4) driver in the guest starves, waiting to be told to check the
virtqueue.

Flipping the bit in the i8259 using gdb causes the spice to flow once
again.

This diff fixes two related things (so could be committed in 2 parts):

1. the irq deassert on a virtio pci interrupt status register read
   should occur synchronously by the vcpu thread in the vm and not
   async.

2. always raise the irr bit in the pic, regardless of mask. The mask is
   used when finding an irq to ack and shouldn't be used to determine if
   the irr bit is raised. AFAICT from the old i8259 data sheets, masking
   should have no effect on if the irr bit is set.

This is holding up another diff I want to share that reduces latency in
interrupts, but it's shown to make this i8259 race condition worse, so
I'd like to give folks a few days to check if the below diff causes
regressions. Once this is committed, I'll feel safe sharing the latency
diff with tech@.

Any ok's pending a few days for folks to test?

-dv


diff refs/heads/master cdba90a89ee6e77b52c2b6955d36a260a23a3b85
commit - ad1cd1152fddbf55189657a2df9f2468409698ab
commit + cdba90a89ee6e77b52c2b6955d36a260a23a3b85
blob - f7862f5e9d17f96f5260358271fab8f253b26505
blob + b6891c52c824d7b8c69e67e5323772152b1ed844
--- usr.sbin/vmd/i8259.c
+++ usr.sbin/vmd/i8259.c
@@ -222,20 +222,16 @@ i8259_assert_irq(uint8_t irq)
 {
mutex_lock(_mtx);
if (irq <= 7) {
-   if (!ISSET(pics[MASTER].imr, 1 << irq)) {
-   SET(pics[MASTER].irr, 1 << irq);
-   pics[MASTER].asserted = 1;
-   }
+   SET(pics[MASTER].irr, 1 << irq);
+   pics[MASTER].asserted = 1;
} else {
irq -= 8;
-   if (!ISSET(pics[SLAVE].imr, 1 << irq)) {
-   SET(pics[SLAVE].irr, 1 << irq);
-   pics[SLAVE].asserted = 1;
+   SET(pics[SLAVE].irr, 1 << irq);
+   pics[SLAVE].asserted = 1;

-   /* Assert cascade IRQ on master PIC */
-   SET(pics[MASTER].irr, 1 << 2);
-   pics[MASTER].asserted = 1;
-   }
+   /* Assert cascade IRQ on master PIC */
+   SET(pics[MASTER].irr, 1 << 2);
+   pics[MASTER].asserted = 1;
}
mutex_unlock(_mtx);
 }
blob - 7bc76c4daed427dae82688452ec21985be679bc4
blob + 4d321fd4ff23514ffa7a4bee0eeb7a9324020d80
--- usr.sbin/vmd/vioblk.c
+++ usr.sbin/vmd/vioblk.c
@@ -39,7 +39,8 @@ static uint32_t handle_io_read(struct viodev_msg *, st
 extern struct vmd_vm *current_vm;

 static const char *disk_type(int);
-static uint32_t handle_io_read(struct viodev_msg *, struct virtio_dev *);
+static uint32_t handle_io_read(struct viodev_msg *, struct virtio_dev *,
+int8_t *);
 static int handle_io_write(struct viodev_msg *, struct virtio_dev *);
 void vioblk_notify_rx(struct vioblk_dev *);
 int vioblk_notifyq(struct vioblk_dev *);
@@ -723,6 +724,7 @@ handle_sync_io(int fd, short event, void *arg)
struct viodev_msg msg;
struct imsg imsg;
ssize_t n;
+   int8_t intr = INTR_STATE_NOOP;

if (event & EV_READ) {
if ((n = imsg_read(ibuf)) == -1 && errno != EAGAIN)
@@ -770,8 +772,9 @@ handle_sync_io(int fd, short event, void *arg)
}
case VIODEV_MSG_IO_READ:
/* Read IO: make sure to send a reply */
-   msg.data = handle_io_read(, dev);
+   msg.data = handle_io_read(, dev, );
msg.data_valid = 1;
+   msg.state = intr;
imsg_compose_event(iev, IMSG_DEVOP_MSG, 0, 0, -1, ,
sizeof(msg));
break;
@@ -844,7 +847,7 @@ handle_io_read(struct viodev_msg *msg, struct virtio_d
 }

 static uint32_t
-handle_io_read(struct viodev_msg *msg, struct virtio_dev *dev)
+handle_io_read(struct viodev_msg *msg, struct virtio_dev *dev, int8_t *intr)
 {
struct vioblk_dev *vioblk = >vioblk;
uint8_t sz = msg->io_sz;
@@ -1001,7 +1004,8 @@ handle_io_read(struct viodev_msg *msg, struct virtio_d
case VIRTIO_CONFIG_ISR_STATUS:
data = vioblk->cfg.isr_status;
vioblk->cfg.isr_status = 0;
-   virtio_deassert_pic_irq(dev, 0);
+   if (intr != NULL)
+   *intr = INTR_STATE_DEASSERT;
break;
default:
return (0x);
blob - c16ad2635ea622fd7fce48b5145e2199dd451346
blob + 5c2a943ac7ad02dfbc4793f5caab3200a3ced803
--- usr.sbin/vmd/vionet.c
+++ usr.sbin/vmd/vionet.c
@@ -52,7 +52,8 @@ static uint32_t handle_io_read(struct viodev_msg *, st

 static int vionet_rx(struct vionet_dev *);
 static void vionet_rx_event(int, short, void 

vmd: fix local prefix validation from vm.conf

2023-05-22 Thread Dave Voutila
As mentioned before on bugs@ [1], the custom local prefix feature of vmd
broke with some of the recent changes. Since fixing this touches a lot
of parts of vmd, I'm breaking it up to make it reviewable by others and
hopefully less daunting.

This diff fixes an existing issue (predating recent changes) with how
the vm.conf "local prefix" and "local inet6 prefix" are validated by
re-using the runtime validation logic at config parse time to catch
issues early at startup or reload.

Prior to this, someone could provide a local prefix in vm.conf like:

  local prefix 10.10.0.0/10

But they would only get an error when starting a vm with a local
interface instead of vm.conf parse time.

The beauty of this change is not only do we keep the validation
centralized, but the refactor of passing pointers to struct address
simplifies the next diff :)

OK?
-dv

[1] https://marc.info/?l=openbsd-bugs=168377086611788=2


diff 6a9ebb2fe80ab5d5e37c3efa9a90826539df8d1b 
2561c36aebbb7e0d4759a9185eb003d5c960cf99
commit - 6a9ebb2fe80ab5d5e37c3efa9a90826539df8d1b
commit + 2561c36aebbb7e0d4759a9185eb003d5c960cf99
blob - 3b8744ffe50f24d24eb51085c5291a91cd8ea983
blob + 8fbd084a0632d8ca9f2c8dba3100540de60c232c
--- usr.sbin/vmd/dhcp.c
+++ usr.sbin/vmd/dhcp.c
@@ -147,7 +147,7 @@ dhcp_request(struct virtio_dev *dev, char *buf, size_t
}

if ((client_addr.s_addr =
-   vm_priv_addr(>vmd_cfg,
+   vm_priv_addr(>vmd_cfg.cfg_localprefix,
dev->vm_vmid, vionet->idx, 1)) == 0)
return (-1);
memcpy(, _addr,
@@ -156,8 +156,8 @@ dhcp_request(struct virtio_dev *dev, char *buf, size_t
sizeof(client_addr));
ss2sin(_dst)->sin_port = htons(CLIENT_PORT);

-   if ((server_addr.s_addr = vm_priv_addr(>vmd_cfg, dev->vm_vmid,
-   vionet->idx, 0)) == 0)
+   if ((server_addr.s_addr = vm_priv_addr(>vmd_cfg.cfg_localprefix,
+   dev->vm_vmid, vionet->idx, 0)) == 0)
return (-1);
memcpy(, _addr, sizeof(server_addr));
memcpy((_src)->sin_addr, _addr,
blob - 09468e3fe2c9f4f9193710c65667132f79a90df3
blob + 1970016c324a512a0c825745a5f04033555339ef
--- usr.sbin/vmd/parse.y
+++ usr.sbin/vmd/parse.y
@@ -192,29 +192,28 @@ main  : LOCAL INET6 {
struct address   h;

if (host($4, ) == -1 ||
-   h.ss.ss_family != AF_INET6 ||
-   h.prefixlen > 64 || h.prefixlen < 0) {
+   vm_priv_addr6(, 0, 0, 1, NULL) == 0) {
yyerror("invalid local inet6 prefix: %s", $4);
-   free($4);
YYERROR;
+   } else {
+   env->vmd_cfg.cfg_flags |= VMD_CFG_INET6;
+   env->vmd_cfg.cfg_flags &= ~VMD_CFG_AUTOINET6;
+   memcpy(>vmd_cfg.cfg_localprefix6, ,
+   sizeof(h));
}
-
-   env->vmd_cfg.cfg_flags |= VMD_CFG_INET6;
-   env->vmd_cfg.cfg_flags &= ~VMD_CFG_AUTOINET6;
-   memcpy(>vmd_cfg.cfg_localprefix6, , sizeof(h));
+   free($4);
}
| LOCAL PREFIX STRING {
struct address   h;

if (host($3, ) == -1 ||
-   h.ss.ss_family != AF_INET ||
-   h.prefixlen > 32 || h.prefixlen < 0) {
+   vm_priv_addr(, 0, 0, 1) == 0) {
yyerror("invalid local prefix: %s", $3);
-   free($3);
YYERROR;
-   }
-
-   memcpy(>vmd_cfg.cfg_localprefix, , sizeof(h));
+   } else
+   memcpy(>vmd_cfg.cfg_localprefix, ,
+   sizeof(h));
+   free($3);
}
| SOCKET OWNER owner_id {
env->vmd_ps.ps_csock.cs_uid = $3.uid;
blob - a7a7a2bc2280ea85ed486d65a45ef77b2d04a8b5
blob + 6d852813dfcbc47934b25f207ec8b322f7e58014
--- usr.sbin/vmd/priv.c
+++ usr.sbin/vmd/priv.c
@@ -466,7 +466,7 @@ vm_priv_ifconfig(struct privsep *ps, struct vmd_vm *vm
sin4->sin_family = AF_INET;
sin4->sin_len = sizeof(*sin4);
if ((sin4->sin_addr.s_addr =
-   vm_priv_addr(>vmd_cfg,
+   vm_priv_addr(>vmd_cfg.cfg_localprefix,
vm->vm_vmid, i, 0)) == 0)
return (-1);

@@ -493,7 +493,7 @@ vm_priv_ifconfig(struct privsep *ps, struct vmd_vm *vm
sin6 = ss2sin6(_addr);
sin6->sin6_family = AF_INET6;
sin6->sin6_len = 

Re: vmd: fix booting 7.2 ramdisks with >= 4G mem

2022-11-28 Thread Mark Kettenis
> Date: Mon, 28 Nov 2022 18:26:47 +0100
> From: Claudio Jeker 
> 
> On Mon, Nov 28, 2022 at 11:32:32AM -0500, Dave Voutila wrote:
> > tech@ et. al.,
> > 
> > When kettenis@ introduced a newer version of BOOTARG_CONSDEV to add
> > additional params for the AMD Ryzen V1000 family, vmd's code that
> > configures bootargs to support direct booting a ramdisk kernel didn't
> > adjust with it.
> > 
> > Mischa Peters found this and shared a simple reproducer on 7.2 and
> > -current:
> > 
> > # vmctl start -c -b /bsd.rd -m 4G test
> > 
> > Where /bsd.rd is a 7.2 or -current ramdisk kernel.
> > 
> > Interestingly, this is only seen when using 4G (or more) memory for the
> > guest. I think it's just a happy coincedence it works < 4G because of
> > the resulting BOOTARG_MEMMAP sizing things so the BOOTARG_CONSDEV works.
> > 
> > Diff below fixes the issue by simply zero'ing the BOOTARG_CONSDEV
> > structure before assigning to members.
> > 
> > While here, I also cleaned up some things like using literal values that
> > could be more descriptive boot arg names and also made the arithmetic
> > explicitly use the same type (uint32_t) throughout instead of mixing it
> > with int.
> > 
> > ok?
> 
> OK claudio@

ok kettenis@ too FWIW

> PS: diff did not apply, a space got stripped below diff should apply
> -- 
> :wq Claudio
> 
> Index: loadfile_elf.c
> ===
> RCS file: /cvs/src/usr.sbin/vmd/loadfile_elf.c,v
> retrieving revision 1.42
> diff -u -p -r1.42 loadfile_elf.c
> --- loadfile_elf.c28 Jan 2022 06:33:27 -  1.42
> +++ loadfile_elf.c28 Nov 2022 17:07:20 -
> @@ -382,9 +382,10 @@ create_bios_memmap(struct vm_create_para
>   * Parameters:
>   *  memmap: the BIOS memory map
>   *  n: number of entries in memmap
> + *  bootmac: optional PXE boot MAC address
>   *
>   * Return values:
> - *  The size of the bootargs
> + *  The size of the bootargs in bytes
>   */
>  static uint32_t
>  push_bootargs(bios_memmap_t *memmap, size_t n, bios_bootmac_t *bootmac)
> @@ -393,40 +394,41 @@ push_bootargs(bios_memmap_t *memmap, siz
>   bios_consdev_t consdev;
>   uint32_t ba[1024];
>  
> - memmap_sz = 3 * sizeof(int) + n * sizeof(bios_memmap_t);
> - ba[0] = 0x0;/* memory map */
> + memmap_sz = 3 * sizeof(uint32_t) + n * sizeof(bios_memmap_t);
> + ba[0] = BOOTARG_MEMMAP;
>   ba[1] = memmap_sz;
> - ba[2] = memmap_sz;  /* next */
> + ba[2] = memmap_sz;
>   memcpy([3], memmap, n * sizeof(bios_memmap_t));
> - i = memmap_sz / sizeof(int);
> + i = memmap_sz / sizeof(uint32_t);
>  
>   /* Serial console device, COM1 @ 0x3f8 */
> - consdev.consdev = makedev(8, 0);/* com1 @ 0x3f8 */
> + memset(, 0, sizeof(consdev));
> + consdev.consdev = makedev(8, 0);
>   consdev.conspeed = 115200;
>   consdev.consaddr = 0x3f8;
> - consdev.consfreq = 0;
>  
> - consdev_sz = 3 * sizeof(int) + sizeof(bios_consdev_t);
> - ba[i] = 0x5;   /* consdev */
> + consdev_sz = 3 * sizeof(uint32_t) + sizeof(bios_consdev_t);
> + ba[i] = BOOTARG_CONSDEV;
>   ba[i + 1] = consdev_sz;
>   ba[i + 2] = consdev_sz;
>   memcpy([i + 3], , sizeof(bios_consdev_t));
> - i += consdev_sz / sizeof(int);
> + i += consdev_sz / sizeof(uint32_t);
>  
>   if (bootmac) {
> - bootmac_sz = 3 * sizeof(int) + (sizeof(bios_bootmac_t) + 3) & 
> ~3;
> - ba[i] = 0x7;   /* bootmac */
> + bootmac_sz = 3 * sizeof(uint32_t) +
> + (sizeof(bios_bootmac_t) + 3) & ~3;
> + ba[i] = BOOTARG_BOOTMAC;
>   ba[i + 1] = bootmac_sz;
>   ba[i + 2] = bootmac_sz;
>   memcpy([i + 3], bootmac, sizeof(bios_bootmac_t));
> - i += bootmac_sz / sizeof(int);
> - } 
> + i += bootmac_sz / sizeof(uint32_t);
> + }
>  
>   ba[i++] = 0x; /* BOOTARG_END */
>  
>   write_mem(BOOTARGS_PAGE, ba, PAGE_SIZE);
>  
> - return (i * sizeof(int));
> + return (i * sizeof(uint32_t));
>  }
>  
>  /*
> 
> 



Re: vmd: fix booting 7.2 ramdisks with >= 4G mem

2022-11-28 Thread Mike Larkin
On Mon, Nov 28, 2022 at 11:32:32AM -0500, Dave Voutila wrote:
> tech@ et. al.,
>
> When kettenis@ introduced a newer version of BOOTARG_CONSDEV to add
> additional params for the AMD Ryzen V1000 family, vmd's code that
> configures bootargs to support direct booting a ramdisk kernel didn't
> adjust with it.
>
> Mischa Peters found this and shared a simple reproducer on 7.2 and
> -current:
>
> # vmctl start -c -b /bsd.rd -m 4G test
>
> Where /bsd.rd is a 7.2 or -current ramdisk kernel.
>
> Interestingly, this is only seen when using 4G (or more) memory for the
> guest. I think it's just a happy coincedence it works < 4G because of
> the resulting BOOTARG_MEMMAP sizing things so the BOOTARG_CONSDEV works.
>
> Diff below fixes the issue by simply zero'ing the BOOTARG_CONSDEV
> structure before assigning to members.
>
> While here, I also cleaned up some things like using literal values that
> could be more descriptive boot arg names and also made the arithmetic
> explicitly use the same type (uint32_t) throughout instead of mixing it
> with int.
>
> ok?
>

ok mlarkin

> -dv
>
> diff refs/heads/master refs/heads/vmd-ramdisk
> commit - 8cbcfb178c36f28f6fcb28289719a4f0547eabb4
> commit + 0be12dfaa063ded82837d3a6b2ce8df7ea7e1c2d
> blob - b367721e32b61892955bbf835b873034875c85ec
> blob + d560b8e8eb2cdd87a60c63e8ecb7fed56e5c60dc
> --- usr.sbin/vmd/loadfile_elf.c
> +++ usr.sbin/vmd/loadfile_elf.c
> @@ -382,9 +382,10 @@ create_bios_memmap(struct vm_create_params *vcp, bios_
>   * Parameters:
>   *  memmap: the BIOS memory map
>   *  n: number of entries in memmap
> + *  bootmac: optional PXE boot MAC address
>   *
>   * Return values:
> - *  The size of the bootargs
> + *  The size of the bootargs in bytes
>   */
>  static uint32_t
>  push_bootargs(bios_memmap_t *memmap, size_t n, bios_bootmac_t *bootmac)
> @@ -393,40 +394,41 @@ push_bootargs(bios_memmap_t *memmap, size_t n, bios_bo
>   bios_consdev_t consdev;
>   uint32_t ba[1024];
>
> - memmap_sz = 3 * sizeof(int) + n * sizeof(bios_memmap_t);
> - ba[0] = 0x0;/* memory map */
> + memmap_sz = 3 * sizeof(uint32_t) + n * sizeof(bios_memmap_t);
> + ba[0] = BOOTARG_MEMMAP;
>   ba[1] = memmap_sz;
> - ba[2] = memmap_sz;  /* next */
> + ba[2] = memmap_sz;
>   memcpy([3], memmap, n * sizeof(bios_memmap_t));
> - i = memmap_sz / sizeof(int);
> + i = memmap_sz / sizeof(uint32_t);
>
>   /* Serial console device, COM1 @ 0x3f8 */
> - consdev.consdev = makedev(8, 0);/* com1 @ 0x3f8 */
> + memset(, 0, sizeof(consdev));
> + consdev.consdev = makedev(8, 0);
>   consdev.conspeed = 115200;
>   consdev.consaddr = 0x3f8;
> - consdev.consfreq = 0;
>
> - consdev_sz = 3 * sizeof(int) + sizeof(bios_consdev_t);
> - ba[i] = 0x5;   /* consdev */
> + consdev_sz = 3 * sizeof(uint32_t) + sizeof(bios_consdev_t);
> + ba[i] = BOOTARG_CONSDEV;
>   ba[i + 1] = consdev_sz;
>   ba[i + 2] = consdev_sz;
>   memcpy([i + 3], , sizeof(bios_consdev_t));
> - i += consdev_sz / sizeof(int);
> + i += consdev_sz / sizeof(uint32_t);
>
>   if (bootmac) {
> - bootmac_sz = 3 * sizeof(int) + (sizeof(bios_bootmac_t) + 3) & 
> ~3;
> - ba[i] = 0x7;   /* bootmac */
> + bootmac_sz = 3 * sizeof(uint32_t) +
> + (sizeof(bios_bootmac_t) + 3) & ~3;
> + ba[i] = BOOTARG_BOOTMAC;
>   ba[i + 1] = bootmac_sz;
>   ba[i + 2] = bootmac_sz;
>   memcpy([i + 3], bootmac, sizeof(bios_bootmac_t));
> - i += bootmac_sz / sizeof(int);
> + i += bootmac_sz / sizeof(uint32_t);
>   }
>
>   ba[i++] = 0x; /* BOOTARG_END */
>
>   write_mem(BOOTARGS_PAGE, ba, PAGE_SIZE);
>
> - return (i * sizeof(int));
> + return (i * sizeof(uint32_t));
>  }
>
>  /*



Re: vmd: fix booting 7.2 ramdisks with >= 4G mem

2022-11-28 Thread Claudio Jeker
On Mon, Nov 28, 2022 at 11:32:32AM -0500, Dave Voutila wrote:
> tech@ et. al.,
> 
> When kettenis@ introduced a newer version of BOOTARG_CONSDEV to add
> additional params for the AMD Ryzen V1000 family, vmd's code that
> configures bootargs to support direct booting a ramdisk kernel didn't
> adjust with it.
> 
> Mischa Peters found this and shared a simple reproducer on 7.2 and
> -current:
> 
> # vmctl start -c -b /bsd.rd -m 4G test
> 
> Where /bsd.rd is a 7.2 or -current ramdisk kernel.
> 
> Interestingly, this is only seen when using 4G (or more) memory for the
> guest. I think it's just a happy coincedence it works < 4G because of
> the resulting BOOTARG_MEMMAP sizing things so the BOOTARG_CONSDEV works.
> 
> Diff below fixes the issue by simply zero'ing the BOOTARG_CONSDEV
> structure before assigning to members.
> 
> While here, I also cleaned up some things like using literal values that
> could be more descriptive boot arg names and also made the arithmetic
> explicitly use the same type (uint32_t) throughout instead of mixing it
> with int.
> 
> ok?

OK claudio@
 
PS: diff did not apply, a space got stripped below diff should apply
-- 
:wq Claudio

Index: loadfile_elf.c
===
RCS file: /cvs/src/usr.sbin/vmd/loadfile_elf.c,v
retrieving revision 1.42
diff -u -p -r1.42 loadfile_elf.c
--- loadfile_elf.c  28 Jan 2022 06:33:27 -  1.42
+++ loadfile_elf.c  28 Nov 2022 17:07:20 -
@@ -382,9 +382,10 @@ create_bios_memmap(struct vm_create_para
  * Parameters:
  *  memmap: the BIOS memory map
  *  n: number of entries in memmap
+ *  bootmac: optional PXE boot MAC address
  *
  * Return values:
- *  The size of the bootargs
+ *  The size of the bootargs in bytes
  */
 static uint32_t
 push_bootargs(bios_memmap_t *memmap, size_t n, bios_bootmac_t *bootmac)
@@ -393,40 +394,41 @@ push_bootargs(bios_memmap_t *memmap, siz
bios_consdev_t consdev;
uint32_t ba[1024];
 
-   memmap_sz = 3 * sizeof(int) + n * sizeof(bios_memmap_t);
-   ba[0] = 0x0;/* memory map */
+   memmap_sz = 3 * sizeof(uint32_t) + n * sizeof(bios_memmap_t);
+   ba[0] = BOOTARG_MEMMAP;
ba[1] = memmap_sz;
-   ba[2] = memmap_sz;  /* next */
+   ba[2] = memmap_sz;
memcpy([3], memmap, n * sizeof(bios_memmap_t));
-   i = memmap_sz / sizeof(int);
+   i = memmap_sz / sizeof(uint32_t);
 
/* Serial console device, COM1 @ 0x3f8 */
-   consdev.consdev = makedev(8, 0);/* com1 @ 0x3f8 */
+   memset(, 0, sizeof(consdev));
+   consdev.consdev = makedev(8, 0);
consdev.conspeed = 115200;
consdev.consaddr = 0x3f8;
-   consdev.consfreq = 0;
 
-   consdev_sz = 3 * sizeof(int) + sizeof(bios_consdev_t);
-   ba[i] = 0x5;   /* consdev */
+   consdev_sz = 3 * sizeof(uint32_t) + sizeof(bios_consdev_t);
+   ba[i] = BOOTARG_CONSDEV;
ba[i + 1] = consdev_sz;
ba[i + 2] = consdev_sz;
memcpy([i + 3], , sizeof(bios_consdev_t));
-   i += consdev_sz / sizeof(int);
+   i += consdev_sz / sizeof(uint32_t);
 
if (bootmac) {
-   bootmac_sz = 3 * sizeof(int) + (sizeof(bios_bootmac_t) + 3) & 
~3;
-   ba[i] = 0x7;   /* bootmac */
+   bootmac_sz = 3 * sizeof(uint32_t) +
+   (sizeof(bios_bootmac_t) + 3) & ~3;
+   ba[i] = BOOTARG_BOOTMAC;
ba[i + 1] = bootmac_sz;
ba[i + 2] = bootmac_sz;
memcpy([i + 3], bootmac, sizeof(bios_bootmac_t));
-   i += bootmac_sz / sizeof(int);
-   } 
+   i += bootmac_sz / sizeof(uint32_t);
+   }
 
ba[i++] = 0x; /* BOOTARG_END */
 
write_mem(BOOTARGS_PAGE, ba, PAGE_SIZE);
 
-   return (i * sizeof(int));
+   return (i * sizeof(uint32_t));
 }
 
 /*



vmd: fix booting 7.2 ramdisks with >= 4G mem

2022-11-28 Thread Dave Voutila
tech@ et. al.,

When kettenis@ introduced a newer version of BOOTARG_CONSDEV to add
additional params for the AMD Ryzen V1000 family, vmd's code that
configures bootargs to support direct booting a ramdisk kernel didn't
adjust with it.

Mischa Peters found this and shared a simple reproducer on 7.2 and
-current:

# vmctl start -c -b /bsd.rd -m 4G test

Where /bsd.rd is a 7.2 or -current ramdisk kernel.

Interestingly, this is only seen when using 4G (or more) memory for the
guest. I think it's just a happy coincedence it works < 4G because of
the resulting BOOTARG_MEMMAP sizing things so the BOOTARG_CONSDEV works.

Diff below fixes the issue by simply zero'ing the BOOTARG_CONSDEV
structure before assigning to members.

While here, I also cleaned up some things like using literal values that
could be more descriptive boot arg names and also made the arithmetic
explicitly use the same type (uint32_t) throughout instead of mixing it
with int.

ok?

-dv

diff refs/heads/master refs/heads/vmd-ramdisk
commit - 8cbcfb178c36f28f6fcb28289719a4f0547eabb4
commit + 0be12dfaa063ded82837d3a6b2ce8df7ea7e1c2d
blob - b367721e32b61892955bbf835b873034875c85ec
blob + d560b8e8eb2cdd87a60c63e8ecb7fed56e5c60dc
--- usr.sbin/vmd/loadfile_elf.c
+++ usr.sbin/vmd/loadfile_elf.c
@@ -382,9 +382,10 @@ create_bios_memmap(struct vm_create_params *vcp, bios_
  * Parameters:
  *  memmap: the BIOS memory map
  *  n: number of entries in memmap
+ *  bootmac: optional PXE boot MAC address
  *
  * Return values:
- *  The size of the bootargs
+ *  The size of the bootargs in bytes
  */
 static uint32_t
 push_bootargs(bios_memmap_t *memmap, size_t n, bios_bootmac_t *bootmac)
@@ -393,40 +394,41 @@ push_bootargs(bios_memmap_t *memmap, size_t n, bios_bo
bios_consdev_t consdev;
uint32_t ba[1024];

-   memmap_sz = 3 * sizeof(int) + n * sizeof(bios_memmap_t);
-   ba[0] = 0x0;/* memory map */
+   memmap_sz = 3 * sizeof(uint32_t) + n * sizeof(bios_memmap_t);
+   ba[0] = BOOTARG_MEMMAP;
ba[1] = memmap_sz;
-   ba[2] = memmap_sz;  /* next */
+   ba[2] = memmap_sz;
memcpy([3], memmap, n * sizeof(bios_memmap_t));
-   i = memmap_sz / sizeof(int);
+   i = memmap_sz / sizeof(uint32_t);

/* Serial console device, COM1 @ 0x3f8 */
-   consdev.consdev = makedev(8, 0);/* com1 @ 0x3f8 */
+   memset(, 0, sizeof(consdev));
+   consdev.consdev = makedev(8, 0);
consdev.conspeed = 115200;
consdev.consaddr = 0x3f8;
-   consdev.consfreq = 0;

-   consdev_sz = 3 * sizeof(int) + sizeof(bios_consdev_t);
-   ba[i] = 0x5;   /* consdev */
+   consdev_sz = 3 * sizeof(uint32_t) + sizeof(bios_consdev_t);
+   ba[i] = BOOTARG_CONSDEV;
ba[i + 1] = consdev_sz;
ba[i + 2] = consdev_sz;
memcpy([i + 3], , sizeof(bios_consdev_t));
-   i += consdev_sz / sizeof(int);
+   i += consdev_sz / sizeof(uint32_t);

if (bootmac) {
-   bootmac_sz = 3 * sizeof(int) + (sizeof(bios_bootmac_t) + 3) & 
~3;
-   ba[i] = 0x7;   /* bootmac */
+   bootmac_sz = 3 * sizeof(uint32_t) +
+   (sizeof(bios_bootmac_t) + 3) & ~3;
+   ba[i] = BOOTARG_BOOTMAC;
ba[i + 1] = bootmac_sz;
ba[i + 2] = bootmac_sz;
memcpy([i + 3], bootmac, sizeof(bios_bootmac_t));
-   i += bootmac_sz / sizeof(int);
+   i += bootmac_sz / sizeof(uint32_t);
}

ba[i++] = 0x; /* BOOTARG_END */

write_mem(BOOTARGS_PAGE, ba, PAGE_SIZE);

-   return (i * sizeof(int));
+   return (i * sizeof(uint32_t));
 }

 /*



Re: vmd: fix rebooting a received vm

2022-05-07 Thread Mike Larkin
On Sat, May 07, 2022 at 07:58:15AM -0400, Dave Voutila wrote:
> tech@:
>
> Now that vmd only accounts for memory in bytes [1], this fix is a lot
> simpler!
>
> If you use the send/receive functionality and "receive" a sent vm, it
> functions as expected. However, if that vm tries to reboot, it causes
> vmd to exit. (An ipc socket is closed in some error handling and
> triggers a code path ending vmd's event loop.)
>
> The problem was two-fold (and describing it is probably longer than the
> diff itself):
>
> 1. Not un-toggling the VM_RECEIVE_STATE bit on the vm after initial
>launch, triggering "received vm" code paths upon vm reboot.
>
>vmd's "parent" and "vmm" processes *both* track known vm's. The "vmm"
>process removes the vm from its list upon a loss of the child process
>(vm reboot), but the "parent" process keeps it in the tailq and
>reuses it, knowing the vm just requires a restart. (It has to resend
>the vm to the "vmm" process, which sees it as a "new" vm, creating a
>new child process.)
>
> 2. A "received vm" comes with pre-defined memory ranges created when it
>initially booted and these are restored before the vm is resumed. The
>problem is vmd overloads the use of these memory ranges, setting the
>number of ranges to 0 and using the first range's size as a way to
>communicate "max memory" for the vm. Since a clean reboot of a vm
>results in the "parent" process triggering the "vm start" paths, it
>assumes it can use that logic to determine the max memory.
>
>Depending on if you only fix (1) above, the vm results in either
>using the default vm memory (512MB) _or_ the size of the first
>range...which is always 640KB.
>
>Contrary to popular belief, 640KB is not enough for everyone,
>especially our vm.
>
> The diff below resolves (1) in vmd.c's vm_stop() and (2) in config.c's
> config_setvm().
>
> The fact this issue has been present for awhile indicates few people use
> or care about the send/receive functionality. I want to keep the
> functionality in place for awhile longer because I've begun to
> experiment with it *and* it's helping me find other bugs in vmd(8) as
> well as vmm(4). (Expect a vmm diff shortly.)
>
> For anyone looking to test [2], the simplest approach is to create a vm
> without a disk just boot the bsd.rd ramdisk while using a memory value
> that's *not* the default 512m:
>
>   # vmctl start -c -b /bsd.rd -m 1g test
>
> Wait for it to give you the installer prompt and then send it to a file:
>
>   # vmctl send test > test.vm
>
> You should have a 1g test.vm file. Restore it:
>
>   # vmctl receive test < test.vm
>
> Connect to the console and reboot:
>
>   # vmctl console test
>   (in vm)# reboot
>
> With the diff: the vm reboots and you end up back at the installer
> prompt. `vmctl stat` shows the correct 1g max mem. Reboot at least one
> more time and confirm the same result.
>
> Without the diff: the vmd parent process will exit taking its children
> with it.
>
> ok?
>

reads ok to me, thanks for the explanation. ok mlarkin

> -dv
>
> [1] https://marc.info/?l=openbsd-tech=165151507323339=2
>
> [2] note that the vmm issue I found means this will work reliably on AMD
> hosts, but may not on Intel hosts. fix coming soon.
>
> diff refs/heads/master refs/heads/vmd-memrange
> blob - 2750be4f580896325e5a3971667c64d61231db06
> blob + cf076cdc27ceaee6e2cbb9cce5825452f0a6
> --- usr.sbin/vmd/config.c
> +++ usr.sbin/vmd/config.c
> @@ -231,6 +231,7 @@ config_setvm(struct privsep *ps, struct vmd_vm *vm, ui
>   unsigned int unit;
>   struct timeval   tv, rate, since_last;
>   struct vmop_addr_req var;
> + size_t   bytes = 0;
>
>   if (vm->vm_state & VM_STATE_RUNNING) {
>   log_warnx("%s: vm is already running", __func__);
> @@ -518,6 +519,14 @@ config_setvm(struct privsep *ps, struct vmd_vm *vm, ui
>
>   free(tapfds);
>
> + /* Collapse any memranges after the vm was sent to PROC_VMM */
> + if (vcp->vcp_nmemranges > 0) {
> + for (i = 0; i < vcp->vcp_nmemranges; i++)
> + bytes += vcp->vcp_memranges[i].vmr_size;
> + memset(>vcp_memranges, 0, sizeof(vcp->vcp_memranges));
> + vcp->vcp_nmemranges = 0;
> + vcp->vcp_memranges[0].vmr_size = bytes;
> + }
>   vm->vm_state |= VM_STATE_RUNNING;
>   return (0);
>
> blob - 4d7e7b5e613723c2166077523dd6e8b9177d6718
> blob + d5d841fd20d9f82e852e3b844ec81d9383713923
> --- usr.sbin/vmd/vmd.c
> +++ usr.sbin/vmd/vmd.c
> @@ -1162,7 +1162,8 @@ vm_stop(struct vmd_vm *vm, int keeptty, const char *ca
>   __func__, ps->ps_title[privsep_process], caller,
>   vm->vm_vmid, keeptty ? ", keeping tty open" : "");
>
> - vm->vm_state &= ~(VM_STATE_RUNNING | VM_STATE_SHUTDOWN);
> + vm->vm_state &= ~(VM_STATE_RECEIVED | VM_STATE_RUNNING
> + | VM_STATE_SHUTDOWN);
>
>   

vmd: fix rebooting a received vm

2022-05-07 Thread Dave Voutila
tech@:

Now that vmd only accounts for memory in bytes [1], this fix is a lot
simpler!

If you use the send/receive functionality and "receive" a sent vm, it
functions as expected. However, if that vm tries to reboot, it causes
vmd to exit. (An ipc socket is closed in some error handling and
triggers a code path ending vmd's event loop.)

The problem was two-fold (and describing it is probably longer than the
diff itself):

1. Not un-toggling the VM_RECEIVE_STATE bit on the vm after initial
   launch, triggering "received vm" code paths upon vm reboot.

   vmd's "parent" and "vmm" processes *both* track known vm's. The "vmm"
   process removes the vm from its list upon a loss of the child process
   (vm reboot), but the "parent" process keeps it in the tailq and
   reuses it, knowing the vm just requires a restart. (It has to resend
   the vm to the "vmm" process, which sees it as a "new" vm, creating a
   new child process.)

2. A "received vm" comes with pre-defined memory ranges created when it
   initially booted and these are restored before the vm is resumed. The
   problem is vmd overloads the use of these memory ranges, setting the
   number of ranges to 0 and using the first range's size as a way to
   communicate "max memory" for the vm. Since a clean reboot of a vm
   results in the "parent" process triggering the "vm start" paths, it
   assumes it can use that logic to determine the max memory.

   Depending on if you only fix (1) above, the vm results in either
   using the default vm memory (512MB) _or_ the size of the first
   range...which is always 640KB.

   Contrary to popular belief, 640KB is not enough for everyone,
   especially our vm.

The diff below resolves (1) in vmd.c's vm_stop() and (2) in config.c's
config_setvm().

The fact this issue has been present for awhile indicates few people use
or care about the send/receive functionality. I want to keep the
functionality in place for awhile longer because I've begun to
experiment with it *and* it's helping me find other bugs in vmd(8) as
well as vmm(4). (Expect a vmm diff shortly.)

For anyone looking to test [2], the simplest approach is to create a vm
without a disk just boot the bsd.rd ramdisk while using a memory value
that's *not* the default 512m:

  # vmctl start -c -b /bsd.rd -m 1g test

Wait for it to give you the installer prompt and then send it to a file:

  # vmctl send test > test.vm

You should have a 1g test.vm file. Restore it:

  # vmctl receive test < test.vm

Connect to the console and reboot:

  # vmctl console test
  (in vm)# reboot

With the diff: the vm reboots and you end up back at the installer
prompt. `vmctl stat` shows the correct 1g max mem. Reboot at least one
more time and confirm the same result.

Without the diff: the vmd parent process will exit taking its children
with it.

ok?

-dv

[1] https://marc.info/?l=openbsd-tech=165151507323339=2

[2] note that the vmm issue I found means this will work reliably on AMD
hosts, but may not on Intel hosts. fix coming soon.

diff refs/heads/master refs/heads/vmd-memrange
blob - 2750be4f580896325e5a3971667c64d61231db06
blob + cf076cdc27ceaee6e2cbb9cce5825452f0a6
--- usr.sbin/vmd/config.c
+++ usr.sbin/vmd/config.c
@@ -231,6 +231,7 @@ config_setvm(struct privsep *ps, struct vmd_vm *vm, ui
unsigned int unit;
struct timeval   tv, rate, since_last;
struct vmop_addr_req var;
+   size_t   bytes = 0;

if (vm->vm_state & VM_STATE_RUNNING) {
log_warnx("%s: vm is already running", __func__);
@@ -518,6 +519,14 @@ config_setvm(struct privsep *ps, struct vmd_vm *vm, ui

free(tapfds);

+   /* Collapse any memranges after the vm was sent to PROC_VMM */
+   if (vcp->vcp_nmemranges > 0) {
+   for (i = 0; i < vcp->vcp_nmemranges; i++)
+   bytes += vcp->vcp_memranges[i].vmr_size;
+   memset(>vcp_memranges, 0, sizeof(vcp->vcp_memranges));
+   vcp->vcp_nmemranges = 0;
+   vcp->vcp_memranges[0].vmr_size = bytes;
+   }
vm->vm_state |= VM_STATE_RUNNING;
return (0);

blob - 4d7e7b5e613723c2166077523dd6e8b9177d6718
blob + d5d841fd20d9f82e852e3b844ec81d9383713923
--- usr.sbin/vmd/vmd.c
+++ usr.sbin/vmd/vmd.c
@@ -1162,7 +1162,8 @@ vm_stop(struct vmd_vm *vm, int keeptty, const char *ca
__func__, ps->ps_title[privsep_process], caller,
vm->vm_vmid, keeptty ? ", keeping tty open" : "");

-   vm->vm_state &= ~(VM_STATE_RUNNING | VM_STATE_SHUTDOWN);
+   vm->vm_state &= ~(VM_STATE_RECEIVED | VM_STATE_RUNNING
+   | VM_STATE_SHUTDOWN);

user_inc(>vm_params.vmc_params, vm->vm_user, 0);
user_put(vm->vm_user);



Re: vmd: Fix grammar for random lladdr

2021-06-02 Thread Dave Voutila


Claudio Jeker writes:

> On Wed, Jun 02, 2021 at 08:24:53AM -0400, Dave Voutila wrote:
>>
>> Martin Vahlensieck writes:
>>
>> > Index: parse.y
>> > ===
>> > retrieving revision 1.56
>> > diff -u -p -r1.56 parse.y
>> > --- parse.y23 Sep 2020 19:18:18 -  1.56
>> > +++ parse.y2 Jun 2021 06:48:12 -
>> > @@ -694,6 +694,9 @@ lladdr : STRING{
>> >
>> >memcpy($$, ea, ETHER_ADDR_LEN);
>> >}
>> > +  | /* empty */ {
>> > +  memset($$, 0, ETHER_ADDR_LEN);
>> > +  }
>> >;
>> >
>> >  local : /* empty */   { $$ = 0; }
>>
>> ok dv@.
>>
>> I'll commit this later today.
>
> This is also OK claudio@

Committed. Thanks, Martin.



Re: vmd: Fix grammar for random lladdr

2021-06-02 Thread Claudio Jeker
On Wed, Jun 02, 2021 at 08:24:53AM -0400, Dave Voutila wrote:
> 
> Martin Vahlensieck writes:
> 
> > Index: parse.y
> > ===
> > retrieving revision 1.56
> > diff -u -p -r1.56 parse.y
> > --- parse.y 23 Sep 2020 19:18:18 -  1.56
> > +++ parse.y 2 Jun 2021 06:48:12 -
> > @@ -694,6 +694,9 @@ lladdr  : STRING{
> >
> > memcpy($$, ea, ETHER_ADDR_LEN);
> > }
> > +   | /* empty */ {
> > +   memset($$, 0, ETHER_ADDR_LEN);
> > +   }
> > ;
> >
> >  local  : /* empty */   { $$ = 0; }
> 
> ok dv@.
> 
> I'll commit this later today.

This is also OK claudio@ 

-- 
:wq Claudio



Re: vmd: Fix grammar for random lladdr

2021-06-02 Thread Dave Voutila


Martin Vahlensieck writes:

> Index: parse.y
> ===
> retrieving revision 1.56
> diff -u -p -r1.56 parse.y
> --- parse.y   23 Sep 2020 19:18:18 -  1.56
> +++ parse.y   2 Jun 2021 06:48:12 -
> @@ -694,6 +694,9 @@ lladdr: STRING{
>
>   memcpy($$, ea, ETHER_ADDR_LEN);
>   }
> + | /* empty */ {
> + memset($$, 0, ETHER_ADDR_LEN);
> + }
>   ;
>
>  local: /* empty */   { $$ = 0; }

ok dv@.

I'll commit this later today.



Re: vmd: Fix grammar for random lladdr

2021-06-02 Thread Martin Vahlensieck
Hi Dave

On Tue, Jun 01, 2021 at 08:23:45PM -0400, Dave Voutila wrote:
> 
> Martin Vahlensieck writes:
> 
> > Hi
> >
> > The grammar for lladdr of interfaces is according to the manpage:
> >
> >   [locked] lladdr [etheraddr]
> >
> > This implies that `locked lladdr' is OK but looking at parse.y this
> > does not seem to be the case.  Making it optional would lead to a
> > `lladdr' all by itself being valid, which I find weird.  So I copied
> > the way ifconfig does it and now the syntax is:
> >
> >   [locked] lladdr etheraddr|random
> >
> > so to have a random locked lladdr one would have to write
> >
> >   locked lladdr random
> >
> > Is this a good approach?
> 
> Part of me thinks just specifying:
> 
>   locked lladdr
> 
> should give you a random address but enable the source mac
> filtering. Having to specify "random" seems odd to me. Thoughts?
Your variant matches what should be possible according to the man
page. It is also how I tried it and discovered it didn't work.  I
don't have a preference between the two, a diff to make the syntax
in the man page work is attached.  Syntax is OK for the following
config:
vm "test" {
memory 1G
interface {lladdr f0:e4:08:ef:5f:0a}
interface {lladdr}
interface {locked lladdr}
interface {locked lladdr f0:e4:08:ef:5f:0a}
}
> 
> I see what you're saying with how ifconfig(8) does it, but it's a bit
> different in this context as there's the "locked" modifier, so it's not
> exactly the same.
> 
> I'm not sure about the man page changes regardless. Will need another
> set of eyes on the syntax.
Sure.

Thanks for the feedback!

Best,

Martin
> 
> >
> > Best,
> >
> > Martin
> >
> > Index: parse.y
> > ===
> > RCS file: /cvs/src/usr.sbin/vmd/parse.y,v
> > retrieving revision 1.56
> > diff -u -p -r1.56 parse.y
> > --- parse.y 23 Sep 2020 19:18:18 -  1.56
> > +++ parse.y 22 May 2021 07:55:18 -
> > @@ -685,14 +685,16 @@ string: STRING string 
> > {
> >  lladdr : STRING{
> > struct ether_addr *ea;
> >
> > -   if ((ea = ether_aton($1)) == NULL) {
> > +   if (strcmp($1, "random") == 0) {
> > +   memset($$, 0, ETHER_ADDR_LEN);
> > +   } else if ((ea = ether_aton($1)) != NULL) {
> > +   memcpy($$, ea, ETHER_ADDR_LEN);
> > +   } else {
> > yyerror("invalid address: %s\n", $1);
> > free($1);
> > YYERROR;
> > }
> > free($1);
> > -
> > -   memcpy($$, ea, ETHER_ADDR_LEN);
> > }
> > ;
> >
> > Index: vm.conf.5
> > ===
> > RCS file: /cvs/src/usr.sbin/vmd/vm.conf.5,v
> > retrieving revision 1.56
> > diff -u -p -r1.56 vm.conf.5
> > --- vm.conf.5   1 Mar 2021 14:27:44 -   1.56
> > +++ vm.conf.5   22 May 2021 07:55:18 -
> > @@ -237,10 +237,12 @@ The
> >  must not be longer than 15 characters or end with a digit,
> >  as described in
> >  .Xr ifconfig 8 .
> > -.It Oo Cm locked Oc Cm lladdr Op Ar etheraddr
> > +.It Oo Cm locked Oc Cm lladdr Ar etheraddr Ns | Ns Cm random
> >  Change the link layer address (MAC address) of the interface on the
> >  VM guest side.
> > -If not specified, a randomized address will be assigned by
> > +If
> > +.Cm random
> > +is specified, a randomized address will be assigned by
> >  .Xr vmd 8 .
> >  If the
> >  .Cm locked
> 

Index: parse.y
===
retrieving revision 1.56
diff -u -p -r1.56 parse.y
--- parse.y 23 Sep 2020 19:18:18 -  1.56
+++ parse.y 2 Jun 2021 06:48:12 -
@@ -694,6 +694,9 @@ lladdr  : STRING{
 
memcpy($$, ea, ETHER_ADDR_LEN);
}
+   | /* empty */ {
+   memset($$, 0, ETHER_ADDR_LEN);
+   }
;
 
 local  : /* empty */   { $$ = 0; }



Re: vmd: Fix grammar for random lladdr

2021-06-01 Thread Dave Voutila


Martin Vahlensieck writes:

> Hi
>
> The grammar for lladdr of interfaces is according to the manpage:
>
>   [locked] lladdr [etheraddr]
>
> This implies that `locked lladdr' is OK but looking at parse.y this
> does not seem to be the case.  Making it optional would lead to a
> `lladdr' all by itself being valid, which I find weird.  So I copied
> the way ifconfig does it and now the syntax is:
>
>   [locked] lladdr etheraddr|random
>
> so to have a random locked lladdr one would have to write
>
>   locked lladdr random
>
> Is this a good approach?

Part of me thinks just specifying:

  locked lladdr

should give you a random address but enable the source mac
filtering. Having to specify "random" seems odd to me. Thoughts?

I see what you're saying with how ifconfig(8) does it, but it's a bit
different in this context as there's the "locked" modifier, so it's not
exactly the same.

I'm not sure about the man page changes regardless. Will need another
set of eyes on the syntax.

>
> Best,
>
> Martin
>
> Index: parse.y
> ===
> RCS file: /cvs/src/usr.sbin/vmd/parse.y,v
> retrieving revision 1.56
> diff -u -p -r1.56 parse.y
> --- parse.y   23 Sep 2020 19:18:18 -  1.56
> +++ parse.y   22 May 2021 07:55:18 -
> @@ -685,14 +685,16 @@ string  : STRING string {
>  lladdr   : STRING{
>   struct ether_addr *ea;
>
> - if ((ea = ether_aton($1)) == NULL) {
> + if (strcmp($1, "random") == 0) {
> + memset($$, 0, ETHER_ADDR_LEN);
> + } else if ((ea = ether_aton($1)) != NULL) {
> + memcpy($$, ea, ETHER_ADDR_LEN);
> + } else {
>   yyerror("invalid address: %s\n", $1);
>   free($1);
>   YYERROR;
>   }
>   free($1);
> -
> - memcpy($$, ea, ETHER_ADDR_LEN);
>   }
>   ;
>
> Index: vm.conf.5
> ===
> RCS file: /cvs/src/usr.sbin/vmd/vm.conf.5,v
> retrieving revision 1.56
> diff -u -p -r1.56 vm.conf.5
> --- vm.conf.5 1 Mar 2021 14:27:44 -   1.56
> +++ vm.conf.5 22 May 2021 07:55:18 -
> @@ -237,10 +237,12 @@ The
>  must not be longer than 15 characters or end with a digit,
>  as described in
>  .Xr ifconfig 8 .
> -.It Oo Cm locked Oc Cm lladdr Op Ar etheraddr
> +.It Oo Cm locked Oc Cm lladdr Ar etheraddr Ns | Ns Cm random
>  Change the link layer address (MAC address) of the interface on the
>  VM guest side.
> -If not specified, a randomized address will be assigned by
> +If
> +.Cm random
> +is specified, a randomized address will be assigned by
>  .Xr vmd 8 .
>  If the
>  .Cm locked



Re: vmd: Fix grammar for random lladdr

2021-06-01 Thread Martin Vahlensieck
Ping.

On Sat, May 22, 2021 at 09:58:46AM +0200, Martin Vahlensieck wrote:
> Hi
> 
> The grammar for lladdr of interfaces is according to the manpage:
> 
>   [locked] lladdr [etheraddr]
> 
> This implies that `locked lladdr' is OK but looking at parse.y this
> does not seem to be the case.  Making it optional would lead to a
> `lladdr' all by itself being valid, which I find weird.  So I copied
> the way ifconfig does it and now the syntax is:
> 
>   [locked] lladdr etheraddr|random
> 
> so to have a random locked lladdr one would have to write
> 
>   locked lladdr random
> 
> Is this a good approach?
> 
> Best,
> 
> Martin
> 
> Index: parse.y
> ===
> RCS file: /cvs/src/usr.sbin/vmd/parse.y,v
> retrieving revision 1.56
> diff -u -p -r1.56 parse.y
> --- parse.y   23 Sep 2020 19:18:18 -  1.56
> +++ parse.y   22 May 2021 07:55:18 -
> @@ -685,14 +685,16 @@ string  : STRING string {
>  lladdr   : STRING{
>   struct ether_addr *ea;
>  
> - if ((ea = ether_aton($1)) == NULL) {
> + if (strcmp($1, "random") == 0) {
> + memset($$, 0, ETHER_ADDR_LEN);
> + } else if ((ea = ether_aton($1)) != NULL) {
> + memcpy($$, ea, ETHER_ADDR_LEN);
> + } else {
>   yyerror("invalid address: %s\n", $1);
>   free($1);
>   YYERROR;
>   }
>   free($1);
> -
> - memcpy($$, ea, ETHER_ADDR_LEN);
>   }
>   ;
>  
> Index: vm.conf.5
> ===
> RCS file: /cvs/src/usr.sbin/vmd/vm.conf.5,v
> retrieving revision 1.56
> diff -u -p -r1.56 vm.conf.5
> --- vm.conf.5 1 Mar 2021 14:27:44 -   1.56
> +++ vm.conf.5 22 May 2021 07:55:18 -
> @@ -237,10 +237,12 @@ The
>  must not be longer than 15 characters or end with a digit,
>  as described in
>  .Xr ifconfig 8 .
> -.It Oo Cm locked Oc Cm lladdr Op Ar etheraddr
> +.It Oo Cm locked Oc Cm lladdr Ar etheraddr Ns | Ns Cm random
>  Change the link layer address (MAC address) of the interface on the
>  VM guest side.
> -If not specified, a randomized address will be assigned by
> +If
> +.Cm random
> +is specified, a randomized address will be assigned by
>  .Xr vmd 8 .
>  If the
>  .Cm locked
> 



vmd: Fix grammar for random lladdr

2021-05-22 Thread Martin Vahlensieck
Hi

The grammar for lladdr of interfaces is according to the manpage:

  [locked] lladdr [etheraddr]

This implies that `locked lladdr' is OK but looking at parse.y this
does not seem to be the case.  Making it optional would lead to a
`lladdr' all by itself being valid, which I find weird.  So I copied
the way ifconfig does it and now the syntax is:

  [locked] lladdr etheraddr|random

so to have a random locked lladdr one would have to write

  locked lladdr random

Is this a good approach?

Best,

Martin

Index: parse.y
===
RCS file: /cvs/src/usr.sbin/vmd/parse.y,v
retrieving revision 1.56
diff -u -p -r1.56 parse.y
--- parse.y 23 Sep 2020 19:18:18 -  1.56
+++ parse.y 22 May 2021 07:55:18 -
@@ -685,14 +685,16 @@ string: STRING string {
 lladdr : STRING{
struct ether_addr *ea;
 
-   if ((ea = ether_aton($1)) == NULL) {
+   if (strcmp($1, "random") == 0) {
+   memset($$, 0, ETHER_ADDR_LEN);
+   } else if ((ea = ether_aton($1)) != NULL) {
+   memcpy($$, ea, ETHER_ADDR_LEN);
+   } else {
yyerror("invalid address: %s\n", $1);
free($1);
YYERROR;
}
free($1);
-
-   memcpy($$, ea, ETHER_ADDR_LEN);
}
;
 
Index: vm.conf.5
===
RCS file: /cvs/src/usr.sbin/vmd/vm.conf.5,v
retrieving revision 1.56
diff -u -p -r1.56 vm.conf.5
--- vm.conf.5   1 Mar 2021 14:27:44 -   1.56
+++ vm.conf.5   22 May 2021 07:55:18 -
@@ -237,10 +237,12 @@ The
 must not be longer than 15 characters or end with a digit,
 as described in
 .Xr ifconfig 8 .
-.It Oo Cm locked Oc Cm lladdr Op Ar etheraddr
+.It Oo Cm locked Oc Cm lladdr Ar etheraddr Ns | Ns Cm random
 Change the link layer address (MAC address) of the interface on the
 VM guest side.
-If not specified, a randomized address will be assigned by
+If
+.Cm random
+is specified, a randomized address will be assigned by
 .Xr vmd 8 .
 If the
 .Cm locked



Re: [patch] vmd: fix possible small memleak in vm_claimid() error path

2019-09-04 Thread Mike Larkin
On Thu, Aug 29, 2019 at 07:43:31PM +0200, Hiltjo Posthuma wrote:
> Hi,
> 
> This fixes a small possible memory leak in an error handling path in vmd.c
> vm_claimid().
> 
> 
> diff --git usr.sbin/vmd/vmd.c usr.sbin/vmd/vmd.c
> index 654af5974d3..81be6b356d6 100644
> --- usr.sbin/vmd/vmd.c
> +++ usr.sbin/vmd/vmd.c
> @@ -1197,6 +1197,7 @@ vm_claimid(const char *name, int uid, uint32_t *id)
>   n2i->uid = uid;
>   if (strlcpy(n2i->name, name, sizeof(n2i->name)) >= sizeof(n2i->name)) {
>   log_warnx("vm name too long");
> + free(n2i);
>   return -1;
>   }
>   TAILQ_INSERT_TAIL(env->vmd_known, n2i, entry);
> 
> -- 
> Kind regards,
> Hiltjo
> 

Thanks, committed.

-ml



[patch] vmd: fix possible small memleak in vm_claimid() error path

2019-08-29 Thread Hiltjo Posthuma
Hi,

This fixes a small possible memory leak in an error handling path in vmd.c
vm_claimid().


diff --git usr.sbin/vmd/vmd.c usr.sbin/vmd/vmd.c
index 654af5974d3..81be6b356d6 100644
--- usr.sbin/vmd/vmd.c
+++ usr.sbin/vmd/vmd.c
@@ -1197,6 +1197,7 @@ vm_claimid(const char *name, int uid, uint32_t *id)
n2i->uid = uid;
if (strlcpy(n2i->name, name, sizeof(n2i->name)) >= sizeof(n2i->name)) {
log_warnx("vm name too long");
+   free(n2i);
return -1;
}
TAILQ_INSERT_TAIL(env->vmd_known, n2i, entry);

-- 
Kind regards,
Hiltjo



VMD: Fix Failure Start

2018-01-24 Thread Carlos Cardenas
Howdy.

Attached is a patch to fix the following condition:
When attempting to start a VM from vm.conf that fails (can't allocate
resources, etc...), do not remove vm from vm list.

Reported to bugs@ by mpi@.

Comments? Ok?

+--+
Carlos
Index: config.c
===
RCS file: /home/los/cvs/src/usr.sbin/vmd/config.c,v
retrieving revision 1.40
diff -u -p -a -u -r1.40 config.c
--- config.c5 Jan 2018 13:34:52 -   1.40
+++ config.c25 Jan 2018 03:25:23 -
@@ -416,8 +416,13 @@ config_setvm(struct privsep *ps, struct 
free(tapfds);
}
 
-   log_debug("%s: calling vm_remove", __func__);
-   vm_remove(vm);
+   if (vm->vm_from_config) {
+   log_debug("%s: calling stop vm %d", __func__, vm->vm_vmid);
+   vm_stop(vm, 0);
+   } else {
+   log_debug("%s: calling remove vm %d", __func__, vm->vm_vmid);
+   vm_remove(vm);
+   }
errno = saved_errno;
if (errno == 0)
errno = EINVAL;


Re: vmd fix

2017-03-26 Thread Mike Larkin
On Sun, Mar 26, 2017 at 04:40:03PM +0200, Mark Kettenis wrote:
> There is still a bit of an issue after the last set of changes made by
> mlarkin@.  The changed get_input_data() interface takes a pointer to a
> uint32_t as an argument, but only modifies the bytes that correspond
> to the access size.  That means that if we read the value into an
> uint32_t that is allocated on the stack, because if the access size is
> less than 4 bytes, we end up with stack garbage in the variable.  This
> is a problem in the mc146818 emulation code.
> 
> The result is that seabios (sometimes) detects the wrong memory size
> and subsequently triggers the following kernel printf:
> 
>   unknown memory type 1 for GPA 0x207bffd0
> 
> Not sure what happens with the VM at that point.  It seems to be
> hanging.
> 
> Diff below fixes the issue.  As far as I can see the i8253 and i8259
> emulation code isn't affected as the uint32_t stack variable gets
> converted into a uint8_t before being used.  But perhaps we should
> initialize the stack variables there as well to prevent further
> accidents.
> 
> ok?
> 

Yep, go for it

> 
> Index: mc146818.c
> ===
> RCS file: /cvs/src/usr.sbin/vmd/mc146818.c,v
> retrieving revision 1.10
> diff -u -p -r1.10 mc146818.c
> --- mc146818.c25 Mar 2017 22:36:53 -  1.10
> +++ mc146818.c26 Mar 2017 14:26:10 -
> @@ -249,7 +249,7 @@ vcpu_exit_mc146818(struct vm_run_params 
>   union vm_exit *vei = vrp->vrp_exit;
>   uint16_t port = vei->vei.vei_port;
>   uint8_t dir = vei->vei.vei_dir;
> - uint32_t data;
> + uint32_t data = 0;
>  
>   get_input_data(vei, );
>  
> 



vmd fix

2017-03-26 Thread Mark Kettenis
There is still a bit of an issue after the last set of changes made by
mlarkin@.  The changed get_input_data() interface takes a pointer to a
uint32_t as an argument, but only modifies the bytes that correspond
to the access size.  That means that if we read the value into an
uint32_t that is allocated on the stack, because if the access size is
less than 4 bytes, we end up with stack garbage in the variable.  This
is a problem in the mc146818 emulation code.

The result is that seabios (sometimes) detects the wrong memory size
and subsequently triggers the following kernel printf:

  unknown memory type 1 for GPA 0x207bffd0

Not sure what happens with the VM at that point.  It seems to be
hanging.

Diff below fixes the issue.  As far as I can see the i8253 and i8259
emulation code isn't affected as the uint32_t stack variable gets
converted into a uint8_t before being used.  But perhaps we should
initialize the stack variables there as well to prevent further
accidents.

ok?


Index: mc146818.c
===
RCS file: /cvs/src/usr.sbin/vmd/mc146818.c,v
retrieving revision 1.10
diff -u -p -r1.10 mc146818.c
--- mc146818.c  25 Mar 2017 22:36:53 -  1.10
+++ mc146818.c  26 Mar 2017 14:26:10 -
@@ -249,7 +249,7 @@ vcpu_exit_mc146818(struct vm_run_params 
union vm_exit *vei = vrp->vrp_exit;
uint16_t port = vei->vei.vei_port;
uint8_t dir = vei->vei.vei_dir;
-   uint32_t data;
+   uint32_t data = 0;
 
get_input_data(vei, );