On Mon, Feb 01, 2021 at 04:18:17PM +1100, Paul Ripke wrote:
> On Sun, Jan 31, 2021 at 03:32:24PM +0100, Reinoud Zandijk wrote:
> > Dear Paul,
> >
> > On Sat, Jan 30, 2021 at 10:32:13PM +1100, Paul Ripke wrote:
> > > On Sat, Jan 30, 2021 at 12:37:31AM +0100, Reinoud Zandijk wrote:
> > > > On Thu, Jan 28, 2021 at 11:56:30PM +1100, Paul Ripke wrote:
> > > > > Just tried running a newly built kernel on a GCE instance, and ran into
> > > > > this panic. The previously running kernel is 9.99.73 from back around
> > > > > October last year.
> > >
> > > Confirmed that a kernel built immediately prior to the following commit works,
> > > and fails after this commit:
> > > https://github.com/NetBSD/src/commit/7bca0bcf21c9b3465a6ee4eef6c01be32c9de1eb
> >
> > That's good to know; I found a bug in memory allocation that might explain your
> > panic and committed a fix for it. Could you please try out -current and see if
> > the problem still persists?
>
> Sorry, I appear to see the same behaviour with the patch:
>
> [ 1.0297881] virtio1 at pci0 dev 4 function 0
> [ 1.0297881] virtio1: network device (rev. 0x00)
> [ 1.0297881] vioif0 at virtio1: features: 0x20030020<EVENT_IDX,CTRL_VQ,STATUS,MAC>
> [ 1.0297881] vioif0: Ethernet address 42:01:0a:98:00:02
> [ 1.0297881] panic: _bus_virt_to_bus
> [ 1.0297881] cpu0: Begin traceback...
> [ 1.0297881] vpanic() at netbsd:vpanic+0x156
> [ 1.0297881] snprintf() at netbsd:snprintf
> [ 1.0297881] _bus_dma_alloc_bouncebuf() at netbsd:_bus_dma_alloc_bouncebuf
> [ 1.0297881] bus_dmamap_load() at netbsd:bus_dmamap_load+0x9c
> [ 1.0297881] vioif_dmamap_create_load.constprop.0() at netbsd:vioif_dmamap_create_load.constprop.0+0x7e
> [ 1.0297881] vioif_attach() at netbsd:vioif_attach+0x1085
> [ 1.0297881] config_attach_loc() at netbsd:config_attach_loc+0x17e
> [ 1.0297881] virtio_pci_rescan() at netbsd:virtio_pci_rescan+0x48
> [ 1.0297881] virtio_pci_attach() at netbsd:virtio_pci_attach+0x23a
> [ 1.0297881] config_attach_loc() at netbsd:config_attach_loc+0x17e
> [ 1.0297881] pci_probe_device() at netbsd:pci_probe_device+0x585
> [ 1.0297881] pci_enumerate_bus() at netbsd:pci_enumerate_bus+0x1b5
> [ 1.0297881] pcirescan() at netbsd:pcirescan+0x4e
> [ 1.0297881] pciattach() at netbsd:pciattach+0x186
> [ 1.0297881] config_attach_loc() at netbsd:config_attach_loc+0x17e
> [ 1.0297881] mp_pci_scan() at netbsd:mp_pci_scan+0x9e
> [ 1.0297881] amd64_mainbus_attach() at netbsd:amd64_mainbus_attach+0x236
> [ 1.0297881] mainbus_attach() at netbsd:mainbus_attach+0x84
> [ 1.0297881] config_attach_loc() at netbsd:config_attach_loc+0x17e
> [ 1.0297881] cpu_configure() at netbsd:cpu_configure+0x38
> [ 1.0297881] main() at netbsd:main+0x32c
> [ 1.0297881] cpu0: End traceback...
> [ 1.0297881] fatal breakpoint trap in supervisor mode
> [ 1.0297881] trap type 1 code 0 rip 0xffffffff80221a35 cs 0x8 rflags 0x202 cr2 0 ilevel 0x8 rsp 0xffffffff81cfa5d0
> [ 1.0297881] curlwp 0xffffffff81886e40 pid 0.0 lowest kstack 0xffffffff81cf52c0
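For readers who don't have bus_dma(9) paged in: the trace above dies in the x86
bounce-buffer path while vioif(4) is loading a DMA map during attach. A minimal
sketch of the create-and-load pattern that vioif_dmamap_create_load() in the
trace reduces to is below; the function name, buffer and sizes are illustrative
only, not the actual driver code.

#include <sys/param.h>
#include <sys/bus.h>

/*
 * Illustrative only: create a DMA map for a kernel-virtual buffer and
 * load it, roughly the pattern behind vioif_dmamap_create_load().
 */
static int
example_dmamap_create_load(bus_dma_tag_t dmat, void *buf, bus_size_t len,
    bus_dmamap_t *mapp)
{
        int error;

        /* One segment covering the whole buffer. */
        error = bus_dmamap_create(dmat, len, 1, len, 0,
            BUS_DMA_NOWAIT | BUS_DMA_ALLOCNOW, mapp);
        if (error)
                return error;

        /*
         * On x86 this load may take the bounce-buffer path seen in the
         * trace (_bus_dma_alloc_bouncebuf), which has to translate the
         * kernel virtual address to a bus address and panics with
         * "_bus_virt_to_bus" when that translation fails.
         */
        error = bus_dmamap_load(dmat, *mapp, buf, len, NULL, BUS_DMA_NOWAIT);
        if (error)
                bus_dmamap_destroy(dmat, *mapp);
        return error;
}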
> However, forcing the full size virtio_net_hdr results in a working kernel!
> Eg, the following hack:
>
> diff --git a/sys/dev/pci/if_vioif.c b/sys/dev/pci/if_vioif.c
> index 6482f7f60742..8ff187d33a48 100644
> --- a/sys/dev/pci/if_vioif.c
> +++ b/sys/dev/pci/if_vioif.c
> @@ -863,7 +863,8 @@ vioif_attach(device_t parent, device_t self, void *aux)
>  	aprint_normal_dev(self, "Ethernet address %s\n",
>  	    ether_sprintf(sc->sc_mac));
>  
> -	if (features & (VIRTIO_NET_F_MRG_RXBUF | VIRTIO_F_VERSION_1)) {
> +	// if (features & (VIRTIO_NET_F_MRG_RXBUF | VIRTIO_F_VERSION_1)) {	// XXX stix
> +	if (1) {
>  		sc->sc_hdr_size = sizeof(struct virtio_net_hdr);
>  	} else {
>  		sc->sc_hdr_size = offsetof(struct virtio_net_hdr, num_buffers);
>
> Does that give any hints?
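For context on what the hack toggles: the virtio spec defines the net header
with a trailing num_buffers field that is only used when MRG_RXBUF or
VERSION_1 has been negotiated, so the driver chooses between a 10-byte and a
12-byte header. A rough sketch of the layout and the two sizes follows (per
the spec, not copied verbatim from the NetBSD headers):

#include <sys/types.h>

/* Sketch of the layout per the virtio spec. */
struct virtio_net_hdr {
        uint8_t  flags;
        uint8_t  gso_type;
        uint16_t hdr_len;
        uint16_t gso_size;
        uint16_t csum_start;
        uint16_t csum_offset;
        uint16_t num_buffers;   /* only used with MRG_RXBUF or VERSION_1 */
};

/*
 * sizeof(struct virtio_net_hdr)                == 12 (modern / MRG_RXBUF)
 * offsetof(struct virtio_net_hdr, num_buffers) == 10 (legacy)
 *
 * The hack above pins sc_hdr_size to the 12-byte layout unconditionally,
 * regardless of what the device negotiated.
 */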
Major correction: that patch results in a *booting* kernel, but without a
working NIC. I forgot I was logged on via the serial console...

> > > > Could you A) test with virtio v1 PCI devices? i.e. without legacy, and if that
> > > > fails too could you B) test with src/sys/dev/pci/if_vioif.c:832 commented out
> > > > and see if that makes a difference? That's a new virtio 1.0 feature that was
> > > > apparently negotiated and should work in transitional devices and should not
> > > > be accepted in older. It could be that GCE is making a mistake there, but
> > > > negotiating EVENT_IDX shifts registers, so it has a big impact if it goes wrong.
> > >
> > > A) Erm, how? Read through some of the source and saw mentions of v1.0 vs v0.9,
> > > but didn't see a way of just disabling legacy support.
> >
> > Legacy support has to be disabled in the hypervisor (like GCE) as it needs to
> > pass a different PCI product number. In QEMU it's a property of each virtio PCI
> > device, but in GCE it might be global.
>
> Ah, I had wondered if that was the case. I haven't seen anything in the GCE
> configs to control this; Googling for answers is also made awkward given
> the ambiguous "PCI" acronym.
>
> --
> Paul Ripke
> "Great minds discuss ideas, average minds discuss events, small minds
> discuss people."
> -- Disputed: Often attributed to Eleanor Roosevelt. 1948.

--
Paul Ripke
"Great minds discuss ideas, average minds discuss events, small minds
discuss people."
-- Disputed: Often attributed to Eleanor Roosevelt. 1948.
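On the legacy-vs-v1 question above: per the virtio 1.0 spec, a transitional
(legacy-capable) device presents a PCI product ID in the 0x1000-0x103f range
under vendor 0x1af4, while a modern-only device uses 0x1040 plus the virtio
device type (network is type 1, hence 0x1041); that is the "different PCI
product number" Reinoud refers to, and in QEMU it is what the per-device
disable-legacy property switches. A hedged sketch of the check, with spec
constants and a made-up helper name:

#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_PCI_VENDOR               0x1af4  /* per the virtio spec */
#define VIRTIO_PCI_TRANSITIONAL_MIN     0x1000  /* legacy-capable IDs */
#define VIRTIO_PCI_TRANSITIONAL_MAX     0x103f
#define VIRTIO_PCI_MODERN_BASE          0x1040  /* 0x1040 + device type; net = 0x1041 */

/* Hypothetical helper: classify a virtio PCI function by its IDs. */
static bool
virtio_pci_is_modern_only(uint16_t vendor, uint16_t product)
{
        if (vendor != VIRTIO_PCI_VENDOR)
                return false;
        /* 0x1000..0x103f are transitional, i.e. still expose the legacy interface. */
        if (product >= VIRTIO_PCI_TRANSITIONAL_MIN &&
            product <= VIRTIO_PCI_TRANSITIONAL_MAX)
                return false;
        return product >= VIRTIO_PCI_MODERN_BASE;
}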
