On Mon, Feb 01, 2021 at 04:18:17PM +1100, Paul Ripke wrote:
> On Sun, Jan 31, 2021 at 03:32:24PM +0100, Reinoud Zandijk wrote:
> > Dear Paul,
> >
> > On Sat, Jan 30, 2021 at 10:32:13PM +1100, Paul Ripke wrote:
> > > On Sat, Jan 30, 2021 at 12:37:31AM +0100, Reinoud Zandijk wrote:
> > > > On Thu, Jan 28, 2021 at 11:56:30PM +1100, Paul Ripke wrote:
> > > > > Just tried running a newly built kernel on a GCE instance, and ran into
> > > > > this panic. The previously running kernel is 9.99.73 from back around
> > > > > October last year.
> > >
> > > Confirmed that a kernel built immediately prior to the following commit works,
> > > and fails after this commit:
> > > https://github.com/NetBSD/src/commit/7bca0bcf21c9b3465a6ee4eef6c01be32c9de1eb
> >
> > That's good to know; I found a bug in memory allocation that might explain your
> > panic and committed a fix for it. Could you please try out -current and see if
> > the problem still persists?
>
> Sorry, I appear to see the same behaviour with the patch:
>
> [ 1.0297881] virtio1 at pci0 dev 4 function 0
> [ 1.0297881] virtio1: network device (rev. 0x00)
> [ 1.0297881] vioif0 at virtio1: features: 0x20030020<EVENT_IDX,CTRL_VQ,STATUS,MAC>
> [ 1.0297881] vioif0: Ethernet address 42:01:0a:98:00:02
> [ 1.0297881] panic: _bus_virt_to_bus
> [ 1.0297881] cpu0: Begin traceback...
> [ 1.0297881] vpanic() at netbsd:vpanic+0x156
> [ 1.0297881] snprintf() at netbsd:snprintf
> [ 1.0297881] _bus_dma_alloc_bouncebuf() at netbsd:_bus_dma_alloc_bouncebuf
> [ 1.0297881] bus_dmamap_load() at netbsd:bus_dmamap_load+0x9c
> [ 1.0297881] vioif_dmamap_create_load.constprop.0() at netbsd:vioif_dmamap_create_load.constprop.0+0x7e
> [ 1.0297881] vioif_attach() at netbsd:vioif_attach+0x1085
> [ 1.0297881] config_attach_loc() at netbsd:config_attach_loc+0x17e
> [ 1.0297881] virtio_pci_rescan() at netbsd:virtio_pci_rescan+0x48
> [ 1.0297881] virtio_pci_attach() at netbsd:virtio_pci_attach+0x23a
> [ 1.0297881] config_attach_loc() at netbsd:config_attach_loc+0x17e
> [ 1.0297881] pci_probe_device() at netbsd:pci_probe_device+0x585
> [ 1.0297881] pci_enumerate_bus() at netbsd:pci_enumerate_bus+0x1b5
> [ 1.0297881] pcirescan() at netbsd:pcirescan+0x4e
> [ 1.0297881] pciattach() at netbsd:pciattach+0x186
> [ 1.0297881] config_attach_loc() at netbsd:config_attach_loc+0x17e
> [ 1.0297881] mp_pci_scan() at netbsd:mp_pci_scan+0x9e
> [ 1.0297881] amd64_mainbus_attach() at netbsd:amd64_mainbus_attach+0x236
> [ 1.0297881] mainbus_attach() at netbsd:mainbus_attach+0x84
> [ 1.0297881] config_attach_loc() at netbsd:config_attach_loc+0x17e
> [ 1.0297881] cpu_configure() at netbsd:cpu_configure+0x38
> [ 1.0297881] main() at netbsd:main+0x32c
> [ 1.0297881] cpu0: End traceback...
> [ 1.0297881] fatal breakpoint trap in supervisor mode
> [ 1.0297881] trap type 1 code 0 rip 0xffffffff80221a35 cs 0x8 rflags 0x202 cr2 0 ilevel 0x8 rsp 0xffffffff81cfa5d0
> [ 1.0297881] curlwp 0xffffffff81886e40 pid 0.0 lowest kstack 0xffffffff81cf52c0
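For readers who don't have bus_dma(9) paged in: the trace above dies in the x86
bounce-buffer path while vioif(4) is loading a DMA map during attach. A minimal
sketch of the create-and-load pattern that vioif_dmamap_create_load() in the
trace reduces to is below; the function name, buffer and sizes are illustrative
only, not the actual driver code.

#include <sys/param.h>
#include <sys/bus.h>

/*
 * Illustrative only: create a DMA map for a kernel-virtual buffer and
 * load it, roughly the pattern behind vioif_dmamap_create_load().
 */
static int
example_dmamap_create_load(bus_dma_tag_t dmat, void *buf, bus_size_t len,
    bus_dmamap_t *mapp)
{
        int error;

        /* One segment covering the whole buffer. */
        error = bus_dmamap_create(dmat, len, 1, len, 0,
            BUS_DMA_NOWAIT | BUS_DMA_ALLOCNOW, mapp);
        if (error)
                return error;

        /*
         * On x86 this load may take the bounce-buffer path seen in the
         * trace (_bus_dma_alloc_bouncebuf), which has to translate the
         * kernel virtual address to a bus address and panics with
         * "_bus_virt_to_bus" when that translation fails.
         */
        error = bus_dmamap_load(dmat, *mapp, buf, len, NULL, BUS_DMA_NOWAIT);
        if (error)
                bus_dmamap_destroy(dmat, *mapp);
        return error;
}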
> However, forcing the full size virtio_net_hdr results in a working kernel!
> Eg, the following hack:
>
> diff --git a/sys/dev/pci/if_vioif.c b/sys/dev/pci/if_vioif.c
> index 6482f7f60742..8ff187d33a48 100644
> --- a/sys/dev/pci/if_vioif.c
> +++ b/sys/dev/pci/if_vioif.c
> @@ -863,7 +863,8 @@ vioif_attach(device_t parent, device_t self, void *aux)
>  	aprint_normal_dev(self, "Ethernet address %s\n",
>  	    ether_sprintf(sc->sc_mac));
>  
> -	if (features & (VIRTIO_NET_F_MRG_RXBUF | VIRTIO_F_VERSION_1)) {
> +	// if (features & (VIRTIO_NET_F_MRG_RXBUF | VIRTIO_F_VERSION_1)) {	// XXX stix
> +	if (1) {
>  		sc->sc_hdr_size = sizeof(struct virtio_net_hdr);
>  	} else {
>  		sc->sc_hdr_size = offsetof(struct virtio_net_hdr, num_buffers);
>
> Does that give any hints?
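For context on what the hack toggles: the virtio spec defines the net header
with a trailing num_buffers field that is only used when MRG_RXBUF or
VERSION_1 has been negotiated, so the driver chooses between a 10-byte and a
12-byte header. A rough sketch of the layout and the two sizes follows (per
the spec, not copied verbatim from the NetBSD headers):

#include <sys/types.h>

/* Sketch of the layout per the virtio spec. */
struct virtio_net_hdr {
        uint8_t  flags;
        uint8_t  gso_type;
        uint16_t hdr_len;
        uint16_t gso_size;
        uint16_t csum_start;
        uint16_t csum_offset;
        uint16_t num_buffers;   /* only used with MRG_RXBUF or VERSION_1 */
};

/*
 * sizeof(struct virtio_net_hdr)                == 12 (modern / MRG_RXBUF)
 * offsetof(struct virtio_net_hdr, num_buffers) == 10 (legacy)
 *
 * The hack above pins sc_hdr_size to the 12-byte layout unconditionally,
 * regardless of what the device negotiated.
 */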
Major correction: that patch results in a *booting* kernel, but without a
working NIC. I forgot I was logged on via the serial console...

> > > > Could you A) test with virtio v1 PCI devices? i.e. without legacy, and if that
> > > > fails too could you B) test with src/sys/dev/pci/if_vioif.c:832 commented out
> > > > and see if that makes a difference? That's a new virtio 1.0 feature that was
> > > > apparently negotiated and should work in transitional devices and should not
> > > > be accepted in older. It could be that GCE is making a mistake there, but
> > > > negotiating EVENT_IDX shifts registers, so it has a big impact if it goes wrong.
> > >
> > > A) Erm, how? Read through some of the source and saw mentions of v1.0 vs v0.9,
> > > but didn't see a way of just disabling legacy support.
> >
> > Legacy support has to be disabled in the hypervisor (like GCE) as it needs to
> > pass a different PCI product number. In QEMU it's a property of each virtio PCI
> > device, but in GCE it might be global.
>
> Ah, I had wondered if that was the case. I haven't seen anything in the GCE
> configs to control this; Googling for answers is also made awkward given
> the ambiguous "PCI" acronym.
>
> --
> Paul Ripke
> "Great minds discuss ideas, average minds discuss events, small minds
> discuss people."
> -- Disputed: Often attributed to Eleanor Roosevelt. 1948.

--
Paul Ripke
"Great minds discuss ideas, average minds discuss events, small minds
discuss people."
-- Disputed: Often attributed to Eleanor Roosevelt. 1948.
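On the legacy-vs-v1 question above: per the virtio 1.0 spec, a transitional
(legacy-capable) device presents a PCI product ID in the 0x1000-0x103f range
under vendor 0x1af4, while a modern-only device uses 0x1040 plus the virtio
device type (network is type 1, hence 0x1041); that is the "different PCI
product number" Reinoud refers to, and in QEMU it is what the per-device
disable-legacy property switches. A hedged sketch of the check, with spec
constants and a made-up helper name:

#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_PCI_VENDOR               0x1af4  /* per the virtio spec */
#define VIRTIO_PCI_TRANSITIONAL_MIN     0x1000  /* legacy-capable IDs */
#define VIRTIO_PCI_TRANSITIONAL_MAX     0x103f
#define VIRTIO_PCI_MODERN_BASE          0x1040  /* 0x1040 + device type; net = 0x1041 */

/* Hypothetical helper: classify a virtio PCI function by its IDs. */
static bool
virtio_pci_is_modern_only(uint16_t vendor, uint16_t product)
{
        if (vendor != VIRTIO_PCI_VENDOR)
                return false;
        /* 0x1000..0x103f are transitional, i.e. still expose the legacy interface. */
        if (product >= VIRTIO_PCI_TRANSITIONAL_MIN &&
            product <= VIRTIO_PCI_TRANSITIONAL_MAX)
                return false;
        return product >= VIRTIO_PCI_MODERN_BASE;
}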
