> Date: Mon, 2 May 2022 07:15:51 +0200 > From: Anton Lindqvist <an...@basename.se> > > On Mon, May 02, 2022 at 12:32:24AM +0200, Mark Kettenis wrote: > > > Date: Sun, 1 May 2022 20:13:57 +0200 > > > From: Anton Lindqvist <an...@basename.se> > > > > > > On Sat, Apr 30, 2022 at 04:07:51PM +0200, Mark Kettenis wrote: > > > > > Date: Tue, 19 Apr 2022 07:32:36 +0200 > > > > > From: Anton Lindqvist <an...@basename.se> > > > > > > > > > > On Thu, Mar 24, 2022 at 07:41:44AM +0100, Anton Lindqvist wrote: > > > > > > >Synopsis: bse: null dereference in genet_rxintr() > > > > > > >Category: arm64 > > > > > > >Environment: > > > > > > System : OpenBSD 7.1 > > > > > > Details : OpenBSD 7.1-beta (GENERIC.MP) #1594: Mon Mar 21 > > > > > > 06:55:12 MDT 2022 > > > > > > > > > > > > dera...@arm64.openbsd.org:/usr/src/sys/arch/arm64/compile/GENERIC.MP > > > > > > > > > > > > Architecture: OpenBSD.arm64 > > > > > > Machine : arm64 > > > > > > >Description: > > > > > > > > > > > > Booting my rpi4 often but not always causes a panic while rc(8) > > > > > > tries to start > > > > > > the bse network interface: > > > > > > > > > > > > panic: attempt to access user address 0x38 from EL1 > > > > > > Stopped at panic+0x160: cmp w21, #0x0 > > > > > > TID PID UID PRFLAGS PFLAGS CPU COMMAND > > > > > > * 0 0 0 0x10000 0x200 0K swapper > > > > > > db_enter() at panic+0x15c > > > > > > panic() at do_el1h_sync+0x1f8 > > > > > > do_el1h_sync() at handle_el1h_sync+0x6c > > > > > > handle_el1h_sync() at genet_rxintr+0x120 > > > > > > genet_rxintr() at genet_intr+0x74 > > > > > > genet_intr() at ampintc_irq_handler+0x14c > > > > > > ampintc_irq_handler() at arm_cpu_irq+0x30 > > > > > > arm_cpu_irq() at handle_el1h_irq+0x6c > > > > > > handle_el1h_irq() at ampintc_splx+0x80 > > > > > > ampintc_splx() at genet_ioctl+0x158 > > > > > > genet_ioctl() at ifioctl+0x308 > > > > > > ifioctl() at nfs_boot_init+0xc0 > > > > > > nfs_boot_init() at nfs_mountroot+0x3c > > > > > > nfs_mountroot() at main+0x464 > > > > > > main() at virtdone+0x70 > > > > > > > > > > > > >Fix: > > > > > > > > > > > > The mbuf associated with the current index is NULL. I noticed that > > > > > > the NetBSD > > > > > > driver allocates mbufs for each ring entry in genet_setup_dma(). > > > > > > But even with > > > > > > that in place the same panic still occurs. Enabling GENET_DEBUG > > > > > > shows that the > > > > > > total is quite high: > > > > > > > > > > > > RX pidx=0000ca07 total=51463 > > > > > > > > > > > > > > > > > > Since it's greater than GENET_DMA_DESC_COUNT (=256) the null > > > > > > dereference will > > > > > > still happen after doing more than 256 iterations in genet_rxintr() > > > > > > since we > > > > > > will start accessing mbufs cleared by the previous iteration. > > > > > > > > > > > > Here's a diff with what I've tried so far. The KASSERT() is just > > > > > > capturing the > > > > > > problem at an earlier stage. Any pointers would be much appreciated. > > > > > > > > > > Further digging reveals that writes to GENET_RX_DMA_PROD_INDEX are > > > > > ignored by the hardware. That's why I ended up with a large amount of > > > > > mbufs available in genet_rxintr() since the software and hardware > > > > > state > > > > > was out of sync. Honoring any existing value makes the problem go away > > > > > and matches what u-boot[1] does as well. > > > > > > > > Writing to GENET_RX_DMA_PROD_INDEX works for me. The U-Boot code says > > > > that writing 0 doesn't work. But even that works for me. So I'm > > > > puzzled. > > > > > > > > > The current RX cidx/pidx defaults in genet_fill_rx_ring() where > > > > > probably > > > > > carefully selected as they ensure that the rx ring is filled with at > > > > > least the configured low watermark number of mbufs. However, instead > > > > > of > > > > > being forced to ensure a pidx - cidx delta above 0 on the first > > > > > invocations of genet_fill_rx_ring(), RX_DESC_COUNT could simply be > > > > > passed as the max argument to if_rxr_get() which will clamp the value > > > > > anyway. > > > > > > > > Well, what the code does is setting the "prod" index ahead of the > > > > "cons" index to simulate a full ring. And then when we (partially) > > > > fill the ring we increase "cons" to make descriptors available to the > > > > hardware. This seems to work on my hardware and I've never seen the > > > > crash you're seeing. > > > > > > After further meditation I realized this only happens when netbooting > > > through u-boot. Disabling DMA as part of the hardware reset solves the > > > issue, at last. > > > > So reseting the MAC doesn't actually reset the DMA engine... > > > > Doing this should be fine. Although the firmware really should > > disable the DMA before handing control to the OS. Can you file a bug > > in the appropriate channel? > > Sure, in the meantime can this go in?
Yes, that is what the "ok kettenis@" bit was about ;).