> Date: Mon, 2 May 2022 07:15:51 +0200
> From: Anton Lindqvist <an...@basename.se>
> 
> On Mon, May 02, 2022 at 12:32:24AM +0200, Mark Kettenis wrote:
> > > Date: Sun, 1 May 2022 20:13:57 +0200
> > > From: Anton Lindqvist <an...@basename.se>
> > > 
> > > On Sat, Apr 30, 2022 at 04:07:51PM +0200, Mark Kettenis wrote:
> > > > > Date: Tue, 19 Apr 2022 07:32:36 +0200
> > > > > From: Anton Lindqvist <an...@basename.se>
> > > > > 
> > > > > On Thu, Mar 24, 2022 at 07:41:44AM +0100, Anton Lindqvist wrote:
> > > > > > >Synopsis:  bse: null dereference in genet_rxintr()
> > > > > > >Category:  arm64
> > > > > > >Environment:
> > > > > >     System      : OpenBSD 7.1
> > > > > >     Details     : OpenBSD 7.1-beta (GENERIC.MP) #1594: Mon Mar 21 
> > > > > > 06:55:12 MDT 2022
> > > > > >                     
> > > > > > dera...@arm64.openbsd.org:/usr/src/sys/arch/arm64/compile/GENERIC.MP
> > > > > > 
> > > > > >     Architecture: OpenBSD.arm64
> > > > > >     Machine     : arm64
> > > > > > >Description:
> > > > > > 
> > > > > > Booting my rpi4 often but not always causes a panic while rc(8) 
> > > > > > tries to start
> > > > > > the bse network interface:
> > > > > > 
> > > > > > panic: attempt to access user address 0x38 from EL1
> > > > > > Stopped at      panic+0x160:    cmp     w21, #0x0
> > > > > >     TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
> > > > > > *     0      0      0     0x10000      0x200    0K swapper
> > > > > > db_enter() at panic+0x15c
> > > > > > panic() at do_el1h_sync+0x1f8
> > > > > > do_el1h_sync() at handle_el1h_sync+0x6c
> > > > > > handle_el1h_sync() at genet_rxintr+0x120
> > > > > > genet_rxintr() at genet_intr+0x74
> > > > > > genet_intr() at ampintc_irq_handler+0x14c
> > > > > > ampintc_irq_handler() at arm_cpu_irq+0x30
> > > > > > arm_cpu_irq() at handle_el1h_irq+0x6c
> > > > > > handle_el1h_irq() at ampintc_splx+0x80
> > > > > > ampintc_splx() at genet_ioctl+0x158
> > > > > > genet_ioctl() at ifioctl+0x308
> > > > > > ifioctl() at nfs_boot_init+0xc0
> > > > > > nfs_boot_init() at nfs_mountroot+0x3c
> > > > > > nfs_mountroot() at main+0x464
> > > > > > main() at virtdone+0x70
> > > > > > 
> > > > > > >Fix:
> > > > > > 
> > > > > > The mbuf associated with the current index is NULL. I noticed that 
> > > > > > the NetBSD
> > > > > > driver allocates mbufs for each ring entry in genet_setup_dma(). 
> > > > > > But even with
> > > > > > that in place the same panic still occurs. Enabling GENET_DEBUG 
> > > > > > shows that the
> > > > > > total is quite high:
> > > > > > 
> > > > > > RX pidx=0000ca07 total=51463                                        
> > > > > >                            
> > > > > > 
> > > > > > Since it's greater than GENET_DMA_DESC_COUNT (=256) the null 
> > > > > > dereference will
> > > > > > still happen after doing more than 256 iterations in genet_rxintr() 
> > > > > > since we
> > > > > > will start accessing mbufs cleared by the previous iteration.
> > > > > > 
> > > > > > Here's a diff with what I've tried so far. The KASSERT() is just 
> > > > > > capturing the
> > > > > > problem at an earlier stage. Any pointers would be much appreciated.
> > > > > 
> > > > > Further digging reveals that writes to GENET_RX_DMA_PROD_INDEX are
> > > > > ignored by the hardware. That's why I ended up with a large amount of
> > > > > mbufs available in genet_rxintr() since the software and hardware 
> > > > > state
> > > > > was out of sync. Honoring any existing value makes the problem go away
> > > > > and matches what u-boot[1] does as well.
> > > > 
> > > > Writing to GENET_RX_DMA_PROD_INDEX works for me.  The U-Boot code says
> > > > that writing 0 doesn't work.  But even that works for me.  So I'm
> > > > puzzled.
> > > > 
> > > > > The current RX cidx/pidx defaults in genet_fill_rx_ring() where 
> > > > > probably
> > > > > carefully selected as they ensure that the rx ring is filled with at
> > > > > least the configured low watermark number of mbufs. However, instead 
> > > > > of
> > > > > being forced to ensure a pidx - cidx delta above 0 on the first
> > > > > invocations of genet_fill_rx_ring(), RX_DESC_COUNT could simply be
> > > > > passed as the max argument to if_rxr_get() which will clamp the value
> > > > > anyway.
> > > > 
> > > > Well, what the code does is setting the "prod" index ahead of the
> > > > "cons" index to simulate a full ring.  And then when we (partially)
> > > > fill the ring we increase "cons" to make descriptors available to the
> > > > hardware.  This seems to work on my hardware and I've never seen the
> > > > crash you're seeing.
> > > 
> > > After further meditation I realized this only happens when netbooting
> > > through u-boot. Disabling DMA as part of the hardware reset solves the
> > > issue, at last.
> > 
> > So reseting the MAC doesn't actually reset the DMA engine...
> > 
> > Doing this should be fine.  Although the firmware really should
> > disable the DMA before handing control to the OS.  Can you file a bug
> > in the appropriate channel?
> 
> Sure, in the meantime can this go in?

Yes, that is what the "ok kettenis@" bit was about ;).

Reply via email to