> Date: Wed, 20 Apr 2022 18:14:57 +0200
> From: Anton Lindqvist <[email protected]>
> 
> On Tue, Apr 19, 2022 at 06:07:47PM +0200, Anton Lindqvist wrote:
> > On Tue, Apr 19, 2022 at 07:32:36AM +0200, Anton Lindqvist wrote:
> > > On Thu, Mar 24, 2022 at 07:41:44AM +0100, Anton Lindqvist wrote:
> > > > >Synopsis:      bse: null dereference in genet_rxintr()
> > > > >Category:      arm64
> > > > >Environment:
> > > >         System      : OpenBSD 7.1
> > > >         Details     : OpenBSD 7.1-beta (GENERIC.MP) #1594: Mon Mar 21 
> > > > 06:55:12 MDT 2022
> > > >                         
> > > > [email protected]:/usr/src/sys/arch/arm64/compile/GENERIC.MP
> > > > 
> > > >         Architecture: OpenBSD.arm64
> > > >         Machine     : arm64
> > > > >Description:
> > > > 
> > > > Booting my rpi4 often but not always causes a panic while rc(8) tries 
> > > > to start
> > > > the bse network interface:
> > > > 
> > > > panic: attempt to access user address 0x38 from EL1
> > > > Stopped at      panic+0x160:    cmp     w21, #0x0
> > > >     TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
> > > > *     0      0      0     0x10000      0x200    0K swapper
> > > > db_enter() at panic+0x15c
> > > > panic() at do_el1h_sync+0x1f8
> > > > do_el1h_sync() at handle_el1h_sync+0x6c
> > > > handle_el1h_sync() at genet_rxintr+0x120
> > > > genet_rxintr() at genet_intr+0x74
> > > > genet_intr() at ampintc_irq_handler+0x14c
> > > > ampintc_irq_handler() at arm_cpu_irq+0x30
> > > > arm_cpu_irq() at handle_el1h_irq+0x6c
> > > > handle_el1h_irq() at ampintc_splx+0x80
> > > > ampintc_splx() at genet_ioctl+0x158
> > > > genet_ioctl() at ifioctl+0x308
> > > > ifioctl() at nfs_boot_init+0xc0
> > > > nfs_boot_init() at nfs_mountroot+0x3c
> > > > nfs_mountroot() at main+0x464
> > > > main() at virtdone+0x70
> > > > 
> > > > >Fix:
> > > > 
> > > > The mbuf associated with the current index is NULL. I noticed that the 
> > > > NetBSD
> > > > driver allocates mbufs for each ring entry in genet_setup_dma(). But 
> > > > even with
> > > > that in place the same panic still occurs. Enabling GENET_DEBUG shows 
> > > > that the
> > > > total is quite high:
> > > > 
> > > > RX pidx=0000ca07 total=51463                                            
> > > >                        
> > > > 
> > > > Since it's greater than GENET_DMA_DESC_COUNT (=256) the null 
> > > > dereference will
> > > > still happen after doing more than 256 iterations in genet_rxintr() 
> > > > since we
> > > > will start accessing mbufs cleared by the previous iteration.
> > > > 
> > > > Here's a diff with what I've tried so far. The KASSERT() is just 
> > > > capturing the
> > > > problem at an earlier stage. Any pointers would be much appreciated.
> > > 
> > > Further digging reveals that writes to GENET_RX_DMA_PROD_INDEX are
> > > ignored by the hardware. That's why I ended up with a large amount of
> > > mbufs available in genet_rxintr() since the software and hardware state
> > > was out of sync. Honoring any existing value makes the problem go away
> > > and matches what u-boot[1] does as well.
> > > 
> > > The current RX cidx/pidx defaults in genet_fill_rx_ring() where probably
> > > carefully selected as they ensure that the rx ring is filled with at
> > > least the configured low watermark number of mbufs. However, instead of
> > > being forced to ensure a pidx - cidx delta above 0 on the first
> > > invocations of genet_fill_rx_ring(), RX_DESC_COUNT could simply be
> > > passed as the max argument to if_rxr_get() which will clamp the value
> > > anyway.
> > > 
> > > Also, I've seen up to 8 mbufs being available per rx interrupt which is
> > > odd as only a less amount of rx ring entries are actually populated. Not
> > > sure if the driver is missing some interrupt threshold configuration.
> > > Increasing the rx ring low watermark to 8 "solved" it for now.
> > > Otherwise, the same null dereference occurs while trying to access empty
> > > mbuf ring entries.
> > > 
> > > Worth mentioning is that the NetBSD driver does not suffer from the same
> > > problem as they keep all rx ring entries populated all the time.
> > > 
> > > Looking for feedback and OKs at this point.
> > > 
> > > [1] 
> > > https://github.com/u-boot/u-boot/blob/a94ab561e2f49a80d8579930e840b810ab1a1330/drivers/net/bcmgenet.c#L404
> > 
> > While putting more pressure on the network I'm seeing up to 100 mbufs
> > being available per rx interrupt. Could it simply be explained by the
> > hardware operating under the assumption that all ring entries are
> > available? Even if instructing the hardware about the actual amount of
> > available ring entries would require the driver to keep it in sync
> > whenever the if_rxr_*() implementation decides to adjust the ring.
> > 
> > Moving to if_rxr_init(RX_DESC_COUNT, RX_DESC_COUNT) essentially making
> > all 256 ring entries always available makes the driver stable.
> 
> Here's the diff I've been running lately which is deemed to be stable.
> Changes since last time:
> 
> * Ensure all 256 ring entries are always populated. Usage of the
>   if_rxr_*() API doesn't do much at this point.
> 
> * Update the consumer index only after a ring entry has been consumed
>   and not after filling in blank ones. This should have the same effect
>   as the previous logic in genet_fill_rx_ring() but it makes our driver
>   look like the one in NetBSD, FreeBSD and Linux.

But that means we no longer provide "mclgeti" DoS mitigation on the Pi.

There has to be a way for the hardware to prevent it from completely
filling up the ring and overwrite ring entries that haven't been
processed yet by the software stack.

> 
> diff --git sys/dev/ic/bcmgenet.c sys/dev/ic/bcmgenet.c
> index 923df40bdfc..e31c0de6d95 100644
> --- sys/dev/ic/bcmgenet.c
> +++ sys/dev/ic/bcmgenet.c
> @@ -311,16 +311,13 @@ void
>  genet_fill_rx_ring(struct genet_softc *sc, int qid)
>  {
>       struct mbuf *m;
> -     uint32_t cidx, index, total;
> -     u_int slots;
> +     uint32_t index;
> +     u_int inuse, slots;
>       int error;
>  
> -     cidx = sc->sc_rx.cidx;
> -     total = (sc->sc_rx.pidx - cidx) & 0xffff;
> -     KASSERT(total <= RX_DESC_COUNT);
> -
> -     index = sc->sc_rx.cidx & (RX_DESC_COUNT - 1);
> -     for (slots = if_rxr_get(&sc->sc_rx_ring, total);
> +     inuse = if_rxr_inuse(&sc->sc_rx_ring);
> +     index = (sc->sc_rx.cidx + inuse) & (RX_DESC_COUNT - 1);
> +     for (slots = if_rxr_get(&sc->sc_rx_ring, RX_DESC_COUNT);
>            slots > 0; slots--) {
>               if ((m = genet_alloc_mbufcl(sc)) == NULL) {
>                       printf("%s: cannot allocate RX mbuf\n",
> @@ -335,16 +332,10 @@ genet_fill_rx_ring(struct genet_softc *sc, int qid)
>                       break;
>               }
>  
> -             cidx = (cidx + 1) & 0xffff;
>               index = RX_NEXT(index);
>       }
>       if_rxr_put(&sc->sc_rx_ring, slots);
>  
> -     if (sc->sc_rx.cidx != cidx) {
> -             sc->sc_rx.cidx = cidx;
> -             WR4(sc, GENET_RX_DMA_CONS_INDEX(qid), sc->sc_rx.cidx);
> -     }
> -
>       if (if_rxr_inuse(&sc->sc_rx_ring) == 0)
>               timeout_add(&sc->sc_rxto, 1);
>  }
> @@ -517,9 +508,9 @@ genet_init_rings(struct genet_softc *sc, int qid)
>  
>       /* RX ring */
>  
> -     sc->sc_rx.next = 0;
> -     sc->sc_rx.cidx = 0;
> -     sc->sc_rx.pidx = RX_DESC_COUNT;
> +     sc->sc_rx.cidx = RD4(sc, GENET_RX_DMA_CONS_INDEX(qid)) & 0xffff;
> +     sc->sc_rx.pidx = RD4(sc, GENET_RX_DMA_PROD_INDEX(qid)) & 0xffff;
> +     sc->sc_rx.next = sc->sc_rx.cidx & (RX_DESC_COUNT - 1);
>  
>       WR4(sc, GENET_RX_SCB_BURST_SIZE, 0x08);
>  
> @@ -543,7 +534,7 @@ genet_init_rings(struct genet_softc *sc, int qid)
>  
>       WR4(sc, GENET_RX_DMA_RING_CFG, __BIT(qid));     /* enable */
>  
> -     if_rxr_init(&sc->sc_rx_ring, 2, RX_DESC_COUNT);
> +     if_rxr_init(&sc->sc_rx_ring, RX_DESC_COUNT, RX_DESC_COUNT);
>       genet_fill_rx_ring(sc, qid);
>  
>       /* Enable receive DMA */
> @@ -686,7 +677,7 @@ genet_rxintr(struct genet_softc *sc, int qid)
>       uint32_t status, pidx, total;
>  
>       pidx = RD4(sc, GENET_RX_DMA_PROD_INDEX(qid)) & 0xffff;
> -     total = (pidx - sc->sc_rx.pidx) & 0xffff;
> +     total = (pidx - sc->sc_rx.cidx) & 0xffff;
>  
>       DPRINTF("RX pidx=%08x total=%d\n", pidx, total);
>  
> @@ -723,6 +714,9 @@ genet_rxintr(struct genet_softc *sc, int qid)
>               if_rxr_put(&sc->sc_rx_ring, 1);
>  
>               index = RX_NEXT(index);
> +
> +             sc->sc_rx.cidx = (sc->sc_rx.cidx + 1) & 0xffff;
> +             WR4(sc, GENET_RX_DMA_CONS_INDEX(qid), sc->sc_rx.cidx);
>       }
>  
>       if (sc->sc_rx.pidx != pidx) {
> 
> 

Reply via email to