> Date: Wed, 20 Apr 2022 18:14:57 +0200
> From: Anton Lindqvist <[email protected]>
>
> On Tue, Apr 19, 2022 at 06:07:47PM +0200, Anton Lindqvist wrote:
> > On Tue, Apr 19, 2022 at 07:32:36AM +0200, Anton Lindqvist wrote:
> > > On Thu, Mar 24, 2022 at 07:41:44AM +0100, Anton Lindqvist wrote:
> > > > >Synopsis: bse: null dereference in genet_rxintr()
> > > > >Category: arm64
> > > > >Environment:
> > > > System : OpenBSD 7.1
> > > > Details : OpenBSD 7.1-beta (GENERIC.MP) #1594: Mon Mar 21
> > > > 06:55:12 MDT 2022
> > > >
> > > > [email protected]:/usr/src/sys/arch/arm64/compile/GENERIC.MP
> > > >
> > > > Architecture: OpenBSD.arm64
> > > > Machine : arm64
> > > > >Description:
> > > >
> > > > Booting my rpi4 often but not always causes a panic while rc(8) tries
> > > > to start
> > > > the bse network interface:
> > > >
> > > > panic: attempt to access user address 0x38 from EL1
> > > > Stopped at panic+0x160: cmp w21, #0x0
> > > > TID PID UID PRFLAGS PFLAGS CPU COMMAND
> > > > * 0 0 0 0x10000 0x200 0K swapper
> > > > db_enter() at panic+0x15c
> > > > panic() at do_el1h_sync+0x1f8
> > > > do_el1h_sync() at handle_el1h_sync+0x6c
> > > > handle_el1h_sync() at genet_rxintr+0x120
> > > > genet_rxintr() at genet_intr+0x74
> > > > genet_intr() at ampintc_irq_handler+0x14c
> > > > ampintc_irq_handler() at arm_cpu_irq+0x30
> > > > arm_cpu_irq() at handle_el1h_irq+0x6c
> > > > handle_el1h_irq() at ampintc_splx+0x80
> > > > ampintc_splx() at genet_ioctl+0x158
> > > > genet_ioctl() at ifioctl+0x308
> > > > ifioctl() at nfs_boot_init+0xc0
> > > > nfs_boot_init() at nfs_mountroot+0x3c
> > > > nfs_mountroot() at main+0x464
> > > > main() at virtdone+0x70
> > > >
> > > > >Fix:
> > > >
> > > > The mbuf associated with the current index is NULL. I noticed that the
> > > > NetBSD
> > > > driver allocates mbufs for each ring entry in genet_setup_dma(). But
> > > > even with
> > > > that in place the same panic still occurs. Enabling GENET_DEBUG shows
> > > > that the
> > > > total is quite high:
> > > >
> > > > RX pidx=0000ca07 total=51463
> > > >
> > > >
> > > > Since it's greater than GENET_DMA_DESC_COUNT (=256) the null
> > > > dereference will
> > > > still happen after doing more than 256 iterations in genet_rxintr()
> > > > since we
> > > > will start accessing mbufs cleared by the previous iteration.
> > > >
> > > > Here's a diff with what I've tried so far. The KASSERT() is just
> > > > capturing the
> > > > problem at an earlier stage. Any pointers would be much appreciated.
> > >
> > > Further digging reveals that writes to GENET_RX_DMA_PROD_INDEX are
> > > ignored by the hardware. That's why I ended up with a large amount of
> > > mbufs available in genet_rxintr() since the software and hardware state
> > > was out of sync. Honoring any existing value makes the problem go away
> > > and matches what u-boot[1] does as well.
> > >
> > > The current RX cidx/pidx defaults in genet_fill_rx_ring() where probably
> > > carefully selected as they ensure that the rx ring is filled with at
> > > least the configured low watermark number of mbufs. However, instead of
> > > being forced to ensure a pidx - cidx delta above 0 on the first
> > > invocations of genet_fill_rx_ring(), RX_DESC_COUNT could simply be
> > > passed as the max argument to if_rxr_get() which will clamp the value
> > > anyway.
> > >
> > > Also, I've seen up to 8 mbufs being available per rx interrupt which is
> > > odd as only a less amount of rx ring entries are actually populated. Not
> > > sure if the driver is missing some interrupt threshold configuration.
> > > Increasing the rx ring low watermark to 8 "solved" it for now.
> > > Otherwise, the same null dereference occurs while trying to access empty
> > > mbuf ring entries.
> > >
> > > Worth mentioning is that the NetBSD driver does not suffer from the same
> > > problem as they keep all rx ring entries populated all the time.
> > >
> > > Looking for feedback and OKs at this point.
> > >
> > > [1]
> > > https://github.com/u-boot/u-boot/blob/a94ab561e2f49a80d8579930e840b810ab1a1330/drivers/net/bcmgenet.c#L404
> >
> > While putting more pressure on the network I'm seeing up to 100 mbufs
> > being available per rx interrupt. Could it simply be explained by the
> > hardware operating under the assumption that all ring entries are
> > available? Even if instructing the hardware about the actual amount of
> > available ring entries would require the driver to keep it in sync
> > whenever the if_rxr_*() implementation decides to adjust the ring.
> >
> > Moving to if_rxr_init(RX_DESC_COUNT, RX_DESC_COUNT) essentially making
> > all 256 ring entries always available makes the driver stable.
>
> Here's the diff I've been running lately which is deemed to be stable.
> Changes since last time:
>
> * Ensure all 256 ring entries are always populated. Usage of the
> if_rxr_*() API doesn't do much at this point.
>
> * Update the consumer index only after a ring entry has been consumed
> and not after filling in blank ones. This should have the same effect
> as the previous logic in genet_fill_rx_ring() but it makes our driver
> look like the one in NetBSD, FreeBSD and Linux.
But that means we no longer provide "mclgeti" DoS mitigation on the Pi.
There has to be a way for the hardware to prevent it from completely
filling up the ring and overwrite ring entries that haven't been
processed yet by the software stack.
>
> diff --git sys/dev/ic/bcmgenet.c sys/dev/ic/bcmgenet.c
> index 923df40bdfc..e31c0de6d95 100644
> --- sys/dev/ic/bcmgenet.c
> +++ sys/dev/ic/bcmgenet.c
> @@ -311,16 +311,13 @@ void
> genet_fill_rx_ring(struct genet_softc *sc, int qid)
> {
> struct mbuf *m;
> - uint32_t cidx, index, total;
> - u_int slots;
> + uint32_t index;
> + u_int inuse, slots;
> int error;
>
> - cidx = sc->sc_rx.cidx;
> - total = (sc->sc_rx.pidx - cidx) & 0xffff;
> - KASSERT(total <= RX_DESC_COUNT);
> -
> - index = sc->sc_rx.cidx & (RX_DESC_COUNT - 1);
> - for (slots = if_rxr_get(&sc->sc_rx_ring, total);
> + inuse = if_rxr_inuse(&sc->sc_rx_ring);
> + index = (sc->sc_rx.cidx + inuse) & (RX_DESC_COUNT - 1);
> + for (slots = if_rxr_get(&sc->sc_rx_ring, RX_DESC_COUNT);
> slots > 0; slots--) {
> if ((m = genet_alloc_mbufcl(sc)) == NULL) {
> printf("%s: cannot allocate RX mbuf\n",
> @@ -335,16 +332,10 @@ genet_fill_rx_ring(struct genet_softc *sc, int qid)
> break;
> }
>
> - cidx = (cidx + 1) & 0xffff;
> index = RX_NEXT(index);
> }
> if_rxr_put(&sc->sc_rx_ring, slots);
>
> - if (sc->sc_rx.cidx != cidx) {
> - sc->sc_rx.cidx = cidx;
> - WR4(sc, GENET_RX_DMA_CONS_INDEX(qid), sc->sc_rx.cidx);
> - }
> -
> if (if_rxr_inuse(&sc->sc_rx_ring) == 0)
> timeout_add(&sc->sc_rxto, 1);
> }
> @@ -517,9 +508,9 @@ genet_init_rings(struct genet_softc *sc, int qid)
>
> /* RX ring */
>
> - sc->sc_rx.next = 0;
> - sc->sc_rx.cidx = 0;
> - sc->sc_rx.pidx = RX_DESC_COUNT;
> + sc->sc_rx.cidx = RD4(sc, GENET_RX_DMA_CONS_INDEX(qid)) & 0xffff;
> + sc->sc_rx.pidx = RD4(sc, GENET_RX_DMA_PROD_INDEX(qid)) & 0xffff;
> + sc->sc_rx.next = sc->sc_rx.cidx & (RX_DESC_COUNT - 1);
>
> WR4(sc, GENET_RX_SCB_BURST_SIZE, 0x08);
>
> @@ -543,7 +534,7 @@ genet_init_rings(struct genet_softc *sc, int qid)
>
> WR4(sc, GENET_RX_DMA_RING_CFG, __BIT(qid)); /* enable */
>
> - if_rxr_init(&sc->sc_rx_ring, 2, RX_DESC_COUNT);
> + if_rxr_init(&sc->sc_rx_ring, RX_DESC_COUNT, RX_DESC_COUNT);
> genet_fill_rx_ring(sc, qid);
>
> /* Enable receive DMA */
> @@ -686,7 +677,7 @@ genet_rxintr(struct genet_softc *sc, int qid)
> uint32_t status, pidx, total;
>
> pidx = RD4(sc, GENET_RX_DMA_PROD_INDEX(qid)) & 0xffff;
> - total = (pidx - sc->sc_rx.pidx) & 0xffff;
> + total = (pidx - sc->sc_rx.cidx) & 0xffff;
>
> DPRINTF("RX pidx=%08x total=%d\n", pidx, total);
>
> @@ -723,6 +714,9 @@ genet_rxintr(struct genet_softc *sc, int qid)
> if_rxr_put(&sc->sc_rx_ring, 1);
>
> index = RX_NEXT(index);
> +
> + sc->sc_rx.cidx = (sc->sc_rx.cidx + 1) & 0xffff;
> + WR4(sc, GENET_RX_DMA_CONS_INDEX(qid), sc->sc_rx.cidx);
> }
>
> if (sc->sc_rx.pidx != pidx) {
>
>