Martin Husemann writes:
> On Fri, Sep 29, 2023 at 09:52:42AM +0000, Chavdar Ivanov wrote:
> > Sep 29 01:53:13 ymir /netbsd: [ 228407.9443196] panic: kernel diagnostic
> > assertion "offset < map->dm_mapsize" failed: file
> > "/home/sysbuild/src/sys/arch/x86/x86/bus_dma.c", line 826 bad offset 0x0 >=
> > 0x0
> [..]
> > Sep 29 01:53:13 ymir /netbsd: [ 228407.9543802] bus_dmamap_sync() at
> > netbsd:bus_dmamap_sync+0x326
> > Sep 29 01:53:13 ymir /netbsd: [ 228407.9543802] rge_rxeof() at
> > netbsd:rge_rxeof+0x179
>
> This is a bug in the rge(4) driver (unrelated to userland resource usage
> by the build), maybe a race triggered more easily when the system is
> under heavy load.

hmm, this seems like corruption to me.

> bus_dma.c", line 826 bad offset 0x0 >= 0x0

says that offset == 0 (which is right, this seems to be this call):

   1241         /* Invalidate the RX mbuf and unload its map. */
   1242         bus_dmamap_sync(sc->sc_dmat, rxq->rxq_dmamap, 0,
   1243             rxq->rxq_dmamap->dm_mapsize, BUS_DMASYNC_POSTREAD);

offset is the 0 / 3rd arg here, but the *second* 0x0 value here
(the map's dm_mapsize) seems to be corrupted, and shouldn't be zero.
ie, there's no case where it will create a zero-length dma map, it
should always be either RGE_TX_LIST_SZ, RGE_RX_LIST_SZ, or
RGE_JUMBO_FRAMELEN, so for this assert to trigger saying the passed
offset is beyond the mapping, because the mapping is zero length,
seems to be pretty clear that the bus_dmamap_t has been corrupted.

the timing does seem to indicate that a problem with out of memory
may be relevant here..oh, i think i may see a problem.

   1110 rge_newbuf(struct rge_softc *sc, int idx)
   ...
   1126         if (bus_dmamap_load_mbuf(sc->sc_dmat, rxmap, m, BUS_DMA_NOWAIT))
   1127                 goto out;
   ...
   1151 out:
   1152         if (m != NULL)
   1153                 m_freem(m);
   1154         return (ENOMEM);

so, if bus_dmamap_load_mbuf() fails, we return ENOMEM, not ENOBUFS.
however, the callers only consider ENOBUFS as an error case:

   1176 rge_rx_list_init(struct rge_softc *sc)
   ...
   1184                 if (rge_newbuf(sc, i) == ENOBUFS)
   1185                         return (ENOBUFS);

and

   1212 rge_rxeof(struct rge_softc *sc)
   ...
   1271                 if (rge_newbuf(sc, i) == ENOBUFS) {

so in this case, the code thinks a buffer was allocated, but it
wasn't...  i haven't dug deeper into what this may cause the code
to do wrong yet, but it seems problematic.  certainly, both callers
should check for != 0, not == ENOBUFS, to avoid this problem.


.mrg.
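
to make the return-code mismatch concrete, here is a minimal userland
sketch.  it is not driver code: fake_newbuf() and the two check helpers
are hypothetical names invented for illustration, and the ENOBUFS path
for a failed mbuf allocation is an assumption (only the ENOMEM "out:"
path is quoted above).  it only shows that an "== ENOBUFS" test lets an
ENOMEM failure through, while a "!= 0" test catches both.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for rge_newbuf().  The ENOMEM return models the
 * quoted "out:" path taken when bus_dmamap_load_mbuf() fails.  The
 * ENOBUFS return for a failed mbuf allocation is an assumption made for
 * this sketch.
 */
static int
fake_newbuf(bool mbuf_alloc_fails, bool dma_load_fails)
{
	if (mbuf_alloc_fails)
		return ENOBUFS;
	if (dma_load_fails)
		return ENOMEM;
	return 0;
}

/* The check the quoted callers use: only ENOBUFS counts as failure. */
static bool
failed_enobufs_only(int error)
{
	return error == ENOBUFS;
}

/* The suggested check: any non-zero return counts as failure. */
static bool
failed_nonzero(int error)
{
	return error != 0;
}

/* Walk the three cases and assert what each check concludes. */
static void
demo_checks(void)
{
	/* DMA load failure: missed by "== ENOBUFS", caught by "!= 0". */
	assert(!failed_enobufs_only(fake_newbuf(false, true)));
	assert(failed_nonzero(fake_newbuf(false, true)));

	/* mbuf allocation failure: both checks catch it. */
	assert(failed_enobufs_only(fake_newbuf(true, false)));
	assert(failed_nonzero(fake_newbuf(true, false)));

	/* Success: neither check reports a failure. */
	assert(!failed_enobufs_only(fake_newbuf(false, false)));
	assert(!failed_nonzero(fake_newbuf(false, false)));
}
```

calling demo_checks() passes all assertions, ie the first case is the
one where the caller carries on as if a fresh buffer had been loaded
into the map.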