> Date: Tue, 2 Sep 2025 15:49:43 +0200
> From: Stefan Sperling <s...@stsp.name>
>
> On Tue, Sep 02, 2025 at 11:59:52AM +0200, Mark Kettenis wrote:
> > Unfortunately, you can put qwx(4) in systems that have memory above the
> > 36-bit address boundary. Any x86 system with 64GB will have that, and
> > even with less memory we can't be sure. So using BUS_DMA_64BIT for
> > the DMA memory allocation isn't safe.
> >
> > However, you should be able to use bus_dmamem_alloc_range() with
> > the appropriate upper limit for those allocations where the hardware
> > is happy with a 36-bit limit.
>
> Implementing this only for amd64 won't help x13s users.
> Would we make this decision on a per-arch basis, or somehow else?
The limits are set by the qwx(4) hardware, not by the architecture of
the machine you plug it into. So the limits you would pass to
bus_dmamem_alloc_range() would not depend on the architecture.

> > > I recall patrick@ clamping this driver for 4GB early on to fix some
> > > problem related to loading the firmware. This was done before qwx even
> > > provided a working network interface. So perhaps using 64-bit DMA for
> > > packets is fine even on arm64?
> >
> > That doesn't really help us since mbufs are always allocated below
> > 4GB.
>
> This is not about mbufs themselves, it is about descriptor rings.

Well, you said "So perhaps using 64-bit DMA for *packets* is fine even
on arm64?" (my emphasis). Anyway, yes, that isn't relevant here.

> The driver can either create these rings on demand when the AP indicates
> that it will send us packets for a specific TID, or pre-allocate 16 rings
> (TID 0 - 15) even if most of those rings won't ever be used.
>
> Perhaps pre-allocation is the better strategy in this case?

That could be a viable strategy. Memory fragmentation is a problem
that is hard to fix without doing some sort of pre-allocation. Even
with a 36-bit DMA limit you could run into that problem. And I don't
expect folks to put more than one qwx(4) into a machine, so you
wouldn't waste a lot of memory.

It would still be useful to use the relaxed 36-bit limit in the
bus_dmamem_alloc_range() calls if possible. That would preserve more
low memory for other purposes.

The Linux driver uses dma_alloc_noncoherent() in
ath11k_peer_rx_tid_setup(), so maybe you can figure out from that
which rings this applies to?