> Date: Tue, 2 Sep 2025 15:49:43 +0200
> From: Stefan Sperling <s...@stsp.name>
> 
> On Tue, Sep 02, 2025 at 11:59:52AM +0200, Mark Kettenis wrote:
> > Unfortunately you can put qwx(4) in systems that have memory above the
> > 36-bit address boundary.  Any x86 system with 64GB will have that, and
> > even with less memory we can't be sure.  So using BUS_DMA_64BIT for
> > the DMA memory allocation isn't safe.
> > 
> > However you should be able to use bus_dmamem_alloc_range() with
> > the appropriate upper limit for those allocations where the hardware
> > is happy with a 36-bit limit.
> 
> Implementing this only for amd64 won't help x13s users. 
> Would we make this decision on a per-arch basis, or somehow else?

The limits are set by the qwx(4) hardware, not the architecture of the
machine you plug it into.  So the limits you would pass to
bus_dmamem_alloc_range() would not depend on the architecture.
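
To illustrate, here is a rough userland sketch of how such a limit
follows from the device's address width alone.  The helper names and
the DMA_ADDR_BITS_36 macro are made up for this example and are not
part of qwx(4) or bus_dma(9).  In the driver, the computed value would
be what you hand to bus_dmamem_alloc_range() as the upper bound.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 36 address bits, the figure discussed in this thread. */
#define DMA_ADDR_BITS_36	36

/* Highest byte address a device with `bits` DMA address bits can reach. */
static uint64_t
dma_high_limit(unsigned int bits)
{
	assert(bits >= 1 && bits <= 64);
	if (bits == 64)
		return UINT64_MAX;
	return ((uint64_t)1 << bits) - 1;
}

/* True if [paddr, paddr + size) lies entirely at or below `high`. */
static bool
dma_range_ok(uint64_t paddr, uint64_t size, uint64_t high)
{
	if (size == 0 || paddr > high)
		return false;
	/* Written this way to avoid overflow in paddr + size. */
	return size - 1 <= high - paddr;
}
```

Nothing in there looks at the CPU architecture; the same limit applies
on amd64 and arm64.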

> > > I recall patrick@ clamping this driver for 4GB early on to fix some
> > > problem related to loading the firmware. This was done before qwx even
> > > provided a working network interface. So perhaps using 64-bit DMA for
> > > packets is fine even on arm64?
> > 
> > That doesn't really help us since mbufs are always allocated below
> > 4GB.
> 
> This is not about mbufs themselves, it is about descriptor rings.

Well, you said "So perhaps using 64-bit DMA for *packets* is fine even
on arm64?"  (my emphasis).  Anyway, yes, that isn't relevant here.

> The driver can either create these rings on demand when the AP indicates
> that it will send us packets for a specific TID, or pre-allocate 16 rings
> (TID 0 - 15) even if most of those rings won't ever be used.
> 
> Perhaps pre-allocation is the better strategy in this case?

That could be a viable strategy.  Memory fragmentation is a problem
that is hard to fix without doing some sort of pre-allocation.  Even
with a 36-bit DMA limit you could run into that problem.  And I don't
expect folks to put more than one qwx(4) into a machine, so I don't
expect you'd waste a lot of memory.
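
Something along these lines is what I have in mind for the
pre-allocation.  This is only a userland sketch: the struct and
function names are hypothetical, and calloc() stands in for whatever
DMA allocation the driver would really do at attach time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define NUM_TIDS	16	/* TID 0 - 15 */

/* Hypothetical per-TID ring; `mem` stands in for the DMA memory. */
struct tid_ring {
	void	*mem;
	size_t	 size;
};

struct tid_ring_pool {
	struct tid_ring	rings[NUM_TIDS];
};

/* Release every ring; safe to call on a partially filled pool. */
static void
tid_pool_free(struct tid_ring_pool *pool)
{
	assert(pool != NULL);
	for (int i = 0; i < NUM_TIDS; i++) {
		free(pool->rings[i].mem);
		pool->rings[i].mem = NULL;
		pool->rings[i].size = 0;
	}
}

/* Allocate all rings up front.  Returns 0 on success, -1 on failure. */
static int
tid_pool_prealloc(struct tid_ring_pool *pool, size_t ring_size)
{
	assert(pool != NULL);
	for (int i = 0; i < NUM_TIDS; i++) {
		pool->rings[i].mem = NULL;
		pool->rings[i].size = 0;
	}
	if (ring_size == 0)
		return -1;
	for (int i = 0; i < NUM_TIDS; i++) {
		pool->rings[i].mem = calloc(1, ring_size);
		if (pool->rings[i].mem == NULL) {
			tid_pool_free(pool);
			return -1;
		}
		pool->rings[i].size = ring_size;
	}
	return 0;
}

/* Look up the ring for a TID; no allocation happens at this point. */
static struct tid_ring *
tid_pool_get(struct tid_ring_pool *pool, unsigned int tid)
{
	if (tid >= NUM_TIDS)
		return NULL;
	return &pool->rings[tid];
}
```

The point is that the lookup when the AP announces a TID can no longer
fail because of fragmentation; the only allocation failure left is the
one at attach time.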

It would still be useful to use the relaxed 36-bit limit in the
bus_dmamem_alloc_range() calls if possible.  That would preserve more
low memory for other purposes.  The Linux driver uses
dma_alloc_noncoherent() in ath11k_peer_rx_tid_setup() so maybe you can
figure out which rings this applies to from that?
