On 04/24, Mina Almasry wrote:
> On Thu, Apr 24, 2025 at 3:40 PM Stanislav Fomichev <[email protected]> wrote:
> >
> > On 04/24, Mina Almasry wrote:
> > > On Thu, Apr 24, 2025 at 3:10 PM Stanislav Fomichev <[email protected]> wrote:
> > > >
> > > > On 04/24, Mina Almasry wrote:
> > > > > On Wed, Apr 23, 2025 at 1:15 PM Stanislav Fomichev <[email protected]> wrote:
> > > > > >
> > > > > > On 04/23, Mina Almasry wrote:
> > > > > > > On Wed, Apr 23, 2025 at 9:03 AM Cosmin Ratiu <[email protected]> wrote:
> > > > > > > >
> > > > > > > > Drivers that are told to allocate RX buffers from pools of DMA memory
> > > > > > > > should have enough memory in the pool to satisfy projected allocation
> > > > > > > > requests (a function of ring size, MTU & other parameters). If there's
> > > > > > > > not enough memory, RX ring refill might fail later at inconvenient
> > > > > > > > times (e.g. during NAPI poll).
> > > > > > > >
> > > > > > > My understanding is that if the RX ring refill fails, the driver will
> > > > > > > post the buffers it was able to allocate data for, and will not post
> > > > > > > other buffers. So it will run with degraded performance, but nothing
> > > > > > > overly bad should happen. This should be the same behavior as when the
> > > > > > > machine is under memory pressure.
> > > > > > >
> > > > > > > In general I don't know about this change. If the user wants to use
> > > > > > > very small dmabufs, they should be able to, without going through
> > > > > > > hoops reducing the number of rx ring slots the driver has (if it
> > > > > > > supports configuring that).
> > > > > > >
> > > > > > > I think maybe printing an error or warning that the dmabuf is too
> > > > > > > small for the pool_size may be fine. But outright failing this
> > > > > > > configuration? I don't think so.
> > > > > > >
> > > > > > > > This commit adds a check at dmabuf pool init time that compares the
> > > > > > > > amount of memory in the underlying chunk pool (configured by the user
> > > > > > > > space application providing dmabuf memory) with the desired pool size
> > > > > > > > (previously set by the driver) and fails with an error message if
> > > > > > > > chunk memory isn't enough.
> > > > > > > >
> > > > > > > > Fixes: 0f9214046893 ("memory-provider: dmabuf devmem memory provider")
> > > > > > > > Signed-off-by: Cosmin Ratiu <[email protected]>
> > > > > > > > ---
> > > > > > > >  net/core/devmem.c | 11 +++++++++++
> > > > > > > >  1 file changed, 11 insertions(+)
> > > > > > > >
> > > > > > > > diff --git a/net/core/devmem.c b/net/core/devmem.c
> > > > > > > > index 6e27a47d0493..651cd55ebb28 100644
> > > > > > > > --- a/net/core/devmem.c
> > > > > > > > +++ b/net/core/devmem.c
> > > > > > > > @@ -299,6 +299,7 @@ net_devmem_bind_dmabuf(struct net_device *dev, unsigned int dmabuf_fd,
> > > > > > > >  int mp_dmabuf_devmem_init(struct page_pool *pool)
> > > > > > > >  {
> > > > > > > >         struct net_devmem_dmabuf_binding *binding = pool->mp_priv;
> > > > > > > > +       size_t size;
> > > > > > > >
> > > > > > > >         if (!binding)
> > > > > > > >                 return -EINVAL;
> > > > > > > > @@ -312,6 +313,16 @@ int mp_dmabuf_devmem_init(struct page_pool *pool)
> > > > > > > >         if (pool->p.order != 0)
> > > > > > > >                 return -E2BIG;
> > > > > > > >
> > > > > > > > +       /* Validate that the underlying dmabuf has enough memory to satisfy
> > > > > > > > +        * requested pool size.
> > > > > > > > +        */
> > > > > > > > +       size = gen_pool_size(binding->chunk_pool) >> PAGE_SHIFT;
> > > > > > > > +       if (size < pool->p.pool_size) {
> > > > > > > >
> > > > > > > pool_size seems to be the number of ptr_ring slots in the page_pool,
> > > > > > > not some upper or lower bound on the amount of memory the page_pool
> > > > > > > can provide. So this check seems useless? The page_pool can still not
> > > > > > > provide this amount of memory with dmabuf (if the netmems aren't being
> > > > > > > recycled fast enough) or with normal memory (under memory pressure).
> > > > > > >
> > > > > > I read this check more as "are there enough chunks in the binding to
> > > > > > fully fill in the page pool". User controls the size of the rx ring
> > > > > >
> > > > > Only on drivers that support ethtool -G, and where it will let you
> > > > > configure -G to what you want.
> > > > >
> > > > gve is the minority here, any major nic (brcm/mlx/intel) supports
> > > > resizing the rings.
> > > >
> > > GVE supports resizing rings. Other drivers may not. Even on drivers
> > > that support resizing rings, some users may have a use case for a
> > > dmabuf smaller than the minimum ring size their driver accepts.
> > >
> > > > > > which controls the size of the page pool, which somewhat dictates the
> > > > > > minimal size of the binding (maybe).
> > > > >
> > > > > See the test I ran in the other thread. Seems at least GVE is fine
> > > > > with dmabuf size < ring size. I don't know what other drivers do, but
> > > > > generally speaking I think specific driver limitations should not
> > > > > limit what others can do with their drivers. Sure, for the GPU mem
> > > > > applications you're probably looking at, the dmabufs are huge and
> > > > > supporting small dmabufs is not a concern, but someone somewhere may
> > > > > want to run with a 1 MB dmabuf for some use case, and if their driver
> > > > > is fine with it, core should not prevent it, I think.
> > > > >
> > > > > > So it's more of a sanity check.
> > > > > >
> > > > > > Maybe having better defaults in ncdevmem would've been a better option?
> > > > > > It allocates (16000*4096) bytes (slightly less than 64MB, why? to fit
> > > > > > into the default /sys/module/udmabuf/parameters/size_limit_mb?) and on
> > > > > > my setup PP wants to get 64MB at least..
> > > > > >
> > > > > Yeah, udmabuf has a limitation that it only supports 64MB max size
> > > > > last I looked.
> > > >
> > > > We can use /sys/module/udmabuf/parameters/size_limit_mb to allocate
> > > > more than 64MB, ncdevmem can change it.
> > > >
> > > The udmabuf limit is hardcoded in udmabuf.c or configured on module
> > > load, and ncdevmem doesn't load udmabuf. I guess it could be changed
> > > to do that, but currently ncdevmem works with CONFIG_UDMABUF=y.
> > >
> > You don't need to load/reload the module to change module params:
> >
> > # id
> > uid=0(root) gid=0(root) groups=0(root),1(bin),2(daemon),3(sys)
> > # cat /sys/module/udmabuf/parameters/size_limit_mb
> > 64
> > # echo 128 > /sys/module/udmabuf/parameters/size_limit_mb
> > # cat /sys/module/udmabuf/parameters/size_limit_mb
> > 128
> >
>
> Today I learned! Thanks!
>
> I will put it on my todo list to make ncdevmem force a larger limit to
Or we can ask Cosmin to send something out, since he's already looking into the buffer sizes..
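
Something along these lines on the ncdevmem side should be enough (rough,
untested sketch just to illustrate; the helper name and the 256MB value are
made up, and it assumes the size_limit_mb parameter file exists, i.e. udmabuf
is built in or already loaded):

#include <stdio.h>

/* Hypothetical helper, not current ncdevmem code: raise the udmabuf
 * size limit via sysfs before creating the udmabuf, so bindings larger
 * than the 64MB default can be exercised. Needs root; if the parameter
 * file isn't there, we just keep whatever the current limit is.
 */
static void udmabuf_raise_size_limit_mb(unsigned int limit_mb)
{
	FILE *f = fopen("/sys/module/udmabuf/parameters/size_limit_mb", "w");

	if (!f) {
		perror("udmabuf size_limit_mb");
		return;
	}
	fprintf(f, "%u\n", limit_mb);
	fclose(f);
}

int main(void)
{
	/* 256 is an arbitrary example, comfortably above the 64MB default. */
	udmabuf_raise_size_limit_mb(256);
	return 0;
}

ncdevmem would call that before it creates the udmabuf, and could then size
the binding to whatever the page pool actually wants instead of staying just
under 64MB.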
