On 19.09.25 15:11, Tvrtko Ursulin wrote: > GPUs typically benefit from contiguous memory via reduced TLB pressure and > improved caching performance, where the maximum size of contiguous block > which adds a performance benefit is related to hardware design. > > TTM pool allocator by default tries (hard) to allocate up to the system > MAX_PAGE_ORDER blocks. This varies by the CPU platform and can also be > configured via Kconfig. > > If that limit was set to be higher than the GPU can make an extra use of, > lets allow the individual drivers to let TTM know over which allocation > order can the pool allocator afford to make a little bit less effort with. > > We implement this by disabling direct reclaim for those allocations, which > reduces the allocation latency and lowers the demands on the page > allocator, in cases where expending this effort is not critical for the > GPU in question. > > Signed-off-by: Tvrtko Ursulin <[email protected]> > Cc: Christian König <[email protected]> > Cc: Thadeu Lima de Souza Cascardo <[email protected]> > --- > drivers/gpu/drm/ttm/ttm_pool.c | 15 +++++++++++++-- > include/drm/ttm/ttm_pool.h | 10 ++++++++++ > 2 files changed, 23 insertions(+), 2 deletions(-) > > diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c > index c5eb2e28ca9d..3bf7b6bd96a3 100644 > --- a/drivers/gpu/drm/ttm/ttm_pool.c > +++ b/drivers/gpu/drm/ttm/ttm_pool.c > @@ -726,8 +726,16 @@ static int __ttm_pool_alloc(struct ttm_pool *pool, > struct ttm_tt *tt, > > page_caching = tt->caching; > allow_pools = true; > - for (order = ttm_pool_alloc_find_order(MAX_PAGE_ORDER, alloc); > - alloc->remaining_pages; > + > + order = ttm_pool_alloc_find_order(MAX_PAGE_ORDER, alloc); > + /* > + * Do not add latency to the allocation path for allocations orders > + * device tolds us do not bring additional performance gains. > + */ > + if (order > pool->max_beneficial_order) > + gfp_flags &= ~__GFP_DIRECT_RECLAIM; > + > + for (; alloc->remaining_pages; Move that into ttm_pool_alloc_page(), the other code to adjust the gfp_flags based on the order is there as well. > order = ttm_pool_alloc_find_order(order, alloc)) { > struct ttm_pool_type *pt; > > @@ -745,6 +753,8 @@ static int __ttm_pool_alloc(struct ttm_pool *pool, struct > ttm_tt *tt, > if (!p) { > page_caching = ttm_cached; > allow_pools = false; > + if (order <= pool->max_beneficial_order) > + gfp_flags |= __GFP_DIRECT_RECLAIM; That makes this superfluous as well. > p = ttm_pool_alloc_page(pool, gfp_flags, order); > } > /* If that fails, lower the order if possible and retry. */ > @@ -1076,6 +1086,7 @@ void ttm_pool_init(struct ttm_pool *pool, struct device > *dev, > pool->nid = nid; > pool->use_dma_alloc = use_dma_alloc; > pool->use_dma32 = use_dma32; > + pool->max_beneficial_order = MAX_PAGE_ORDER; > > for (i = 0; i < TTM_NUM_CACHING_TYPES; ++i) { > for (j = 0; j < NR_PAGE_ORDERS; ++j) { > diff --git a/include/drm/ttm/ttm_pool.h b/include/drm/ttm/ttm_pool.h > index 54cd34a6e4c0..24d3285c9aad 100644 > --- a/include/drm/ttm/ttm_pool.h > +++ b/include/drm/ttm/ttm_pool.h > @@ -66,6 +66,7 @@ struct ttm_pool_type { > * @nid: which numa node to use > * @use_dma_alloc: if coherent DMA allocations should be used > * @use_dma32: if GFP_DMA32 should be used > + * @max_beneficial_order: allocations above this order do not bring > performance gains > * @caching: pools for each caching/order > */ > struct ttm_pool { > @@ -74,6 +75,7 @@ struct ttm_pool { > > bool use_dma_alloc; > bool use_dma32; > + unsigned int max_beneficial_order; > > struct { > struct ttm_pool_type orders[NR_PAGE_ORDERS]; > @@ -88,6 +90,14 @@ void ttm_pool_init(struct ttm_pool *pool, struct device > *dev, > int nid, bool use_dma_alloc, bool use_dma32); > void ttm_pool_fini(struct ttm_pool *pool); > > +static inline unsigned int > +ttm_pool_set_max_beneficial_order(struct ttm_pool *pool, unsigned int order) > +{ > + pool->max_beneficial_order = min(MAX_PAGE_ORDER, order); > + > + return pool->max_beneficial_order; > +} > + Just make that a parameter to ttm_pool_init(), it should be static for all devices I know about anyway. Apart from that looks good to me, Christian. > int ttm_pool_debugfs(struct ttm_pool *pool, struct seq_file *m); > > void ttm_pool_drop_backed_up(struct ttm_tt *tt);
