Hey,

Den 2025-10-15 kl. 15:57, skrev Natalie Vock:
> When checking whether to skip certain buffers because they're protected
> by dmem.low, we're checking the effective protection of the evictee's
> cgroup, but depending on how the evictor's cgroup relates to the
> evictee's, the semantics of effective protection values change.
> 
> When testing against cgroups from different subtrees, page_counter's
> recursive protection propagates memory protection afforded to a parent
> down to the child cgroups, even if the children were not explicitly
> protected. This prevents cgroups whose parents were afforded no
> protection from stealing memory from cgroups whose parents were afforded
> more protection, without users having to explicitly propagate this
> protection.
> 
> However, if we always calculate protection from the root cgroup, this
> breaks prioritization of sibling cgroups: If one cgroup was explicitly
> protected and its siblings were not, the protected cgroup should get
> higher priority, i.e. the protected cgroup should be able to steal from
> unprotected siblings. This only works if we restrict the protection
> calculation to the subtree shared by evictor and evictee.
> 
> Signed-off-by: Natalie Vock <[email protected]>
> ---
>  drivers/gpu/drm/ttm/ttm_bo.c | 22 +++++++++++++++++++---
>  1 file changed, 19 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> index 
> 7f7872ab2090cc8db188e08ddfdcd12fe924f743..bc88941c0aadb9a1d6fbaa470ccdeae4f91c41fb
>  100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -524,13 +524,29 @@ struct ttm_bo_evict_walk {
>  
>  static s64 ttm_bo_evict_cb(struct ttm_lru_walk *walk, struct 
> ttm_buffer_object *bo)
>  {
> +     struct dmem_cgroup_pool_state *limit_pool;
>       struct ttm_bo_evict_walk *evict_walk =
>               container_of(walk, typeof(*evict_walk), walk);
>       s64 lret;
>  
> -     if 
> (!dmem_cgroup_state_evict_valuable(evict_walk->alloc_state->limit_pool,
> -                                           bo->resource->css, 
> evict_walk->try_low,
> -                                           &evict_walk->hit_low))
> +     /*
> +      * If only_evict_unprotected is set, then we're trying to evict 
> unprotected
> +      * buffers in favor of a protected allocation for charge_pool. 
> Explicitly skip
> +      * buffers belonging to the same cgroup here - that cgroup is 
> definitely protected,
> +      * even though dmem_cgroup_state_evict_valuable would allow the 
> eviction because a
> +      * cgroup is always allowed to evict from itself even if it is 
> protected.
> +      */
> +     if (evict_walk->alloc_state->only_evict_unprotected &&
> +                     bo->resource->css == 
> evict_walk->alloc_state->charge_pool)
> +             return 0;
> +
> +     limit_pool = evict_walk->alloc_state->limit_pool;
> +     if (!limit_pool)
> +             limit_pool = dmem_cgroup_common_ancestor(bo->resource->css,
> +                                                      
> evict_walk->alloc_state->charge_pool);
> +
> +     if (!dmem_cgroup_state_evict_valuable(limit_pool, bo->resource->css,
> +                                           evict_walk->try_low, 
> &evict_walk->hit_low))
>               return 0;
>  
>       if (bo->pin_count || !bo->bdev->funcs->eviction_valuable(bo, 
> evict_walk->place))
> 
Patches themselves look good, I think it would help to add a bit more 
documentation since
cgroup related dmem eviction is already complicated, and while I believe those 
changes are
correct, it will help others to understand the code in case bugs show up.

Perhaps even add a global overview of how dmem eviction interacts with TTM 
eviction.

This will need review from the TTM maintainers/reviewers too before being 
accepted.

With the extra documentation added:
Reviewed-by: Maarten Lankhorst <[email protected]>

Reply via email to