Hey, Den 2025-10-15 kl. 15:57, skrev Natalie Vock: > When checking whether to skip certain buffers because they're protected > by dmem.low, we're checking the effective protection of the evictee's > cgroup, but depending on how the evictor's cgroup relates to the > evictee's, the semantics of effective protection values change. > > When testing against cgroups from different subtrees, page_counter's > recursive protection propagates memory protection afforded to a parent > down to the child cgroups, even if the children were not explicitly > protected. This prevents cgroups whose parents were afforded no > protection from stealing memory from cgroups whose parents were afforded > more protection, without users having to explicitly propagate this > protection. > > However, if we always calculate protection from the root cgroup, this > breaks prioritization of sibling cgroups: If one cgroup was explicitly > protected and its siblings were not, the protected cgroup should get > higher priority, i.e. the protected cgroup should be able to steal from > unprotected siblings. This only works if we restrict the protection > calculation to the subtree shared by evictor and evictee. > > Signed-off-by: Natalie Vock <[email protected]> > --- > drivers/gpu/drm/ttm/ttm_bo.c | 22 +++++++++++++++++++--- > 1 file changed, 19 insertions(+), 3 deletions(-) > > diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c > index > 7f7872ab2090cc8db188e08ddfdcd12fe924f743..bc88941c0aadb9a1d6fbaa470ccdeae4f91c41fb > 100644 > --- a/drivers/gpu/drm/ttm/ttm_bo.c > +++ b/drivers/gpu/drm/ttm/ttm_bo.c > @@ -524,13 +524,29 @@ struct ttm_bo_evict_walk { > > static s64 ttm_bo_evict_cb(struct ttm_lru_walk *walk, struct > ttm_buffer_object *bo) > { > + struct dmem_cgroup_pool_state *limit_pool; > struct ttm_bo_evict_walk *evict_walk = > container_of(walk, typeof(*evict_walk), walk); > s64 lret; > > - if > (!dmem_cgroup_state_evict_valuable(evict_walk->alloc_state->limit_pool, > - bo->resource->css, > evict_walk->try_low, > - &evict_walk->hit_low)) > + /* > + * If only_evict_unprotected is set, then we're trying to evict > unprotected > + * buffers in favor of a protected allocation for charge_pool. > Explicitly skip > + * buffers belonging to the same cgroup here - that cgroup is > definitely protected, > + * even though dmem_cgroup_state_evict_valuable would allow the > eviction because a > + * cgroup is always allowed to evict from itself even if it is > protected. > + */ > + if (evict_walk->alloc_state->only_evict_unprotected && > + bo->resource->css == > evict_walk->alloc_state->charge_pool) > + return 0; > + > + limit_pool = evict_walk->alloc_state->limit_pool; > + if (!limit_pool) > + limit_pool = dmem_cgroup_common_ancestor(bo->resource->css, > + > evict_walk->alloc_state->charge_pool); > + > + if (!dmem_cgroup_state_evict_valuable(limit_pool, bo->resource->css, > + evict_walk->try_low, > &evict_walk->hit_low)) > return 0; > > if (bo->pin_count || !bo->bdev->funcs->eviction_valuable(bo, > evict_walk->place)) > Patches themselves look good, I think it would help to add a bit more documentation since cgroup related dmem eviction is already complicated, and while I believe those changes are correct, it will help others to understand the code in case bugs show up.
Perhaps even add a global overview of how dmem eviction interacts with TTM eviction. This will need review from the TTM maintainers/reviewers too before being accepted. With the extra documentation added: Reviewed-by: Maarten Lankhorst <[email protected]>
