On 02/03/2026 12:37, Natalie Vock wrote:
When the cgroup's memory usage is below the low/min limit and allocation
fails, try evicting some unprotected buffers to make space. Otherwise,
application buffers may be forced to go into GTT even though usage is
below the corresponding low/min limit, if other applications filled VRAM
with their allocations first.

Signed-off-by: Natalie Vock <[email protected]>
---
  drivers/gpu/drm/ttm/ttm_bo.c | 52 +++++++++++++++++++++++++++++++++++++++-----
  1 file changed, 47 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 53c4de4bcc1e3..86f99237f6490 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -494,6 +494,10 @@ struct ttm_bo_alloc_state {
        struct dmem_cgroup_pool_state *charge_pool;
        /** @limit_pool: Which pool limit we should test against */
        struct dmem_cgroup_pool_state *limit_pool;
+       /** @only_evict_unprotected: If only unprotected BOs, i.e. BOs whose cgroup
+        *  is exceeding its dmem low/min protection, should be considered for eviction
+        */
+       bool only_evict_unprotected;
  };
  /**
@@ -598,8 +602,12 @@ static int ttm_bo_evict_alloc(struct ttm_device *bdev,
        evict_walk.walk.arg.trylock_only = true;
        lret = ttm_lru_walk_for_evict(&evict_walk.walk, bdev, man, 1);
-       /* One more attempt if we hit low limit? */
-       if (!lret && evict_walk.hit_low) {
+       /* If we failed to find enough BOs to evict, but we skipped over
+        * some BOs because they were covered by dmem low protection, retry
+        * evicting these protected BOs too, except if we're told not to
+        * consider protected BOs at all.
+        */
+       if (!lret && evict_walk.hit_low && !state->only_evict_unprotected) {
                evict_walk.try_low = true;
                lret = ttm_lru_walk_for_evict(&evict_walk.walk, bdev, man, 1);
        }
@@ -620,7 +628,8 @@ static int ttm_bo_evict_alloc(struct ttm_device *bdev,
        } while (!lret && evict_walk.evicted);
        /* We hit the low limit? Try once more */
-       if (!lret && evict_walk.hit_low && !evict_walk.try_low) {
+       if (!lret && evict_walk.hit_low && !evict_walk.try_low &&
+                       !state->only_evict_unprotected) {
                evict_walk.try_low = true;
                goto retry;
        }
@@ -730,7 +739,7 @@ static int ttm_bo_alloc_at_place(struct ttm_buffer_object *bo,
                                 struct ttm_resource **res,
                                 struct ttm_bo_alloc_state *alloc_state)
  {
-       bool may_evict;
+       bool may_evict, below_low;
        int ret;
        may_evict = (force_space && place->mem_type != TTM_PL_SYSTEM);
@@ -749,9 +758,42 @@ static int ttm_bo_alloc_at_place(struct ttm_buffer_object *bo,
                return ret;
        }
+       /*
+        * cgroup protection plays a special role in eviction.
+        * Conceptually, protection of memory via the dmem cgroup controller
+        * entitles the protected cgroup to use a certain amount of memory.
+        * There are two types of protection - the 'low' limit is a
+        * "best-effort" protection, whereas the 'min' limit provides a hard
+        * guarantee that memory within the cgroup's allowance will not be
+        * evicted under any circumstance.
+        *
+        * To faithfully model this concept in TTM, we also need to take cgroup
+        * protection into account when allocating. When allocation in one
+        * place fails, TTM will default to trying other places first before
+        * evicting.
+        * If the allocation is covered by dmem cgroup protection, however,
+        * this prevents the allocation from using the memory it is "entitled"
+        * to. To make sure unprotected allocations cannot push new protected
+        * allocations out of places they are "entitled" to use, we should
+        * evict buffers not covered by any cgroup protection, if this
+        * allocation is covered by cgroup protection.
+        *
+        * Buffers covered by 'min' protection are a special case - the 'min'
+        * limit is a stronger guarantee than 'low', and thus buffers protected
+        * by 'low' but not 'min' should also be considered for eviction.
+        * Buffers protected by 'min' will never be considered for eviction
+        * anyway, so the regular eviction path should be triggered here.
+        * Buffers protected by 'low' but not 'min' will take a special
+        * eviction path that only evicts buffers covered by neither 'low' or
+        * 'min' protections.
+        */
+       may_evict |= dmem_cgroup_below_min(NULL, alloc_state->charge_pool);

It may make sense to group the two lines which "calculate" may_evict together. That would probably also mean pulling the two lines below up to before the try charge, so that the whole logical block is not split.

+       below_low = dmem_cgroup_below_low(NULL, alloc_state->charge_pool);
+       alloc_state->only_evict_unprotected = !may_evict && below_low;

Would it work to enable may_evict also if below_low is true, and assign below_low directly to only_evict_unprotected? I mean along the lines of:

may_evict = force_space && place->mem_type != TTM_PL_SYSTEM;
may_evict |= dmem_cgroup_below_min(NULL, alloc_state->charge_pool);
alloc_state->only_evict_unprotected = dmem_cgroup_below_low(NULL, alloc_state->charge_pool);

That would allow the if condition below to be simpler. The evict callback would remain the same, I guess.
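
To compare the two ways of deriving the flags, here is a small stand-alone sketch. It is not TTM code: the three inputs are plain booleans standing in for the force_space/mem_type check, dmem_cgroup_below_min() and dmem_cgroup_below_low(), they are treated as independent, and the struct and function names are made up for illustration. The "suggested" variant follows the prose above, i.e. it assumes may_evict is also enabled when below_low is true.

```c
#include <stdbool.h>

/* Hypothetical container for the values that drive the -ENOSPC handling. */
struct evict_flags {
	bool may_evict;
	bool only_evict_unprotected;
	bool enospc_becomes_ebusy;
};

/* Flag derivation as written in the patch under review. */
static struct evict_flags flags_as_posted(bool force_evict, bool below_min,
					  bool below_low)
{
	struct evict_flags f;

	f.may_evict = force_evict || below_min;
	f.only_evict_unprotected = !f.may_evict && below_low;
	f.enospc_becomes_ebusy = f.may_evict || below_low;
	return f;
}

/* Flag derivation as suggested (assumes may_evict is also set on below_low). */
static struct evict_flags flags_as_suggested(bool force_evict, bool below_min,
					     bool below_low)
{
	struct evict_flags f;

	f.may_evict = force_evict || below_min || below_low;
	f.only_evict_unprotected = below_low;
	f.enospc_becomes_ebusy = f.may_evict;
	return f;
}

/* Count input combinations where the -EBUSY decision differs. */
static int count_ebusy_mismatches(void)
{
	int n = 0;

	for (int i = 0; i < 8; i++) {
		bool fe = i & 1, bm = i & 2, bl = i & 4;

		if (flags_as_posted(fe, bm, bl).enospc_becomes_ebusy !=
		    flags_as_suggested(fe, bm, bl).enospc_becomes_ebusy)
			n++;
	}
	return n;
}

/* Count input combinations where only_evict_unprotected differs. */
static int count_unprotected_mismatches(void)
{
	int n = 0;

	for (int i = 0; i < 8; i++) {
		bool fe = i & 1, bm = i & 2, bl = i & 4;

		if (flags_as_posted(fe, bm, bl).only_evict_unprotected !=
		    flags_as_suggested(fe, bm, bl).only_evict_unprotected)
			n++;
	}
	return n;
}
```

In this model the -EBUSY decision is the same for every input, but only_evict_unprotected differs whenever below_low is true together with force_evict or below_min. Whether those combinations matter in practice depends on how the real predicates relate to each other, which this sketch does not model.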

And maybe only_evict_unprotected could be renamed to "try_low" to align with the naming in there? Then in the callback the condition would be like:

        /* We hit the low limit? Try once more */
        if (!lret && evict_walk.hit_low &&
            !(evict_walk.try_low || state->try_low)) {
                evict_walk.try_low = true;
                goto retry;
        }

Give or take. Would that be more readable, i.e. more obvious? Although I am endlessly confused by how !try_low ends up being try_low = true in this condition, so maybe I am mixing something up. You get my gist though? In essence: unify the naming and logic for easier understanding. If you can find some workable way in this spirit, I think it is worth thinking about.
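
As an aid for reading the "!try_low, then try_low = true" pattern, here is a toy model of the retry condition as it appears in the patch. It is only a sketch under stated assumptions: the struct and function are hypothetical, the allocation is assumed to never find enough space (the !lret case), and hit_low is modelled as simply being set when a pass without try_low skips protected BOs.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the relevant evict-walk fields. */
struct walk_model {
	bool hit_low;	/* a pass skipped at least one low-protected BO */
	bool try_low;	/* the next pass may evict low-protected BOs too */
};

/*
 * Count LRU passes for an allocation that never finds enough space.
 * protected_bos_present says whether a pass without try_low has
 * low-protected BOs to skip.
 */
static int count_passes(bool protected_bos_present,
			bool only_evict_unprotected)
{
	struct walk_model w = { .hit_low = false, .try_low = false };
	int passes = 0;

retry:
	passes++;
	/* A pass without try_low skips protected BOs and records that. */
	if (!w.try_low && protected_bos_present)
		w.hit_low = true;

	/* Same shape as the patched condition, with lret assumed to be 0. */
	if (w.hit_low && !w.try_low && !only_evict_unprotected) {
		w.try_low = true;
		goto retry;
	}
	return passes;
}
```

In this model, !try_low reads as "the low pass has not been attempted yet", and setting it to true both enables that pass and prevents a third one. only_evict_unprotected has the opposite sense (never attempt the low pass), which may be worth keeping in mind when choosing a shared name.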

Regards,

Tvrtko

+
        ret = ttm_resource_alloc(bo, place, res, alloc_state->charge_pool);
        if (ret) {
-               if (ret == -ENOSPC && may_evict)
+               if (ret == -ENOSPC && (may_evict || below_low))
                        ret = -EBUSY;
                return ret;
        }

