On 3/11/26 02:12, Chen Ridong wrote:


On 2026/3/2 20:37, Natalie Vock wrote:
Callers can use this feedback to be more aggressive in making space for
allocations of a cgroup if they know it is protected.

These are counterparts to memcg's mem_cgroup_below_{min,low}.

Signed-off-by: Natalie Vock <[email protected]>
---
  include/linux/cgroup_dmem.h | 16 ++++++++++++
  kernel/cgroup/dmem.c        | 62 +++++++++++++++++++++++++++++++++++++++++++++
  2 files changed, 78 insertions(+)

diff --git a/include/linux/cgroup_dmem.h b/include/linux/cgroup_dmem.h
index dd4869f1d736e..1a88cd0c9eb00 100644
--- a/include/linux/cgroup_dmem.h
+++ b/include/linux/cgroup_dmem.h
@@ -24,6 +24,10 @@ void dmem_cgroup_uncharge(struct dmem_cgroup_pool_state *pool, u64 size);
  bool dmem_cgroup_state_evict_valuable(struct dmem_cgroup_pool_state *limit_pool,
                                      struct dmem_cgroup_pool_state *test_pool,
                                      bool ignore_low, bool *ret_hit_low);
+bool dmem_cgroup_below_min(struct dmem_cgroup_pool_state *root,
+                          struct dmem_cgroup_pool_state *test);
+bool dmem_cgroup_below_low(struct dmem_cgroup_pool_state *root,
+                          struct dmem_cgroup_pool_state *test);
void dmem_cgroup_pool_state_put(struct dmem_cgroup_pool_state *pool);
  #else
@@ -59,6 +63,18 @@ bool dmem_cgroup_state_evict_valuable(struct dmem_cgroup_pool_state *limit_pool,
        return true;
  }
+static inline bool dmem_cgroup_below_min(struct dmem_cgroup_pool_state *root,
+                                        struct dmem_cgroup_pool_state *test)
+{
+       return false;
+}
+
+static inline bool dmem_cgroup_below_low(struct dmem_cgroup_pool_state *root,
+                                        struct dmem_cgroup_pool_state *test)
+{
+       return false;
+}
+
  static inline void dmem_cgroup_pool_state_put(struct dmem_cgroup_pool_state *pool)
  { }
diff --git a/kernel/cgroup/dmem.c b/kernel/cgroup/dmem.c
index 9d95824dc6fa0..28227405f7cfe 100644
--- a/kernel/cgroup/dmem.c
+++ b/kernel/cgroup/dmem.c
@@ -694,6 +694,68 @@ int dmem_cgroup_try_charge(struct dmem_cgroup_region *region, u64 size,
  }
  EXPORT_SYMBOL_GPL(dmem_cgroup_try_charge);
+/**
+ * dmem_cgroup_below_min() - Tests whether current usage is within min limit.
+ *
+ * @root: Root of the subtree to calculate protection for, or NULL to calculate global protection.
+ * @test: The pool to test the usage/min limit of.
+ *
+ * Return: true if usage is below min and the cgroup is protected, false otherwise.
+ */
+bool dmem_cgroup_below_min(struct dmem_cgroup_pool_state *root,
+                          struct dmem_cgroup_pool_state *test)
+{
+       if (root == test || !pool_parent(test))
+               return false;
+
+       if (!root) {
+               for (root = test; pool_parent(root); root = pool_parent(root))
+                       {}
+       }

It seems we don't have to find the global protection (root), since the root's
protection cannot be set. If !root, we can return false directly, right?

Or am I missing anything?

```
        {
                .name = "min",
                .write = dmem_cgroup_region_min_write,
                .seq_show = dmem_cgroup_region_min_show,
                .flags = CFTYPE_NOT_ON_ROOT,
        },
        {
                .name = "low",
                .write = dmem_cgroup_region_low_write,
                .seq_show = dmem_cgroup_region_low_show,
                .flags = CFTYPE_NOT_ON_ROOT,
        },
```

That's not quite how it works. You're correct that the min/low properties don't exist on the root cgroup, but we don't use the root for that.

The reason we have a root here in the first place has to do with how recursive memory protection works in cgroups. Note that for the test cgroup, we don't read the literal min/low protection setting, but the "emin"/"elow" value, referring to effective protection. The effective protection value depends not just on the settings of the "test" cgroup, but also its ancestors (and potentially, their sibling groups). See [1] for some documentation on how effective protection varies with different cgroup relationships.

The "root" parameter here refers to the root of the common subtree between the test cgroup and what the documentation refers to as the "reclaim target". For device memory there usually isn't really any reclaim happening in the traditional sense, but e.g. TTM evictions follow the same principle (the reclaim target is simply the cgroup owning the buffer that is to be evicted).
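To make the "root of the common subtree" notion concrete, here is a minimal userspace sketch that computes it as the lowest common ancestor of two nodes linked by parent pointers. `struct pool` and `common_subtree_root()` are hypothetical stand-ins, not part of the dmem API; a kernel caller may simply know the right root (e.g. the pool whose limit was hit) without computing it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a cgroup pool; only the parent link matters. */
struct pool {
	const char *name;
	struct pool *parent;	/* NULL for the root cgroup */
};

static int pool_depth(const struct pool *p)
{
	int depth = 0;

	for (; p->parent; p = p->parent)
		depth++;
	return depth;
}

/*
 * Root of the smallest subtree containing both @a and @b (their lowest
 * common ancestor). Lift the deeper node until both are at the same depth,
 * then walk both upwards in lockstep until they meet.
 */
static struct pool *common_subtree_root(struct pool *a, struct pool *b)
{
	int da, db;

	assert(a && b);
	da = pool_depth(a);
	db = pool_depth(b);

	for (; da > db; da--)
		a = a->parent;
	for (; db > da; db--)
		b = b->parent;
	while (a != b) {
		a = a->parent;
		b = b->parent;
	}
	return a;
}
```

For a tree root → A → {A/1, A/2} plus root → B, the common root of A/1 and A/2 is A, while A/1 and B only share the root cgroup.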

Sometimes, precise reclaim targets may not really be known yet (or we want to try evicting different buffers originating from different cgroups). In that case, the "root" parameter here is NULL. However, we obviously know that all cgroups must be descendants of the root cgroup, so the root cgroup is a guaranteed safe value for the shared subtree between the test cgroup and any potential reclaim target.

In practice, this means that the effective min/low protection will be capped by the protection value specified in all ancestors, which is the most conservative estimate.
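A simplified sketch of how the choice of root changes the result, assuming a model in which a direct child of the root keeps its own min setting and every deeper pool is capped by its parent's effective value. It deliberately omits the proportional distribution among overcommitted siblings that the real effective-protection calculation performs. `struct pool`, `effective_min()` and `below_min()` are hypothetical names that only mirror the shape of dmem_cgroup_below_min() in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified pool model (not the kernel struct). */
struct pool {
	struct pool *parent;	/* NULL for the root cgroup */
	uint64_t usage;		/* current charge */
	uint64_t min;		/* configured min; unused on the root cgroup */
};

/*
 * Simplified effective min. Precondition: @root is a proper ancestor of @p.
 * A direct child of @root gets its own setting; deeper pools are capped by
 * their parent's effective value. Sibling overcommit scaling is ignored.
 */
static uint64_t effective_min(const struct pool *root, const struct pool *p)
{
	uint64_t parent_emin;

	assert(p->parent);	/* walked past the top: @root was not an ancestor */
	if (p->parent == root)
		return p->min;
	parent_emin = effective_min(root, p->parent);
	return p->min < parent_emin ? p->min : parent_emin;
}

/* Mirrors the control flow of dmem_cgroup_below_min() in the patch. */
static bool below_min(const struct pool *root, const struct pool *test)
{
	if (root == test || !test->parent)
		return false;

	/* No reclaim target known: fall back to the root cgroup. */
	if (!root)
		for (root = test; root->parent; root = root->parent)
			;

	return test->usage <= effective_min(root, test);
}
```

With root → A (min 100) → A/1 (min 500, usage 80), passing NULL gives A/1 an effective min of 100, capped by A, whereas passing A as the root gives 500. The NULL case is the conservative estimate described above.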

Regards,
Natalie

[1] https://docs.kernel.org/admin-guide/cgroup-v2.html#reclaim-protection


+
+       /*
+        * In mem_cgroup_below_min(), the memcg counterpart, this call is missing.
+        * mem_cgroup_below_min() gets called during traversal of the cgroup tree, where
+        * protection is already calculated as part of the traversal. dmem cgroup eviction
+        * does not traverse the cgroup tree, so we need to recalculate effective protection
+        * here.
+        */
+       dmem_cgroup_calculate_protection(root, test);
+       return page_counter_read(&test->cnt) <= READ_ONCE(test->cnt.emin);
+}
+EXPORT_SYMBOL_GPL(dmem_cgroup_below_min);
+
+/**
+ * dmem_cgroup_below_low() - Tests whether current usage is within low limit.
+ *
+ * @root: Root of the subtree to calculate protection for, or NULL to calculate global protection.
+ * @test: The pool to test the usage/low limit of.
+ *
+ * Return: true if usage is below low and the cgroup is protected, false otherwise.
+ */
+bool dmem_cgroup_below_low(struct dmem_cgroup_pool_state *root,
+                          struct dmem_cgroup_pool_state *test)
+{
+       if (root == test || !pool_parent(test))
+               return false;
+
+       if (!root) {
+               for (root = test; pool_parent(root); root = pool_parent(root))
+                       {}
+       }
+
+       /*
+        * In mem_cgroup_below_low(), the memcg counterpart, this call is missing.
+        * mem_cgroup_below_low() gets called during traversal of the cgroup tree, where
+        * protection is already calculated as part of the traversal. dmem cgroup eviction
+        * does not traverse the cgroup tree, so we need to recalculate effective protection
+        * here.
+        */
+       dmem_cgroup_calculate_protection(root, test);
+       return page_counter_read(&test->cnt) <= READ_ONCE(test->cnt.elow);
+}
+EXPORT_SYMBOL_GPL(dmem_cgroup_below_low);
+
  static int dmem_cgroup_region_capacity_show(struct seq_file *sf, void *v)
  {
        struct dmem_cgroup_region *region;