dmem: Add reclaim callback for lowering max below current usage

Thomas Hellström Thu, 11 Jun 2026 10:28:30 -0700


On 6/11/26 16:40, [email protected] wrote:

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) 
to consider:

New issues:
- [Medium] The docstring for `struct dmem_cgroup_ops->reclaim` explicitly 
contradicts the implementation for `-ENOSPC` return values.


Fixed in v6.


Pre-existing issues:
- [High] On 32-bit systems, `dmemcg` limits tracked in bytes are truncated to 
32-bit `unsigned long`, leading to overflows and artificially severe 
constraints.


Pre-existing issues are out of scope.

/Thomas

--

commit ccfc580d266b8b80afec154f819853307351d98a
Author: Thomas Hellström <[email protected]>

cgroup/dmem: Add reclaim callback for lowering max below current usage

Add an optional reclaim callback to struct dmem_cgroup_region. When
dmem.max is set below the current usage of a cgroup pool, the new limit
is applied immediately and then the driver is asked to evict memory.

diff --git a/include/linux/cgroup_dmem.h b/include/linux/cgroup_dmem.h
index d9eab8a2c1ee6..d705e94d8784d 100644
--- a/include/linux/cgroup_dmem.h
+++ b/include/linux/cgroup_dmem.h
@@ -14,12 +14,33 @@ struct dmem_cgroup_pool_state;
  /* Opaque definition of a cgroup region, used internally */
  struct dmem_cgroup_region;

+/**

+ * struct dmem_cgroup_ops - Operations for a dmem cgroup region.
+ * @reclaim: Optional callback invoked when dmem.max is set below the current
+ *           usage of a pool. The driver should attempt to free at least
+ *           @target_bytes from @pool. May be called multiple times if usage
+ *           remains above the limit after returning.
+ *
+ *           Return: 0 if some progress was made (even if less than
+ *           @target_bytes was freed), -ENOSPC if no progress could be made,
+ *           or another negative error code if a fatal error occurred.
+ *           Any non-zero return stops further reclaim attempts.

[Severity: Medium]
Is the documentation accurate regarding the -ENOSPC return code?

The docstring here explicitly states that any non-zero return stops further
reclaim attempts. However, the implementation in set_resource_max() appears
to intercept -ENOSPC and will retry up to 16 times:

     if (ret && (ret != -ENOSPC || !retries--))
         break;

Could this mislead driver authors into assuming -ENOSPC is a terminal
return code that aborts the loop immediately, when it actually subjects
the driver to continuous retries?

+ */
+struct dmem_cgroup_ops {
+       int (*reclaim)(struct dmem_cgroup_pool_state *pool,
+                      u64 target_bytes, void *priv);
+};

[ ... ]

diff --git a/kernel/cgroup/dmem.c b/kernel/cgroup/dmem.c
index 373ad04d289d5..f7166b8255c7b 100644
--- a/kernel/cgroup/dmem.c
+++ b/kernel/cgroup/dmem.c

[ ... ]

  static void
-set_resource_max(struct dmem_cgroup_pool_state *pool, u64 val)
+set_resource_max(struct dmem_cgroup_pool_state *pool, u64 val, bool nonblock)
  {
-       page_counter_set_max(&pool->cnt, val);
+       struct dmem_cgroup_region *region = pool->region;
+       unsigned long limit = (unsigned long)val;

[Severity: High]
This is a pre-existing issue, but does this truncation to unsigned long cause
severe limitations on 32-bit systems?

The dmemcg subsystem accounts for memory limits in bytes, but integrates with
the page_counter infrastructure which tracks limits using unsigned long. The
64-bit byte limit is truncated here, wrapping values larger than 4GB on 32-bit
architectures.

Additionally, when setting the limit to 'max' via dmemcg_parse_limit() or
reset_all_resource_limits(), it translates to PAGE_COUNTER_MAX (which is
~524,287 on 32-bit). Because dmem_cgroup_try_charge() passes the allocation
size in bytes to page_counter_try_charge(), a limit of 'max' restricts usage
to ~512KB instead of being unlimited. Will this cause 32-bit workloads to OOM
on allocations larger than ~512KB?

+
+       /*
+        * Always update the limit, even if usage currently exceeds it.
+        * Concurrent allocations will be throttled against the new limit
+        * while reclaim is in progress.
+        */
+       xchg(&pool->cnt.max, limit);

Re: [PATCH v5 3/6] cgroup/dmem: Add reclaim callback for lowering max below current usage

Reply via email to