Hey, On 6/5/26 17:42, Eric Chanudet wrote: > On Fri, Jun 05, 2026 at 01:27:09PM +0200, Maarten Lankhorst wrote: >> Hey, >> >> On 5/26/26 18:59, Eric Chanudet wrote: >>> On Fri, May 22, 2026 at 05:26:16PM +0200, Michal Koutný wrote: >>>> Hello Eric. >>>> >>>> On Tue, May 19, 2026 at 11:59:02AM -0400, Eric Chanudet >>>> <[email protected]> wrote: >>>>> Add a root-only cgroupfs file "dmem.memcg" that lets an administrator >>>>> configure whether allocations in a dmem region should also be charged to >>>>> the memory controller. >>>> >>>> This kinda makes sense as it is not unlike io.cost.* device >>>> configurators. >>>> >>>> Just for my better understanding -- will there be a space for userspace >>>> to switch this? (No charged dmem allocations happen before responsible >>>> userspace runs, so that the attribute remains unlocked.) >>>> >>>> (I'm rather indifferent about the actual double charging/non-charging >>>> matter.) >>> >>> Yes, this is intended to be configured before the user space stack that >>> would start allocating things is started. Once it has started (and tried >>> to charge something), the configuration is locked >>> >>>> >>>>> >>>>> To handle inheritance, dmem adds a depends_on the memory controller, >>>>> unless MEMCG isn't configured in. >>>>> >>>>> Double-charging is disabled by default. Once a charge is attempted, the >>>>> setting is locked to prevent inconsistent accounting by a small 4-state >>>>> machine (off, on, locked off, locked on). >>>>> >>>>> The memcg to charge is derived from the pool's cgroup, since the pool >>>>> holds a reference to the dmem cgroup state that keeps the cgroup alive >>>>> until it gets uncharged. >>>>> >>>>> Signed-off-by: Eric Chanudet <[email protected]> >>>>> --- >>>>> Documentation/admin-guide/cgroup-v2.rst | 23 +++++ >>>>> kernel/cgroup/dmem.c | 158 >>>>> +++++++++++++++++++++++++++++++- >>>>> 2 files changed, 178 insertions(+), 3 deletions(-) >>>>> >>>>> diff --git a/Documentation/admin-guide/cgroup-v2.rst >>>>> b/Documentation/admin-guide/cgroup-v2.rst >>>>> index >>>>> 6efd0095ed995b1550317662bc1b56c7a7f3db23..1d2fa55ddf0faa17baa916a8914d3033e8e42359 >>>>> 100644 >>>>> --- a/Documentation/admin-guide/cgroup-v2.rst >>>>> +++ b/Documentation/admin-guide/cgroup-v2.rst >>>>> @@ -2828,6 +2828,29 @@ DMEM Interface Files >>>>> drm/0000:03:00.0/vram0 12550144 >>>>> drm/0000:03:00.0/stolen 8650752 >>>>> >>>>> + dmem.memcg >>>>> + A readwrite nested-keyed file that exists only on the root >>>>> + cgroup. >>>> >>>> Strictly speaking this is not nested-keyed but flat keyed [1], >>> >>> Indeed, >>> >>>> which leads me to realization that this is the first instance of a boolean. >>>> All in call, such a composition comes to my mind (latter is RO): >>>> >>>> drm/0000:03:00.0/vram0 enable=0|1 locked=0|1 >>>> >>> >>> So per[1] 1 key, 2 sub-keys (enable RW, locked RO), that looks better >>> and match the documentation, thanks! >>> >>>> >>>> >>>>> +static ssize_t dmem_cgroup_memcg_write(struct kernfs_open_file *of, char >>>>> *buf, >>>>> + size_t nbytes, loff_t off) >>>>> +{ >>>>> + while (buf) { >>>>> + struct dmem_cgroup_region *region; >>>>> + char *options, *name; >>>>> + bool flag; >>>>> + >>>>> + options = buf; >>>>> + buf = strchr(buf, '\n'); >>>>> + if (buf) >>>>> + *buf++ = '\0'; >>>> >>>> I recall there was a discussion about accepting only a single device per >>>> write(2) (at the same time I see this idiom is still present in other >>>> dmem.* files, so this is nothing to change in _this_ patch). >>> >>> I would second that. When setting say dmem.max for 2 regions, with a >>> typo on the second, the first one is set, but write still get EINVAL. >>> >>> Also, I just notice dmemcg_limit_write() returns EINVAL if the region is >>> not found (this patch returns ENODEV). >>> >>>> >>>> Thanks, >>>> Michal >>>> >>>> [1] >>>> https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html#format >>> >>> >>> >> >> Perhaps a bit late, but before we start adding this UAPI we should enforce a >> single region per write? > > I can send that separately, although that is a UAPI change. Is there any > user that would be affected? > > This series is hung on charging memcg using memory objects from the > context of dmem, when at that level of abstraction it doesn't have > access to the underlying pieces that were allocated.
It's a uapi change, but I see more and more interest in the development and usage of dmemcg. I believe it's better to fix this before users (perhaps accidentally) start to rely on this behavior. Kind regards, ~Maarten Lankhorst
