On 5/19/26 01:39, T.J. Mercier wrote:
> On Mon, May 18, 2026 at 7:07 AM Christian König
> <[email protected]> wrote:
>>
>> On 5/18/26 14:50, Albert Esteve wrote:
>>> On Mon, May 18, 2026 at 9:20 AM Christian König
>>> <[email protected]> wrote:
>>>>
>>>> On 5/15/26 19:06, T.J. Mercier wrote:
>>>>> On Fri, May 15, 2026 at 6:53 AM Christian Brauner <[email protected]>
>>>>> wrote:
>>>>>>
>>>>>> On Tue, May 12, 2026 at 11:10:44AM +0200, Albert Esteve wrote:
>>>>>>> On embedded platforms a central process often allocates dma-buf
>>>>>>> memory on behalf of client applications. Without a way to
>>>>>>> attribute the charge to the requesting client's cgroup, the
>>>>>>> cost lands on the allocator, making per-cgroup memory limits
>>>>>>> ineffective for the actual consumers.
>>>>>>>
>>>>>>> Add charge_pid_fd to struct dma_heap_allocation_data. When set to
>>>>>>
>>>>>> Please be aware that pidfds come in two flavors:
>>>>>>
>>>>>> thread-group pidfds and thread-specific pidfds. Make sure that your API
>>>>>> doesn't implicitly depend on this distinction not existing.
>>>>>
>>>>> Hi Christian,
>>>>>
>>>>> Memcg is not a controller that supports "thread mode" so all threads
>>>>> in a group should belong to the same memcg.
>>>>
>>>> BTW: Exactly that is the requirement automotive has with their native
>>>> context use case.
>>>>
>>>> The use case is that you have a daemon which has multiple threads where
>>>> each one is acting on behalf of some other process.
>>>>
>>>> At the moment we basically say they are simply not using cgroups for that
>>>> use case, but it would be really nice if we could handle that as well.
>>>>
>>>> Summarizing the requirement of that use case: You need a different cgroup
>>>> for each thread of a process.
>>>
>>> Hi Christian,
>>>
>>> Thanks for sharing this automotive use case. If I understand correctly,
>>> the actual requirement is attributing dma-buf charges to the right
>>> client, not putting each daemon thread in a different cgroup?
>>
>> Nope, exactly that's the difference.
>>
>> The thread acts as a filtering agent for both memory allocation and command
>> submission for somebody else. The process on whose behalf the daemon does
>> things can even be in a client VM, completely remote over some network or
>> even something like a microcontroller.
>>
>> Everything the thread does regarding CPU time, GPU driver memory allocation
>> as well as resources like GPU processing and I/O time etc. needs to be
>> accounted to one client which can be different for each thread of the
>> process.
>>
>> The only thing which is shared with the main process thread is CPU memory
>> resources, e.g. malloc() because that is basically just needed for
>> housekeeping and pretty much irrelevant for this kind of use case.
>>
>> The problem is now you can't do that with cgroups at the moment but
>> unfortunately only the kernel has the information you need to know to do
>> this.
>>
>> So what you end up with is to define tons of interfaces just to get the
>> necessary information from the kernel into userspace and then essentially
>> duplicate the same infrastructure cgroup provides in the kernel in userspace
>> again.
>>
>>> If so,
>>> the `charge_pid_fd` approach achieves this directly by passing the
>>> client's `pid_fd`, without needing to add per-thread cgroup
>>> infrastructure.
>>
>> Well it's already a massive improvement, we could basically stop doing the
>> whole duplication part for the GPU driver stack and just use cgroups for
>> this part.
>>
>> Doing that automatically for CPU and I/O time would just be nice to have
>> additionally.
>>
>> Regards,
>> Christian.
>
> Hopefully I'm following correctly here.... So you are duplicating the
> GPU driver stack to achieve remote accounting on a per-thread basis?

Not quite, we are duplicating in userspace the handling that cgroup
provides in the kernel.

For this, memory usage information as well as execution times of the GPU
kernel driver are exposed in fdinfo, for example.

> Does this mean for GPU allocations you currently have some GFP_ACCOUNT
> magic in your driver to attribute GPU memory to the correct remote
> client?

No, we just expose what the kernel driver has allocated for itself, e.g.
page tables, buffers etc.

When userspace allocates something using memfd_create(), for example, we
just ignore that.

> So this series would close the gap for dma-buf allocations,
> but what about private GPU driver memory allocated on behalf of a
> client?

Well, we would need a cgroup which isn't associated with any process, where
we could charge the GPU driver allocations against.

But good point, charging against a pid wouldn't work in this use case.

Regards,
Christian.

