On Thu, Jun 19, 2025 at 10:33 AM Shakeel Butt <[email protected]> wrote:
>
> On Wed, Jun 18, 2025 at 02:06:17PM +1000, Dave Airlie wrote:
> > From: Dave Airlie <[email protected]>
> >
> > While discussing memcg intergration with gpu memory allocations,
> > it was pointed out that there was no numa/system counters for
> > GPU memory allocations.
> >
> > With more integrated memory GPU server systems turning up, and
> > more requirements for memory tracking it seems we should start
> > closing the gap.
> >
> > Add two counters to track GPU per-node system memory allocations.
> >
> > The first is currently allocated to GPU objects, and the second
> > is for memory that is stored in GPU page pools that can be reclaimed,
> > by the shrinker.
> >
> > Cc: Christian Koenig <[email protected]>
> > Cc: Matthew Brost <[email protected]>
> > Cc: Johannes Weiner <[email protected]>
> > Cc: [email protected]
> > Cc: Andrew Morton <[email protected]>
> > Signed-off-by: Dave Airlie <[email protected]>
> >
> > ---
> >
> > I'd like to get acks to merge this via the drm tree, if possible,
> >
> > Dave.
> > ---
> >  Documentation/filesystems/proc.rst | 6 ++++++
> >  drivers/base/node.c                | 5 +++++
> >  fs/proc/meminfo.c                  | 6 ++++++
> >  include/linux/mmzone.h             | 2 ++
> >  mm/show_mem.c                      | 9 +++++++--
> >  mm/vmstat.c                        | 2 ++
> >  6 files changed, 28 insertions(+), 2 deletions(-)
> >
> > diff --git a/Documentation/filesystems/proc.rst
> > b/Documentation/filesystems/proc.rst
> > index 5236cb52e357..45f61a19a790 100644
> > --- a/Documentation/filesystems/proc.rst
> > +++ b/Documentation/filesystems/proc.rst
> > @@ -1095,6 +1095,8 @@ Example output. You may not have all of these fields.
> >      CmaFree:               0 kB
> >      Unaccepted:            0 kB
> >      Balloon:               0 kB
> > +    GPUActive:             0 kB
> > +    GPUReclaim:            0 kB
> >      HugePages_Total:       0
> >      HugePages_Free:        0
> >      HugePages_Rsvd:        0
> > @@ -1273,6 +1275,10 @@ Unaccepted
> >                Memory that has not been accepted by the guest
> >  Balloon
> >                Memory returned to Host by VM Balloon Drivers
> > +GPUActive
> > +              Memory allocated to GPU objects
> > +GPUReclaim
> > +              Memory in GPU allocator pools that is reclaimable
>
> Can you please explain a bit more about these GPUActive & GPUReclaim?
> Please correct me if I am wrong, GPUActive is the total memory used by
> GPU objects and GPUReclaim is the subset of GPUActive which is
> reclaimable (possibly through shrinkers).

Currently, GPUActive is the total memory used by active GPU objects.
GPUReclaim is the amount of memory (not a subset of Active) that is
being stored in reusable GPU pools and can be retrieved via a simple
shrinker. (This memory usually has different page table attributes,
uncached or write-combined.)

Example workflow:

User allocates cached system RAM for a GPU object:
    Active increases
User frees the cached system RAM:
    Active decreases
User allocates write-combined system RAM for a GPU object:
    Active increases
User frees the write-combined system RAM:
    Active decreases, Reclaim increases
User allocates another WC system RAM object:
    Reclaim decreases, Active increases
Shrinker shrinks the pool:
    Reclaim decreases

In the future there could be a third type of memory, which I'm not sure
is necessary to account at this level: Active memory that the driver
considers discardable and that could be shrunk easily. I'm not seeing
much consistency in how drivers use this, or even what use case it is
needed for, so I'm not going to address it yet. This could end up in
Reclaim, but I'd need to see the use cases for it.

Dave.
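
To make the example workflow above concrete, here is a rough userspace
sketch of how the two counters move. This is only an illustration, not
code from the patch: struct gpu_counters, gpu_alloc(), gpu_free() and
gpu_shrink() are hypothetical names, the counts are plain page counts
rather than per-node vmstat items, and it assumes only uncached/WC
pages go through the pool.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the two counters, in pages. */
struct gpu_counters {
	long active;	/* GPUActive: pages backing live GPU objects */
	long reclaim;	/* GPUReclaim: pages parked in reusable pools */
};

/*
 * Allocate pages for a GPU object. Assumption: only pooled (uncached
 * or write-combined) allocations may take pages back out of the pool;
 * anything the pool cannot cover comes from the system.
 */
static void gpu_alloc(struct gpu_counters *c, long pages, bool pooled)
{
	if (pooled) {
		long reused = pages < c->reclaim ? pages : c->reclaim;

		c->reclaim -= reused;
	}
	c->active += pages;
}

/*
 * Free a GPU object. Cached pages go straight back to the system;
 * pooled pages are kept in the pool and show up as Reclaim.
 */
static void gpu_free(struct gpu_counters *c, long pages, bool pooled)
{
	c->active -= pages;
	if (pooled)
		c->reclaim += pages;
}

/* Shrinker: release up to 'pages' from the pool, return the number freed. */
static long gpu_shrink(struct gpu_counters *c, long pages)
{
	long freed = pages < c->reclaim ? pages : c->reclaim;

	c->reclaim -= freed;
	return freed;
}
```

Walking the workflow through this model: a cached alloc/free pair only
touches active, a WC free moves pages from active to reclaim, a later
WC alloc moves them back, and the shrinker only ever reduces reclaim.
The two counters never overlap, which is why Reclaim is not a subset of
Active.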
