On 2/24/26 03:06, Dave Airlie wrote:
> From: Dave Airlie <[email protected]>
>
> This introduces 2 new statistics and 3 new memcontrol APIs for dealing
> with GPU system memory allocations.
>
> The stats correspond to the same stats in the global vmstat:
> the number of active GPU pages, and the number of pages in pools that
> can be reclaimed.
>
> The first API charges an order of pages to an objcg, sets
> the objcg on the pages like kmem does, and updates the active/reclaim
> statistic.
>
> The second API uncharges a page from the objcg it is currently charged
> to.
>
> The third API allows moving a page to/from reclaim and between objcgs.
> When pages are added to the pool LRU, this just updates accounting.
> When pages are being removed from a pool LRU, they can be taken from
> the parent objcg, so this allows them to be uncharged from there and
> transferred to a new child objcg.
>
> Acked-by: Christian König <[email protected]>
I have to take that back. After going over the different use cases I'm now
pretty convinced that charging any GPU/TTM allocation to memcg is the wrong
approach to the problem.

Instead, TTM should have a dmem_cgroup_pool which can limit the amount of
system memory each cgroup can use from GTT.

The use case where GTT memory should be accounted to memcg is actually only
valid for an extremely small number of HPC customers, and for those use
cases we have different approaches to solve this issue (udmabuf, the system
DMA-buf heap, etc...).

What we can do is say that this dmem_cgroup_pool then also accounts to
memcg for selected cgroups. This would not only make it superfluous to have
different flags in drivers and TTM to turn this feature on/off, but would
also allow charging VRAM or other local memory to memcg, because they use
system memory as a fallback for device memory.

In other, more high-level words: memcg is actually the swapping space for
dmem.

Regards,
Christian.

> Signed-off-by: Dave Airlie <[email protected]>
> ---
> v2: use memcg_node_stat_items
> v3: fix null ptr dereference in uncharge
> v4: AI review: fix parameter names, fix problem with reclaim moving doing
>     wrong thing
> ---
>  Documentation/admin-guide/cgroup-v2.rst |   6 ++
>  include/linux/memcontrol.h              |  11 +++
>  mm/memcontrol.c                         | 104 ++++++++++++++++++++++++
>  3 files changed, 121 insertions(+)
>
> diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> index 91beaa6798ce..3ea7f1a399e8 100644
> --- a/Documentation/admin-guide/cgroup-v2.rst
> +++ b/Documentation/admin-guide/cgroup-v2.rst
> @@ -1573,6 +1573,12 @@ The following nested keys are defined.
>  	  vmalloc (npn)
>  		Amount of memory used for vmap backed memory.
>
> +	  gpu_active (npn)
> +		Amount of system memory used for GPU devices.
> +
> +	  gpu_reclaim (npn)
> +		Amount of system memory cached for GPU devices.
> +
>  	  shmem
>  		Amount of cached filesystem data that is swap-backed,
>  		such as tmpfs, shm segments, shared anonymous mmap()s
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 70b685a85bf4..4f75d64f5fca 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -1583,6 +1583,17 @@ static inline void mem_cgroup_flush_foreign(struct bdi_writeback *wb)
>  #endif	/* CONFIG_CGROUP_WRITEBACK */
>
>  struct sock;
> +bool mem_cgroup_charge_gpu_page(struct obj_cgroup *objcg, struct page *page,
> +				unsigned int order,
> +				gfp_t gfp_mask, bool reclaim);
> +void mem_cgroup_uncharge_gpu_page(struct page *page,
> +				  unsigned int order,
> +				  bool reclaim);
> +bool mem_cgroup_move_gpu_page_reclaim(struct obj_cgroup *objcg,
> +				      struct page *page,
> +				      unsigned int order,
> +				      bool to_reclaim);
> +
>  #ifdef CONFIG_MEMCG
>  extern struct static_key_false memcg_sockets_enabled_key;
>  #define mem_cgroup_sockets_enabled static_branch_unlikely(&memcg_sockets_enabled_key)
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index a52da3a5e4fd..90bb3e00c258 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -333,6 +333,8 @@ static const unsigned int memcg_node_stat_items[] = {
>  #ifdef CONFIG_HUGETLB_PAGE
>  	NR_HUGETLB,
>  #endif
> +	NR_GPU_ACTIVE,
> +	NR_GPU_RECLAIM,
>  };
>
>  static const unsigned int memcg_stat_items[] = {
> @@ -1360,6 +1362,8 @@ static const struct memory_stat memory_stats[] = {
>  	{ "percpu",			MEMCG_PERCPU_B },
>  	{ "sock",			MEMCG_SOCK },
>  	{ "vmalloc",			MEMCG_VMALLOC },
> +	{ "gpu_active",			NR_GPU_ACTIVE },
> +	{ "gpu_reclaim",		NR_GPU_RECLAIM },
>  	{ "shmem",			NR_SHMEM },
>  #ifdef CONFIG_ZSWAP
>  	{ "zswap",			MEMCG_ZSWAP_B },
> @@ -5133,6 +5137,106 @@ void mem_cgroup_flush_workqueue(void)
>  	flush_workqueue(memcg_wq);
>  }
>
> +/**
> + * mem_cgroup_charge_gpu_page - charge a page to GPU memory tracking
> + * @objcg: objcg to charge, NULL charges root memcg
> + * @page: page to charge
> + * @order: page allocation order
> + * @gfp_mask: gfp mode
> + * @reclaim: charge the reclaim counter instead of the active one.
> + *
> + * Charge the order sized @page to the objcg. Returns %true if the charge
> + * fit within @objcg's configured limit, %false if it doesn't.
> + */
> +bool mem_cgroup_charge_gpu_page(struct obj_cgroup *objcg, struct page *page,
> +				unsigned int order, gfp_t gfp_mask, bool reclaim)
> +{
> +	unsigned int nr_pages = 1 << order;
> +	struct mem_cgroup *memcg = NULL;
> +	struct lruvec *lruvec;
> +	int ret;
> +
> +	if (objcg) {
> +		memcg = get_mem_cgroup_from_objcg(objcg);
> +
> +		ret = try_charge_memcg(memcg, gfp_mask, nr_pages);
> +		if (ret) {
> +			mem_cgroup_put(memcg);
> +			return false;
> +		}
> +
> +		obj_cgroup_get(objcg);
> +		page_set_objcg(page, objcg);
> +	}
> +
> +	lruvec = mem_cgroup_lruvec(memcg, page_pgdat(page));
> +	mod_lruvec_state(lruvec, reclaim ? NR_GPU_RECLAIM : NR_GPU_ACTIVE, nr_pages);
> +
> +	mem_cgroup_put(memcg);
> +	return true;
> +}
> +EXPORT_SYMBOL_GPL(mem_cgroup_charge_gpu_page);
> +
> +/**
> + * mem_cgroup_uncharge_gpu_page - uncharge a page from GPU memory tracking
> + * @page: page to uncharge
> + * @order: order of the page allocation
> + * @reclaim: uncharge the reclaim counter instead of the active.
> + */
> +void mem_cgroup_uncharge_gpu_page(struct page *page,
> +				  unsigned int order, bool reclaim)
> +{
> +	struct obj_cgroup *objcg = page_objcg(page);
> +	struct mem_cgroup *memcg;
> +	struct lruvec *lruvec;
> +	int nr_pages = 1 << order;
> +
> +	memcg = objcg ? get_mem_cgroup_from_objcg(objcg) : NULL;
> +
> +	lruvec = mem_cgroup_lruvec(memcg, page_pgdat(page));
> +	mod_lruvec_state(lruvec, reclaim ? NR_GPU_RECLAIM : NR_GPU_ACTIVE, -nr_pages);
> +
> +	if (memcg && !mem_cgroup_is_root(memcg))
> +		refill_stock(memcg, nr_pages);
> +	page->memcg_data = 0;
> +	obj_cgroup_put(objcg);
> +	mem_cgroup_put(memcg);
> +}
> +EXPORT_SYMBOL_GPL(mem_cgroup_uncharge_gpu_page);
> +
> +/**
> + * mem_cgroup_move_gpu_page_reclaim - move pages from gpu active to gpu reclaim and back
> + * @new_objcg: objcg to move page to, NULL if just stats update.
> + * @page: page to move
> + * @order: order of the page allocation
> + * @to_reclaim: true moves pages into reclaim, false moves them back
> + */
> +bool mem_cgroup_move_gpu_page_reclaim(struct obj_cgroup *new_objcg,
> +				      struct page *page,
> +				      unsigned int order,
> +				      bool to_reclaim)
> +{
> +	struct obj_cgroup *objcg = page_objcg(page);
> +
> +	if (!objcg || !new_objcg || objcg == new_objcg) {
> +		struct mem_cgroup *memcg = objcg ? get_mem_cgroup_from_objcg(objcg) : NULL;
> +		struct lruvec *lruvec;
> +		unsigned long flags;
> +		int nr_pages = 1 << order;
> +
> +		lruvec = mem_cgroup_lruvec(memcg, page_pgdat(page));
> +		local_irq_save(flags);
> +		mod_lruvec_state(lruvec, to_reclaim ? NR_GPU_RECLAIM : NR_GPU_ACTIVE, nr_pages);
> +		mod_lruvec_state(lruvec, to_reclaim ? NR_GPU_ACTIVE : NR_GPU_RECLAIM, -nr_pages);
> +		local_irq_restore(flags);
> +		mem_cgroup_put(memcg);
> +		return true;
> +	} else {
> +		mem_cgroup_uncharge_gpu_page(page, order, true);
> +		return mem_cgroup_charge_gpu_page(new_objcg, page, order, 0, false);
> +	}
> +}
> +EXPORT_SYMBOL_GPL(mem_cgroup_move_gpu_page_reclaim);
> +
>  static int __init cgroup_memory(char *s)
>  {
>  	char *token;
