On Thu, 4 Sept 2025 at 21:30, Christian König <christian.koe...@amd.com> wrote:
>
> On 04.09.25 04:25, Dave Airlie wrote:
> > On Wed, 3 Sept 2025 at 00:23, Christian König <christian.koe...@amd.com>
> > wrote:
> >>
> >> On 02.09.25 06:06, Dave Airlie wrote:
> >>> From: Dave Airlie <airl...@redhat.com>
> >>>
> >>> This enables all the backend code to use the list lru in memcg mode,
> >>> and set the shrinker to be memcg aware.
> >>>
> >>> It adds the loop case for when pooled pages end up being reparented
> >>> to a higher memcg group, so that the newer memcg can search for them
> >>> there and take them back.
> >>
> >> I can only repeat that as far as I can see that makes no sense at all.
> >>
> >> This just enables stealing pages from the page pool per cgroup and won't
> >> give them back if another cgroup runs into a low memory situation.
> >>
> >> Maybe Thomas and the XE guys have a use case for that, but as far as I
> >> can see that behavior is not something we would ever want.
> >
> > This is what I'd want for a desktop use case at least: if we have a
> > top-level cgroup and then logged-in user cgroups, each user will own
> > their own uncached pages pool and not cause side effects for other
> > users. When they finish running, their pool will get given to the
> > parent.
> >
> > Any new pool will get pages from the parent, and manage them itself.
> >
> > This is also what cgroup developers have said makes the most sense for
> > containerisation here: one cgroup's allocator should not be able to
> > cause shrink work for another cgroup unnecessarily.
>
> The key point is that i915 is doing the exact same thing completely without
> a pool and with *MUCH* less overhead.
>
> Together with Thomas I've implemented that approach for TTM as a WIP patch,
> and on a Ryzen 7 page faulting becomes nearly ten times faster.
>
> The problem is that the PAT and other legacy handling is like two decades
> old now and it seems like nobody can remember how it is actually supposed
> to work.
>
> See this patch here for example as well:
>
> commit 9542ada803198e6eba29d3289abb39ea82047b92
> Author: Suresh Siddha <suresh.b.sid...@intel.com>
> Date:   Wed Sep 24 08:53:33 2008 -0700
>
>     x86: track memtype for RAM in page struct
>
>     Track the memtype for RAM pages in page struct instead of using the
>     memtype list. This avoids the explosion in the number of entries in
>     memtype list (of the order of 20,000 with AGP) and makes the PAT
>     tracking simpler.
>
>     We are using PG_arch_1 bit in page->flags.
>
>     We still use the memtype list for non RAM pages.
>
>     Signed-off-by: Suresh Siddha <suresh.b.sid...@intel.com>
>     Signed-off-by: Venkatesh Pallipadi <venkatesh.pallip...@intel.com>
>     Signed-off-by: Ingo Molnar <mi...@elte.hu>
>
> So we absolutely *do* have a page flag to indicate the cached vs uncached
> status, it's just that we can't allocate those pages in TTM for some
> reason. I'm still digging up what part is missing here.
>
> What I want to avoid is that we create UAPI or at least specific behavior
> that people then start to rely upon. That would make it much more
> problematic to remove the pool in the long term.
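Just so we're talking about the same mechanism: the memcg piece of the
series is essentially "put the pooled pages on a list_lru initialised in
memcg mode and register a memcg-aware shrinker against it", so reclaim
only walks the pages charged to the cgroup that is actually under
pressure. Very roughly, it looks like this (a sketch only, not the
actual patch; all the my_pool_* names are made up, and the walk callback
signature differs a bit between kernel versions):

/*
 * Rough sketch of memcg-aware list_lru + shrinker wiring, not the
 * actual TTM patch.  The my_pool_* names are illustrative only.
 */
#include <linux/list_lru.h>
#include <linux/shrinker.h>

static struct list_lru my_pool_lru;
static struct shrinker *my_pool_shrinker;

/* Report only the pages charged to sc->memcg on node sc->nid. */
static unsigned long my_pool_count(struct shrinker *shrink,
				   struct shrink_control *sc)
{
	return list_lru_shrink_count(&my_pool_lru, sc);
}

/* Callback signature per recent kernels; older trees also pass a lock. */
static enum lru_status my_pool_isolate(struct list_head *item,
				       struct list_lru_one *list,
				       void *cb_arg)
{
	list_lru_isolate(list, item);
	/* ... free the backing page here ... */
	return LRU_REMOVED;
}

/* Walk only the per-(memcg, node) sublist selected by sc. */
static unsigned long my_pool_scan(struct shrinker *shrink,
				  struct shrink_control *sc)
{
	return list_lru_shrink_walk(&my_pool_lru, sc, my_pool_isolate, NULL);
}

static int my_pool_init(void)
{
	int ret;

	my_pool_shrinker = shrinker_alloc(SHRINKER_MEMCG_AWARE |
					  SHRINKER_NUMA_AWARE, "my-pool");
	if (!my_pool_shrinker)
		return -ENOMEM;

	my_pool_shrinker->count_objects = my_pool_count;
	my_pool_shrinker->scan_objects = my_pool_scan;

	/* memcg mode: one internal list per (memcg, NUMA node) pair. */
	ret = list_lru_init_memcg(&my_pool_lru, my_pool_shrinker);
	if (ret) {
		shrinker_free(my_pool_shrinker);
		return ret;
	}

	shrinker_register(my_pool_shrinker);
	return 0;
}

The reparenting case from the commit message is then just a fallback: if
the lookup in the current memcg's sublist comes up empty, walk up the
hierarchy and try to take pages back from the parent's sublist.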
Okay, how about we at least land the first set of patches that move
over to list_lru, i.e. the patches up to "ttm/pool: track
allocated_pages per numa node"? If I can get r-b on those, I think we
should land them.

Then we can try to figure out how to do this without pools, and just
land memcg support with no uncached pool support. However, we still
have to handle dma pages for certain scenarios, and I think they may
suffer from the same problem, just one we care less about.

Dave.