On Thu, 8 May 2025 at 08:03, Dave Airlie <airl...@gmail.com> wrote:
>
> On Thu, 8 May 2025 at 03:52, Johannes Weiner <han...@cmpxchg.org> wrote:
> >
> > Hello Dave,
> >
> > On Fri, May 02, 2025 at 01:35:59PM +1000, Dave Airlie wrote:
> > > Hey all,
> > >
> > > This is my second attempt at adding the initial simple memcg/ttm
> > > integration.
> > >
> > > This varies from the first attempt in two major ways:
> > >
> > > 1. Instead of using __GFP_ACCOUNT and directly calling kmem charges
> > > for pool memory, and directly hitting the GPU statistic, Waiman
> > > suggested I just do what the network socket stuff did, which looks
> > > simpler. So this adds two new memcg APIs that wrap accounting.
> > > The pages no longer get assigned the memcg; it's owned by the
> > > larger BO object, which makes more sense.
> >
> > Unfortunately, this was bad advice :(
> >
> > Naked-charging like this is quite awkward from the memcg side. It
> > requires consumer-specific charge paths in the memcg code, adds stat
> > counters that are memcg-only with no system-wide equivalent, and it's
> > difficult for the memcg maintainers to keep track of the link between
> > what's in the counter and the actual physical memory that is supposed
> > to be tracked.
> >
> > The network and a few others like it are rather begrudging exceptions
> > because they do not have a suitable page context or otherwise didn't
> > fit the charging scheme. They're not good examples to follow if it can
> > at all be avoided.
>
> I unfortunately feel GPU might fit in this category as well: we aren't
> allocating pages in the traditional manner, so we have a number of
> problems with doing it at the page level.
>
> I think the biggest barrier to doing page-level tracking is the page
> pools we have to keep. When we need CPU-uncached pages, we allocate a
> bunch of pages and do the work to change their CPU mappings to be
> uncached. This work is costly, so when we no longer need the uncached
> pages in an object, we return them to a pool rather than to the
> central allocator. We then use a shrinker to empty the pool and undo
> the CPU mapping change. We also keep equivalent pools for DMA-able and
> 32-bit DMA-able pages if they are used.
>
> This means that if userspace allocates a large uncached object, and we
> account it to the current memcg with __GFP_ACCOUNT, then when it gets
> handed back to the pool we have to remove it from the memcg
> accounting. This is where I used the memcg kmem charge/uncharge calls:
> uncharge on adding to the pool, charge on reuse. However, kmem seems
> like the wrong place to charge; it's just where the initial
> __GFP_ACCOUNT allocation puts those pages. This is why the suggestion
> to use the network solution worked better for me: I can do all the
> work outside the pool code at a slightly higher level (the same level
> where we charge/uncharge dmem), and I don't have to worry about
> handling the different pool types, etc. I also don't think we need
> page->memcg to be set for these pages, as all pages are part of a
> larger object, and the object is what gets swapped or freed, not
> parts of it.
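To make that concrete, the object-level charge/uncharge I have in mind
looks roughly like the below. This is an untested sketch modelled on
mem_cgroup_{charge,uncharge}_skmem(); the mem_cgroup_charge_gpu() /
mem_cgroup_uncharge_gpu() names, struct my_bo and the bo->memcg field
are illustrative stand-ins, not the exact code in the series:

    #include <linux/gfp.h>
    #include <linux/memcontrol.h>
    #include <linux/sched/mm.h>

    struct my_bo {
            struct mem_cgroup *memcg;  /* the BO owns the charge, not the pages */
            unsigned int nr_pages;
            /* ... pages, placement, etc. ... */
    };

    /* Charge the whole object to the caller's memcg at BO creation. */
    static int my_bo_account(struct my_bo *bo, unsigned int nr_pages)
    {
            struct mem_cgroup *memcg = get_mem_cgroup_from_mm(current->mm);

            /* hypothetical wrapper, same shape as mem_cgroup_charge_skmem() */
            if (!mem_cgroup_charge_gpu(memcg, nr_pages, GFP_KERNEL)) {
                    mem_cgroup_put(memcg);
                    return -ENOMEM;
            }
            bo->memcg = memcg;
            bo->nr_pages = nr_pages;
            return 0;
    }

    /*
     * On BO destruction the pages can go back to the uncached pool with
     * no per-page memcg work at all; only the object charge is dropped.
     */
    static void my_bo_unaccount(struct my_bo *bo)
    {
            mem_cgroup_uncharge_gpu(bo->memcg, bo->nr_pages);
            mem_cgroup_put(bo->memcg);
            bo->memcg = NULL;
    }

The point of charging at this level is that the pool fill/drain paths
never have to touch memcg state at all.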
With all that said, we also manage moving objects to swap, and for
proper accounting of swapping I may need the more detailed, integrated
approach at the lower levels, so that objects that have been swapped
out get removed from the GPU counter and added to the normal swap
counters.

Dave.

> >
> > __GFP_ACCOUNT and an enum node_stat_item is the much preferred way. I
> > have no objections to exports if you need to charge and account memory
> > from a module.
>
> Now maybe it makes sense to add a node_stat_item for this as well as
> the GPU memcg accounting, so we have it accounted in both places;
> right now mm has no statistical insight into this memory at all,
> other than it being allocated pages.
>
> Dave.
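P.S. For concreteness, the node_stat_item route would look something
like the below. NR_GPU_MEMORY is a made-up name and this is not
compile-tested; a real patch would also need the matching vmstat_text[]
entry and friends:

    /* include/linux/mmzone.h: a system-wide counter next to the others */
    enum node_stat_item {
            /* ... existing items ... */
            NR_GPU_MEMORY,          /* hypothetical: pages held by GPU drivers */
            NR_VM_NODE_STAT_ITEMS
    };

    /*
     * Driver side: __GFP_ACCOUNT charges the memcg at the page level,
     * and mod_lruvec_page_state() bumps both the per-memcg and the
     * per-node counter, so the memory shows up in both places.
     */
    #include <linux/gfp.h>
    #include <linux/memcontrol.h>

    static struct page *gpu_alloc_page(void)
    {
            struct page *page = alloc_page(GFP_KERNEL | __GFP_ACCOUNT);

            if (page)
                    mod_lruvec_page_state(page, NR_GPU_MEMORY, 1);
            return page;
    }

    static void gpu_free_page(struct page *page)
    {
            mod_lruvec_page_state(page, NR_GPU_MEMORY, -1);
            __free_page(page);
    }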