On 5/15/25 18:08, Johannes Weiner wrote:
>> Stop for a second.
>>
>> As far as I can see the shrinker for the TTM pool should *not* be
>> memcg aware. Background is that pages who enter the pool are
>> considered freed by the application.
>
> They're not free from a system POV until they're back in the page
> allocator.
>
>> The only reason we have the pool is to speed up allocation of
>> uncached and write combined pages as well as work around for
>> performance problems of the coherent DMA API.
>>
>> The shrinker makes sure that the pages can be given back to the core
>> memory management at any given time.
>
> That's work. And it's a direct result of some cgroup having allocated
> this memory. Why should somebody else have to clean it up?

Because the cgroup that allocated the memory is long gone. As soon as
the pages enter the pool they must be considered freed by this cgroup.

The cgroup that originally allocated it has no reference to the memory
any more and also no way of giving it back to the core system.

Keeping the memory accounted to the cgroup that allocated it would break
the whole system. See, the pool only exists because of missing features
in the core memory management.

> The shrinker also doesn't run in isolation. It's invoked in the
> broader context of there being a memory shortage, along with all the
> other shrinkers in the system, along with file reclaim, and
> potentially even swapping.
>
> Why should all of this be externalized to other containers?

That's the whole purpose of the pool.

The pool only exists because the core memory management can't track the
difference between uncached, write combined and cached memory. It's
similar to movable or DMA/DMA32.

> For proper memory isolation, the cleanup cost needs to be carried by
> the cgroup that is responsible for it in the first place - not some
> other container that's just trying to read() a file or malloc().

That makes no sense at all.

> This memory isn't special. The majority of memcg-tracked memory is
> shrinkable/reclaimable. In every single case it stays charged until
> the shrink work has been completed, and the pages are handed back to
> the allocator.

To be honest I think that memcg is the special case and what TTM or the
network subsystem does for per-device memory allocation is the norm.

Keeping memory accounted to the cgroup that originally allocated it
after this cgroup has freed it back to a pool makes no sense at all,
because the pool is there exactly to improve performance independent of
the cgroup that is allocating.

Regards,
Christian.