On Fri, May 16, 2025 at 05:35:12PM +0200, Christian König wrote:
> On 5/16/25 16:53, Johannes Weiner wrote:
> > On Fri, May 16, 2025 at 08:53:07AM +0200, Christian König wrote:
> >> The cgroup who originally allocated it has no reference to the
> >> memory any more and also no way of giving it back to the core
> >> system.
> >
> > Of course it does, the shrinker LRU.
>
> No it doesn't. The LRU handling here is global and not per cgroup.

Well, the discussion at hand is that it should be.

> > Listen, none of this is even remotely new. This isn't the first cache
> > we're tracking, and it's not the first consumer that can outlive the
> > controlling cgroup.
>
> Yes, I knew about all of that and I find that extremely questionable
> on existing handling as well.

This code handles billions of containers every day, but we'll be sure
to consult you on the next redesign.

> Memory pools which are only used to improve allocation performance
> are something the kernel handles transparently and are completely
> outside of any cgroup tracking whatsoever.

You're describing a cache. It doesn't matter whether it's caching CPU
work, IO work or network packets.

What matters is what it takes to recycle those pages for other
purposes - especially non-GPU purposes. And more importantly, *what
other memory in other cgroups they displace in the meantime*.

It's really not that difficult to see an isolation issue here.

Anyway, it doesn't look like there is a lot of value in continuing
this conversation, so I'm going to check out of this subthread.