Hello,

On Sat, May 17, 2025 at 06:25:02AM +1000, Dave Airlie wrote:
> I think this is where we have 2 options:
> (a) moving this stuff into core mm and out of shrinker context
> (b) fix our shrinker to be cgroup aware and solve that first.
>
> The main question I have for Christian, is can you give me a list of
> use cases that this will seriously negatively effect if we proceed
> with (b).

This thread seems to have gone a bit haywire and we may be losing some
context. I'm not sure not doing (b) is an option for acceptable
isolation. I think Johannes already raised the issue but please consider
the following scenario:

- There's a GPU workload which uses a sizable amount of system memory
  for the pool being discussed in this thread. This GPU workload is very
  important, so we want to make sure that other activities in the system
  don't bother it. We give it plenty of isolated CPUs and protect its
  memory with high enough memory.low.

- Because most CPUs are largely idling while the GPU is busy, there are
  plenty of CPU cycles which can be used without impacting the GPU
  workload, so we decide to do some data preprocessing which involves
  scanning a large data set, creating memory usage which is mostly
  streaming but still has enough lookbacks to promote its pages in the
  LRU lists.

IIUC, in the shared pool model, the GPU memory which isn't currently
being used would sit outside the cgroup, and thus outside the protection
of memory.low. Again, IIUC, you want to make this pool priority
reclaimed because reclaiming is nearly free and you don't want to create
undue pressure on other reclaimable resources.

However, what would happen in the above scenario under such an
implementation is that the GPU workload would keep losing its memory
pool to the background memory pressure created by the streaming memory
usage. It's also easy to expand on scenarios like this with other GPU
workloads with differing priorities and memory allotments and so on.

There may be some basic misunderstanding here. If a resource is worth
caching, that usually indicates that there's some significant cost
associated with un-caching the resource. It doesn't matter whether that
cost is on the creation or destruction path. Here, the alloc path is
expensive and the free path is nearly free.
However, this doesn't mean that we can get free isolation while bunching
them together for immediate reclaim as others would be able to force you
into alloc operations that you wouldn't need otherwise. If someone else
can make you pay for something that you otherwise wouldn't, that
resource is not isolated.

Thanks.

-- 
tejun