On 5/16/25 16:53, Johannes Weiner wrote:
> On Fri, May 16, 2025 at 08:53:07AM +0200, Christian König wrote:
>> On 5/15/25 18:08, Johannes Weiner wrote:
>>>> Stop for a second.
>>>>
>>>> As far as I can see the shrinker for the TTM pool should *not* be
>>>> memcg aware. Background is that pages who enter the pool are
>>>> considered freed by the application.
>>>
>>> They're not free from a system POV until they're back in the page
>>> allocator.
>>>
>>>> The only reason we have the pool is to speed up allocation of
>>>> uncached and write combined pages as well as work around for
>>>> performance problems of the coherent DMA API.
>>>>
>>>> The shrinker makes sure that the pages can be given back to the core
>>>> memory management at any given time.
>>>
>>> That's work. And it's a direct result of some cgroup having allocated
>>> this memory. Why should somebody else have to clean it up?
>>
>> Because the cgroup who has allocated the memory is long gone. As
>> soon as the pages enter the pool they must be considered freed by
>> this cgroup.
> 
> Nope, not at all.

Why not? The process belonging to this cgroup has freed up it's memory so that 
other processes potentially belonging to other cgroups can allocate it again.

>> The cgroup who originally allocated it has no reference to the
>> memory any more and also no way of giving it back to the core
>> system.
> 
> Of course it does, the shrinker LRU.

No it doesn't. The LRU handling here is global and not per cgroup.

When some cgroup asks for memory we also give it pages which were previously 
allocated to other cgroups.

Because who allocated it doesn't matter in this moment, what matters are the 
properties of the memory.

> Listen, none of this is even remotely new. This isn't the first cache
> we're tracking, and it's not the first consumer that can outlive the
> controlling cgroup.

Yes, I knew about all of that and I find that extremely questionable on 
existing handling as well.

Memory pools which are only used to improve allocation performance are 
something the kernel handles transparently and are completely outside of any 
cgroup tracking whatsoever.

>>> The shrinker also doesn't run in isolation. It's invoked in the
>>> broader context of there being a memory shortage, along with all the
>>> other shrinkers in the system, along with file reclaim, and
>>> potentially even swapping.
>>>
>>> Why should all of this be externalized to other containers?
>>
>> That's the whole purpose of the pool.
>>
>> The pool only exists because the core memory management can't track
>> the difference between unchached, write combined and cached
>> memory. It's similar to moveable or DMA/DMA32.
> 
> And because of that, in the real world, you are operating a
> shrinker-managed cache for those setup costs, yes? And I explained to
> you the implications and consequences of that.

Yes and I explained that those implications and consequences are intentional.

>>> For proper memory isolation, the cleanup cost needs to be carried by
>>> the cgroup that is responsible for it in the first place - not some
>>> other container that's just trying to read() a file or malloc().
>>
>> That makes no sense at all.
> 
> How about we stay in our respective lanes of expertise and find a more
> productive way to get alignment on this, shall we?

I completely respect your opinion, but I have to point out that I have strong 
evidence disproving your that this approach won't work at all for this use case.

Regards,
Christian.

Reply via email to