On 11.09.25 16:48, Michel Dänzer wrote:
> On 11.09.25 16:31, Christian König wrote:
>> On 11.09.25 14:49, Michel Dänzer wrote:
>>>>>> What we are seeing here is on a low memory (4GiB) single node system with
>>>>>> an APU, that it will have lots of latencies trying to allocate memory by
>>>>>> doing direct reclaim trying to allocate order-10 pages, which will fail and
>>>>>> down it goes until it gets to order-4 or order-3. With this change, we
>>>>>> don't see those latencies anymore and memory pressure goes down as well.
>>>>> That reminds me of the scenario I described in the 00862edba135 
>>>>> ("drm/ttm: Use GFP_TRANSHUGE_LIGHT for allocating huge pages") commit 
>>>>> log, where taking a filesystem backup could cause Firefox to freeze for 
>>>>> on the order of a minute.
>>>>>
>>>>> Something like that can't just be ignored as "not a problem" for a 
>>>>> potential 30% performance gain.
>>>>
>>>> Well using 2MiB is actually a must have for certain HW features and we 
>>>> have quite a lot of people pushing to always using them.
>>>
>>> Latency can't just be ignored though. Interactive apps intermittently 
>>> freezing because this code desperately tries to reclaim huge pages while 
>>> the system is under memory pressure isn't acceptable.
>>
>> Why should that not be acceptable?
> 
> Sounds like you didn't read / understand the scenario in the 00862edba135 
> commit log:
> 
> I was trying to use Firefox while restic was taking a filesystem backup, and 
> it froze for up to a minute. After disabling direct reclaim, Firefox was 
> perfectly usable without noticeable freezes in the same scenario.
> 
> Show me the user who finds it acceptable to wait for a minute for interactive 
> apps to respond, just in case some GPU operations might be 30% faster.

Ok granted, a minute is rather extreme. But IIRC the issue you described was 
solved by using __GFP_NORETRY, whereas the change here is about completely 
disabling direct reclaim.

As far as I know, with __GFP_NORETRY set the direct reclaim path only adds 
latency in the millisecond range.
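
To make the distinction concrete, here is a minimal userspace sketch of the two 
options. It is not the TTM code: the SK_* bit values, the gfp_sketch_t type and 
the helper names are hypothetical placeholders that only mirror the meaning of 
the kernel's __GFP_DIRECT_RECLAIM, __GFP_KSWAPD_RECLAIM and __GFP_NORETRY flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bits for illustration only; NOT the kernel's real values. */
typedef uint32_t gfp_sketch_t;
#define SK_GFP_DIRECT_RECLAIM 0x1u /* caller may enter direct reclaim */
#define SK_GFP_KSWAPD_RECLAIM 0x2u /* caller may wake kswapd */
#define SK_GFP_NORETRY        0x4u /* only lightweight reclaim, then give up */

/* Option A: keep direct reclaim but bound it with NORETRY. */
static gfp_sketch_t huge_flags_noretry(gfp_sketch_t base)
{
	return base | SK_GFP_NORETRY;
}

/* Option B: never enter direct reclaim for the huge allocation. */
static gfp_sketch_t huge_flags_no_direct_reclaim(gfp_sketch_t base)
{
	return base & ~SK_GFP_DIRECT_RECLAIM;
}

static bool may_direct_reclaim(gfp_sketch_t flags)
{
	return (flags & SK_GFP_DIRECT_RECLAIM) != 0;
}

/* Sanity check of the two variants against a reclaim-capable base mask. */
static void sketch_self_check(void)
{
	gfp_sketch_t base = SK_GFP_DIRECT_RECLAIM | SK_GFP_KSWAPD_RECLAIM;

	assert(may_direct_reclaim(huge_flags_noretry(base)));
	assert(!may_direct_reclaim(huge_flags_no_direct_reclaim(base)));
}
```

With option A the allocator may still reclaim synchronously, just without 
retrying hard. With option B a huge allocation can only succeed if a suitable 
page is already free (background reclaim via kswapd may still be woken).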

The key point is that we tried to completely disable direct reclaim before, and 
that made the datacenter customers scream because performance became totally 
unstable.

E.g. compiling software first and then running a benchmark was about 30% slower 
than running the benchmark directly after boot.

Regards,
Christian.
