On 11.09.25 14:49, Michel Dänzer wrote:
>>>> What we are seeing here is on a low memory (4GiB) single node system with
>>>> an APU, that it will have lots of latencies trying to allocate memory by
>>>> doing direct reclaim trying to allocate order-10 pages, which will fail and
>>>> down it goes until it gets to order-4 or order-3. With this change, we
>>>> don't see those latencies anymore and memory pressure goes down as well.
>>> That reminds me of the scenario I described in the 00862edba135 ("drm/ttm:
>>> Use GFP_TRANSHUGE_LIGHT for allocating huge pages") commit log, where
>>> taking a filesystem backup could cause Firefox to freeze for on the order
>>> of a minute.
>>>
>>> Something like that can't just be ignored as "not a problem" for a
>>> potential 30% performance gain.
>>
>> Well using 2MiB is actually a must have for certain HW features and we have
>> quite a lot of people pushing to always using them.
>
> Latency can't just be ignored though. Interactive apps intermittently
> freezing because this code desperately tries to reclaim huge pages while the
> system is under memory pressure isn't acceptable.

Why should that not be acceptable?

The purpose of the fallback is to allow displaying messages like "Your system
is low on memory, please close some application!" instead of triggering the
OOM killer directly.

In that situation latency is not really a priority any more, but rather
functionality.

> Maybe there could be some kind of mechanism which periodically scans BOs for
> sub-optimal page orders and tries migrating their storage to more optimal
> pages.

Well, the problem usually happens because automatic page de-fragmentation is
turned off; we had quite a number of bug reports for that.

So you are basically suggesting to implement something on the BO level which
the system administrator has previously turned off on the page level.

On the other hand, in this particular case it could be that the system just
doesn't have enough memory for the particular use case.

>> So that TTM still falls back to lower order allocations is just a compromise
>> to not trigger the OOM killer.
>>
>> What we could do is to remove the fallback, but then Cascardos use case
>> wouldn't be working any more at all.
>
> Surely the issue is direct reclaim, not the fallback.

I would rather say the issue is that the fallback makes people think that
direct reclaim isn't mandatory.

Regards,
Christian.