On 11.09.25 10:26, Michel Dänzer wrote:
> On 10.09.25 14:52, Thadeu Lima de Souza Cascardo wrote:
>> On Wed, Sep 10, 2025 at 02:11:58PM +0200, Christian König wrote:
>>> On 10.09.25 13:59, Thadeu Lima de Souza Cascardo wrote:
>>>> When the TTM pool tries to allocate new pages, it starts with the max order. If there are no pages ready in the system, the page allocator will start reclaim. If direct reclaim fails, the allocator will reduce the order until it gets all the pages it wants with whatever order the allocator succeeds in reclaiming.
>>>>
>>>> However, while the allocator is reclaiming, lower order pages might be available, which would work just fine for the pool allocator. Doing direct reclaim just introduces latency in allocating memory.
>>>>
>>>> The system should still start reclaiming in the background with kswapd, but the pool allocator should try to allocate a lower order page instead of directly reclaiming.
>>>>
>>>> If not even an order-1 page is available, the TTM pool allocator will eventually get to allocating order-0 pages, at which point it should and will directly reclaim.
>>>
>>> Yeah, that was discussed before quite a bit, but at least for AMD GPUs that is absolutely not something we should do.
>>>
>>> The performance difference between using high and low order pages can be up to 30%, so accepting the added latency is just vital for good performance.
>>>
>>> We could of course make that depend on the HW you use if it isn't necessary for some other GPU, but at least both NVidia and Intel seem to have pretty much the same HW restrictions.
>>>
>>> NVidia has been working on extending this to use even 1GiB pages to reduce the TLB overhead further.
>>
>> But if the system cannot reclaim or is working hard on reclaiming, it will not allocate that page, and the pool allocator will resort to lower order pages anyway.
>>
>> In case the system has pages available, it will use them. I think there is a balance here, and I find this one reasonable. If the system is not under pressure, it will allocate those higher order pages, as expected.
>>
>> I can look into the behavior when the system might be fragmented, but I still believe the pool offers that protection by keeping those higher order pages around. It is when the system is under memory pressure that we need to resort to lower order pages.
>>
>> What we are seeing here, on a low-memory (4GiB) single-node system with an APU, is lots of latency from direct reclaim while trying to allocate order-10 pages, which fails, and down the order goes until it gets to order-4 or order-3. With this change, we don't see those latencies anymore and memory pressure goes down as well.
>
> That reminds me of the scenario I described in the 00862edba135 ("drm/ttm: Use GFP_TRANSHUGE_LIGHT for allocating huge pages") commit log, where taking a filesystem backup could cause Firefox to freeze for on the order of a minute.
>
> Something like that can't just be ignored as "not a problem" for a potential 30% performance gain.
Well, using 2MiB pages is actually a must-have for certain HW features, and we have quite a lot of people pushing to always use them.

So the fact that TTM still falls back to lower order allocations is just a compromise to avoid triggering the OOM killer.

What we could do is remove the fallback, but then Cascardo's use case wouldn't work at all any more.

Regards,
Christian.
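For reference, a minimal sketch of the allocation behaviour Cascardo describes: higher orders are attempted without blocking in direct reclaim (kswapd is still woken via __GFP_KSWAPD_RECLAIM, which is part of the usual reclaim flags), and only the final order-0 attempt is allowed to enter direct reclaim. The function name pool_alloc_pages_light and the exact flag combination are hypothetical illustrations, not the actual TTM pool code or the patch under discussion.

#include <linux/gfp.h>
#include <linux/mm.h>

/*
 * Hypothetical sketch: try high orders opportunistically, without
 * stalling in direct reclaim, and only block on reclaim once we are
 * down to order 0.
 */
static struct page *pool_alloc_pages_light(gfp_t gfp, unsigned int max_order)
{
	unsigned int order;
	struct page *p;

	for (order = max_order; order; order--) {
		gfp_t flags = gfp | __GFP_NORETRY | __GFP_NOWARN;

		/* High orders: don't block in direct reclaim. */
		flags &= ~__GFP_DIRECT_RECLAIM;

		p = alloc_pages(flags, order);
		if (p)
			return p;
	}

	/* Last resort: order 0 may enter direct reclaim and block. */
	return alloc_pages(gfp, 0);
}

The trade-off debated in the thread is visible in the loop: dropping __GFP_DIRECT_RECLAIM for orders above 0 removes the allocation latency Cascardo measures on the 4GiB APU system, but it also makes the allocator give up on 2MiB pages more readily, which is exactly the performance concern Christian raises.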