Hi Matthew,

On 5/8/2026 8:49 PM, Matthew Auld wrote:
On 04/05/2026 12:10, Arunpravin Paneer Selvam wrote:
The current buddy allocator maintains separate clear_tree[] and
dirty_tree[] rbtrees per order, preventing coalescing between cleared
and dirty buddies. Under mixed workloads, this creates a merge barrier:
adjacent buddies frequently end up split across trees, forcing reliance
on __force_merge() during allocation.

__force_merge() performs an O(N x max_order) scan under the VRAM manager
lock, leading to allocation stalls and failures for large contiguous
requests even when sufficient total free memory is available.

Solution

Replace the dual-tree design with:
- A single free_tree[order] rbtree for dirty and mixed free blocks
   (fully cleared free blocks float outside this tree)
- A lightweight out-of-band clear tracker (gpu_clear_tracker)

Fully cleared free blocks are tracked outside the buddy trees using an
augmented interval rbtree, enabling O(log E) lookup of the largest
cleared extents, where E is the number of tracked cleared extents.
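
For illustration only, here is a minimal sketch of the kind of lookup an
augmented extent tree allows. It is not the code from the patch: the
names (extent_node, extent_insert, extent_largest, extent_find_fit) are
hypothetical, and a plain unbalanced binary search tree stands in for
the kernel rbtree, so the logarithmic bound only holds if the tree
happens to be balanced. Each node caches the largest extent length in
its subtree, which lets a search skip whole subtrees that cannot
satisfy a request.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical node: one cleared range [start, start + len). */
struct extent_node {
    uint64_t start, len;
    uint64_t subtree_max_len;   /* augmented: largest len in this subtree */
    struct extent_node *left, *right;
};

static uint64_t subtree_max(const struct extent_node *n)
{
    return n ? n->subtree_max_len : 0;
}

/* Recompute the cached maximum from the node and its children. */
static void update_max(struct extent_node *n)
{
    uint64_t m = n->len;

    if (subtree_max(n->left) > m)
        m = subtree_max(n->left);
    if (subtree_max(n->right) > m)
        m = subtree_max(n->right);
    n->subtree_max_len = m;
}

/* Insert a caller-owned, non-overlapping extent keyed by start address.
 * No rebalancing is done here; a real rbtree would rotate and fix up
 * subtree_max_len on the way. */
static struct extent_node *extent_insert(struct extent_node *root,
                                         struct extent_node *n)
{
    assert(n->len > 0);
    if (!root) {
        n->left = n->right = NULL;
        n->subtree_max_len = n->len;
        return n;
    }
    if (n->start < root->start)
        root->left = extent_insert(root->left, n);
    else
        root->right = extent_insert(root->right, n);
    update_max(root);
    return root;
}

/* Return a node holding the largest extent, or NULL for an empty tree. */
static const struct extent_node *extent_largest(const struct extent_node *n)
{
    while (n) {
        if (n->len == n->subtree_max_len)
            return n;
        n = subtree_max(n->left) == n->subtree_max_len ? n->left : n->right;
    }
    return NULL;
}

/* Return the lowest-address extent with len >= want, or NULL if none. */
static const struct extent_node *extent_find_fit(const struct extent_node *n,
                                                 uint64_t want)
{
    while (n && n->subtree_max_len >= want) {
        if (subtree_max(n->left) >= want)
            n = n->left;
        else if (n->len >= want)
            return n;
        else
            n = n->right;
    }
    return NULL;
}
```

Both lookups follow a single root-to-leaf path, which is where the
logarithmic cost comes from once the tree is kept balanced.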

Buddy coalescing is now unconditional in __gpu_buddy_free(), regardless
of clear/dirty state. This removes the merge barrier and eliminates the
need for __force_merge().
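
As a rough sketch of the merge barrier and its removal (a toy model,
not the gpu_buddy code): the allocator below covers 8 pages and keeps a
per-order state array instead of rbtrees. All names (toy_buddy,
toy_free, the same_state_only flag) are made up for this example. With
same_state_only set, a block only merges with a buddy of identical
clear/dirty state, mimicking the dual-tree behaviour; with it unset,
merging is unconditional and a clear/dirty pair becomes a "mixed"
block.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_ORDER 3u                    /* toy size: 8 pages in total */
#define TOY_PAGES     (1u << TOY_MAX_ORDER)

enum toy_state {
    TOY_NOT_FREE = 0,   /* no free block recorded in this slot */
    TOY_FREE_DIRTY,
    TOY_FREE_CLEAR,
    TOY_FREE_MIXED,     /* merged from halves with different states */
};

struct toy_buddy {
    /* state[o][i] describes block i of order o: pages [i << o, (i + 1) << o) */
    enum toy_state state[TOY_MAX_ORDER + 1][TOY_PAGES];
};

/* Two halves keep their state only if they agree; otherwise the result is mixed. */
static enum toy_state toy_combine(enum toy_state a, enum toy_state b)
{
    return a == b ? a : TOY_FREE_MIXED;
}

/*
 * Free block idx of the given order and merge upwards while the buddy is free.
 * Returns the order of the free block that results.
 */
static unsigned toy_free(struct toy_buddy *b, unsigned order, unsigned idx,
                         bool cleared, bool same_state_only)
{
    enum toy_state st = cleared ? TOY_FREE_CLEAR : TOY_FREE_DIRTY;

    assert(order <= TOY_MAX_ORDER && idx < (TOY_PAGES >> order));

    while (order < TOY_MAX_ORDER) {
        unsigned buddy = idx ^ 1u;
        enum toy_state bst = b->state[order][buddy];

        if (bst == TOY_NOT_FREE)
            break;                          /* buddy still allocated or split */
        if (same_state_only && bst != st)
            break;                          /* the dual-tree merge barrier */

        b->state[order][buddy] = TOY_NOT_FREE;  /* buddy is absorbed */
        st = toy_combine(st, bst);
        idx >>= 1;                          /* index of the parent block */
        order++;
    }
    b->state[order][idx] = st;
    return order;
}
```

Freeing all 8 pages with alternating clear and dirty state ends in one
order-3 mixed block when merging is unconditional, while with
same_state_only set every page stays at order 0.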

Benefits

- Correct high-order allocations after mixed clear/dirty workloads
- Elimination of O(N x max_order) merge cost from the allocation path
- O(log E) cleared-extent lookup replacing O(N) scans
- Predictable allocation latency under fragmentation
- Reduced complexity with a single tree per order

Test:
dEQP-VK.memory.allocation.basic.size_8KiB.reverse.count_4000

The data below is from /sys/kernel/debug/dri/1/amdgpu_vram_mm:

Base (dual-tree), before VKCTS test:
   order- 6 free:   6 MiB,  blocks: 26
   order- 5 free:   1 MiB,  blocks: 15
   order- 4 free: 960 KiB,  blocks: 15
   order- 3 free:   5 MiB,  blocks: 171
   order- 2 free:   2 MiB,  blocks: 176
   order- 1 free:   1 MiB,  blocks: 165
   order- 0 free:  16 KiB,  blocks: 4

Base (dual-tree), after VKCTS test:
   order- 6 free: 768 KiB,  blocks: 3
   order- 5 free: 499 MiB,  blocks: 3999
   order- 4 free: 250 MiB,  blocks: 4001
   order- 3 free: 129 MiB,  blocks: 4157
   order- 2 free:  65 MiB,  blocks: 4161
   order- 1 free:  63 MiB,  blocks: 8138
   order- 0 free:  20 KiB,  blocks: 5

Clear tracker, before VKCTS test:
   order- 6 free:   4 MiB,  blocks: 19
   order- 5 free:   2 MiB,  blocks: 18
   order- 4 free: 704 KiB,  blocks: 11
   order- 3 free:   5 MiB,  blocks: 168
   order- 2 free:   2 MiB,  blocks: 174
   order- 1 free:   1 MiB,  blocks: 167
   order- 0 free:  32 KiB,  blocks: 8

Clear tracker, after VKCTS test:
   order- 6 free:   4 MiB,  blocks: 19
   order- 5 free:   2 MiB,  blocks: 18
   order- 4 free: 704 KiB,  blocks: 11
   order- 3 free:   5 MiB,  blocks: 168
   order- 2 free:   2 MiB,  blocks: 174
   order- 1 free:   1 MiB,  blocks: 167
   order- 0 free:  28 KiB,  blocks: 7
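
A small helper for reading the tables above, assuming a 4 KiB minimum
block size. That size is not stated in the output, but it is consistent
with the rows shown, for example 4 order-0 blocks reported as 16 KiB.
The name order_free_kib is made up for this example. Values printed in
MiB appear to be rounded down.

```c
#include <assert.h>
#include <stdint.h>

#define ASSUMED_CHUNK_KIB 4u    /* assumption: order-0 block is 4 KiB */

/* Free space in KiB held by `blocks` free blocks of the given order. */
static uint64_t order_free_kib(unsigned order, uint64_t blocks)
{
    assert(order < 32);
    return blocks * ((uint64_t)ASSUMED_CHUNK_KIB << order);
}
```

For instance, order_free_kib(5, 3999) gives 511872 KiB, which is the
499 MiB reported for order 5 after the test on the dual-tree base.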

v2:
  - Code-style cleanup and minor refactoring
  - Renamed locals for clarity

Cc: Matthew Auld <[email protected]>
Cc: Christian König <[email protected]>
Signed-off-by: Arunpravin Paneer Selvam <[email protected]>

I still need some more time to go over this fully, but in the meantime there is some feedback from Sashiko here that might be worth a look:

https://sashiko.dev/#/patchset/20260504111055.262964-1-Arunpravin.PaneerSelvam%40amd.com
I have sent v3; please take a look. I will also go through the Sashiko review comments.

Regards,
Arun.
