The current buddy allocator maintains separate clear_tree[] and
dirty_tree[] rbtrees per order, preventing coalescing between cleared
and dirty buddies. Under mixed workloads, this creates a merge barrier:
adjacent buddies frequently end up split across trees, forcing reliance
on __force_merge() during allocation.
__force_merge() performs an O(N x max_order) scan while holding the
VRAM manager lock. This stalls allocations and can fail large contiguous
requests even when sufficient total free memory is available.
Solution
Replace the dual-tree design with:
- A single free_tree[order] rbtree for dirty and mixed free blocks
  (fully cleared free blocks are kept outside this tree)
- A lightweight out-of-band clear tracker (gpu_clear_tracker)
Fully cleared free blocks are tracked outside the buddy trees using an
augmented interval rbtree, enabling O(log E) lookup of the largest
cleared extent, where E is the number of tracked cleared extents.
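The augmentation can be illustrated with a small user-space sketch. It
is not the gpu_clear_tracker code: all names here (struct clear_extent,
extent_insert(), extent_largest()) are hypothetical, and an unbalanced
binary search tree stands in for the rbtree to keep the example short,
so the O(log E) bound only holds while the tree happens to be balanced.
Each node caches the largest extent length in its subtree, which lets
the lookup follow a single root-to-leaf path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical node: one fully cleared extent [start, start + len). */
struct clear_extent {
	uint64_t start, len;
	uint64_t subtree_max;	/* largest len in this subtree (the augmentation) */
	struct clear_extent *left, *right;
};

static uint64_t subtree_max(const struct clear_extent *n)
{
	return n ? n->subtree_max : 0;
}

/* Recompute the cached maximum from the node and its children. */
static void extent_update(struct clear_extent *n)
{
	uint64_t m = n->len;

	if (subtree_max(n->left) > m)
		m = subtree_max(n->left);
	if (subtree_max(n->right) > m)
		m = subtree_max(n->right);
	n->subtree_max = m;
}

/* Insert keyed by start offset; no rebalancing (an rbtree would rotate). */
static struct clear_extent *extent_insert(struct clear_extent *root,
					  uint64_t start, uint64_t len)
{
	if (!root) {
		struct clear_extent *n = calloc(1, sizeof(*n));

		if (!n)
			abort();
		n->start = start;
		n->len = len;
		n->subtree_max = len;
		return n;
	}
	if (start < root->start)
		root->left = extent_insert(root->left, start, len);
	else
		root->right = extent_insert(root->right, start, len);
	extent_update(root);
	return root;
}

/* Follow the cached maxima down to a largest extent, or NULL if empty. */
static const struct clear_extent *extent_largest(const struct clear_extent *root)
{
	while (root) {
		if (root->len == root->subtree_max)
			return root;
		if (subtree_max(root->left) == root->subtree_max)
			root = root->left;
		else
			root = root->right;
	}
	return NULL;
}
```

In a real rbtree the cached maximum also has to be refreshed on every
rotation and erase, which this sketch leaves out.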
Buddy coalescing is now unconditional in __gpu_buddy_free(), regardless
of clear/dirty state. This removes the merge barrier and eliminates the
need for __force_merge().
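A toy user-space model of unconditional coalescing is sketched below.
It is not __gpu_buddy_free(): the per-chunk arrays, toy_buddy_free()
and the rule that a merged block is clear only if both halves were
clear are assumptions made for illustration. The point it shows is
that the merge decision depends only on whether the buddy is free at
the same order, never on clear/dirty state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_ORDER 4			/* toy pool of 1 << MAX_ORDER chunks */
#define NCHUNKS   (1u << MAX_ORDER)

/* Hypothetical state: order of the free block starting here, or -1. */
static int  free_order[NCHUNKS];
static bool is_clear[NCHUNKS];

static void pool_init(void)
{
	for (unsigned int i = 0; i < NCHUNKS; i++) {
		free_order[i] = -1;
		is_clear[i] = false;
	}
}

/*
 * Free the block of the given order at chunk index idx and merge it
 * with its buddy for as long as the buddy is free at the same order.
 * Clear/dirty state is never consulted when deciding to merge.
 * Returns the order of the block that ends up free.
 */
static int toy_buddy_free(unsigned int idx, int order, bool clear)
{
	while (order < MAX_ORDER) {
		unsigned int buddy = idx ^ (1u << order);

		if (free_order[buddy] != order)
			break;		/* buddy is allocated or split */

		clear = clear && is_clear[buddy];
		free_order[buddy] = -1;
		is_clear[buddy] = false;
		if (buddy < idx)
			idx = buddy;	/* merged block starts at the lower half */
		order++;
	}
	free_order[idx] = order;
	is_clear[idx] = clear;
	return order;
}
```

Freeing chunk 0 as clear and then chunk 1 as dirty yields one order-1
block here, whereas a dual-tree design would leave two order-0 blocks
in different trees until a forced merge.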
Benefits
- Correct high-order allocations after mixed clear/dirty workloads
- Elimination of O(N x max_order) merge cost from the allocation path
- O(log E) cleared-extent lookup replacing O(N) scans
- Predictable allocation latency under fragmentation
- Reduced complexity with a single tree per order
Test:
dEQP-VK.memory.allocation.basic.size_8KiB.reverse.count_4000
The data below is from /sys/kernel/debug/dri/1/amdgpu_vram_mm:
Base (dual-tree), before VKCTS test:
order- 6 free: 6 MiB, blocks: 26
order- 5 free: 1 MiB, blocks: 15
order- 4 free: 960 KiB, blocks: 15
order- 3 free: 5 MiB, blocks: 171
order- 2 free: 2 MiB, blocks: 176
order- 1 free: 1 MiB, blocks: 165
order- 0 free: 16 KiB, blocks: 4
Base (dual-tree), after VKCTS test:
order- 6 free: 768 KiB, blocks: 3
order- 5 free: 499 MiB, blocks: 3999
order- 4 free: 250 MiB, blocks: 4001
order- 3 free: 129 MiB, blocks: 4157
order- 2 free: 65 MiB, blocks: 4161
order- 1 free: 63 MiB, blocks: 8138
order- 0 free: 20 KiB, blocks: 5
Clear tracker, before VKCTS test:
order- 6 free: 4 MiB, blocks: 19
order- 5 free: 2 MiB, blocks: 18
order- 4 free: 704 KiB, blocks: 11
order- 3 free: 5 MiB, blocks: 168
order- 2 free: 2 MiB, blocks: 174
order- 1 free: 1 MiB, blocks: 167
order- 0 free: 32 KiB, blocks: 8
Clear tracker, after VKCTS test:
order- 6 free: 4 MiB, blocks: 19
order- 5 free: 2 MiB, blocks: 18
order- 4 free: 704 KiB, blocks: 11
order- 3 free: 5 MiB, blocks: 168
order- 2 free: 2 MiB, blocks: 174
order- 1 free: 1 MiB, blocks: 167
order- 0 free: 28 KiB, blocks: 7
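As a reading aid for the tables above, the free size in each row is
the block count multiplied by the block size at that order. The sketch
below assumes a 4 KiB minimum block size, which is inferred from the
order-0 rows (4 blocks, 16 KiB) rather than stated in this patch, and
order_free_bytes() is a hypothetical helper, not a driver function.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed minimum block size, inferred from the order-0 rows. */
#define CHUNK_SIZE 4096ull

/* Free bytes held at one order: blocks * (CHUNK_SIZE << order). */
static uint64_t order_free_bytes(unsigned int order, uint64_t blocks)
{
	return blocks * (CHUNK_SIZE << order);
}
```

Under that assumption, the 3999 order-5 blocks left behind by the
dual-tree run are 128 KiB each, which matches the reported 499 MiB
once truncated to whole MiB.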
v2:
- Code-style cleanup and minor refactoring
- Renamed locals for clarity
Cc: Matthew Auld <[email protected]>
Cc: Christian König <[email protected]>
Signed-off-by: Arunpravin Paneer Selvam <[email protected]>