On 17/02/2026 11:38, Arunpravin Paneer Selvam wrote:
Large alignment requests previously forced the buddy allocator to search by
alignment order, which often caused higher-order free blocks to be split even
when a suitably aligned smaller region already existed within them. This led
to excessive fragmentation, especially for workloads requesting small sizes
with large alignment constraints.

This change prioritizes the requested allocation size during the search and
uses an augmented RB-tree field (subtree_max_alignment) to efficiently locate
free blocks that satisfy both size and offset-alignment requirements. As a
result, the allocator can directly select an aligned sub-region without
splitting larger blocks unnecessarily.
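
The pruned lookup described above can be sketched in userspace C. This is an illustrative simplification, not the patch itself: the node layout, the `subtree_max_alignment` maintenance, and the helper names here are hypothetical stand-ins for the kernel's rb-tree based code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical free-block node; the real code augments an rb-tree. */
struct node {
	uint64_t offset;                /* start of the free block */
	uint64_t alignment;             /* alignment this block's offset satisfies */
	uint64_t subtree_max_alignment; /* cached max alignment in this subtree */
	struct node *left, *right;
};

/* A block at offset 0 satisfies any alignment; otherwise the natural
 * alignment is the largest power of two dividing the offset, i.e. its
 * lowest set bit. */
static uint64_t block_alignment(uint64_t offset)
{
	return offset ? (offset & -offset) : UINT64_MAX;
}

static uint64_t subtree_max(const struct node *n)
{
	return n ? n->subtree_max_alignment : 0;
}

/* Find a block whose offset satisfies the requested alignment, pruning
 * any subtree whose cached maximum cannot possibly satisfy it. */
static struct node *find_aligned(struct node *n, uint64_t align)
{
	if (!n || subtree_max(n) < align)
		return NULL;
	if (n->alignment >= align)
		return n;
	struct node *hit = find_aligned(n->left, align);
	return hit ? hit : find_aligned(n->right, align);
}
```

The cached `subtree_max_alignment` is what lets the search reject whole subtrees in O(1) per node instead of walking every free block, which is why the allocator can search by size first and still find an aligned sub-region cheaply.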

A practical example is the VKCTS test
dEQP-VK.memory.allocation.basic.size_8KiB.reverse.count_4000, which repeatedly
allocates 8 KiB buffers with a 256 KiB alignment. Previously, such allocations
caused large blocks to be split aggressively, despite smaller aligned regions
being sufficient. With this change, those aligned regions are reused directly,
significantly reducing fragmentation.

This improvement is visible in the amdgpu VRAM buddy allocator state
(/sys/kernel/debug/dri/1/amdgpu_vram_mm). After the change, higher-order blocks
are preserved and the number of low-order fragments is substantially reduced.

Before:
   order- 5 free: 1936 MiB, blocks: 15490
   order- 4 free:  967 MiB, blocks: 15486
   order- 3 free:  483 MiB, blocks: 15485
   order- 2 free:  241 MiB, blocks: 15486
   order- 1 free:  241 MiB, blocks: 30948

After:
   order- 5 free:  493 MiB, blocks:  3941
   order- 4 free:  246 MiB, blocks:  3943
   order- 3 free:  123 MiB, blocks:  4101
   order- 2 free:   61 MiB, blocks:  4101
   order- 1 free:   61 MiB, blocks:  8018

By avoiding unnecessary splits, this change improves allocator efficiency and
helps maintain larger contiguous free regions under heavy offset-aligned
allocation workloads.

v2: (Matthew)
   - Update augmented information along the path to the inserted node.

v3:
   - Move the patch to the gpu/buddy.c file.

v4: (Matthew)
   - Use the helper instead of calling __ffs directly
   - Remove the gpu_buddy_block_order(block) >= order check and drop order
   - Drop the !node check as all callers handle this already
   - Return a value larger than any other possible alignment for __ffs64(0)
   - Replace __ffs with __ffs64
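
The __ffs64(0) point above matters because the kernel's __ffs64() is undefined for a zero argument, yet offset 0 is aligned to everything. A minimal sketch of the intended semantics, with a portable stand-in for __ffs64 (the helper names here are illustrative, not the patch's):

```c
#include <assert.h>
#include <stdint.h>

/* Portable stand-in for the kernel's __ffs64(): index of the lowest set
 * bit. Like the real helper, it must never be called with 0. */
static unsigned int ffs64_sketch(uint64_t v)
{
	unsigned int bit = 0;

	while (!(v & 1)) {
		v >>= 1;
		bit++;
	}
	return bit;
}

/* Alignment satisfied by a block starting at @offset. Offset 0 is
 * special-cased to report a value larger than any possible request,
 * since __ffs64(0) is undefined. */
static uint64_t block_offset_alignment(uint64_t offset)
{
	if (!offset)
		return UINT64_MAX; /* aligned to anything */
	return 1ULL << ffs64_sketch(offset);
}
```

So an 8 KiB block sitting at a 256 KiB boundary reports 256 KiB alignment and can directly serve the VKCTS-style requests above without splitting a larger block.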

Signed-off-by: Arunpravin Paneer Selvam <[email protected]>
Suggested-by: Christian König <[email protected]>

Reviewed-by: Matthew Auld <[email protected]>
