A vkcts test case is triggering a case where the drm buddy allocator wastes lots of memory and performs badly:
dEQP-VK.memory.allocation.basic.size_8KiB.reverse.count_4000 For each memory pool type, the test will allocate 4000 8kB objects, and then will release them. The alignment request is 256kB. For each object, the allocator will select a 256kB block (to match the alignment), and then trim it to 8kB, adding lots of free entries to the free_lists of order 5 to 1. On deallocation, none of these objects will be merged with their buddy because their "clear status" is different: only the block that was handed over to the driver might come back cleared. Also since the test don't allocate much memory, the allocator don't need to force the merge process so it will repeat the same logic for each run. As a result, after the first run (which takes about 6sec), the freelists look like this: chunk_size: 4KiB, total: 16368MiB, free: 15354MiB, clear_free: 397MiB [...] order- 5 free: 1914 MiB, blocks: 15315 order- 4 free: 957 MiB, blocks: 15325 order- 3 free: 480 MiB, blocks: 15360 order- 2 free: 239 MiB, blocks: 15347 order- 1 free: 238 MiB, blocks: 30489 After the second run (19 sec): chunk_size: 4KiB, total: 16368MiB, free: 15374MiB, clear_free: 537MiB [...] order- 5 free: 3326 MiB, blocks: 26615 order- 4 free: 1663 MiB, blocks: 26619 order- 3 free: 833 MiB, blocks: 26659 order- 2 free: 416 MiB, blocks: 26643 order- 1 free: 414 MiB, blocks: 53071 list_insert_sorted is part of the problem here since it iterates over the free_list to figure out where to insert the new blocks. To fix this while keeping the clear tracking information, a new bit is exposed to drivers, allowing them to disable trimming for blocks that aren't "clear". This bit is used by amdgpu because it always returns cleared memory to drm_buddy. With this bit set, the "merge buddies on deallocation logic" can work again, and the free_list are not growing indefinitely anymore. So after a run we get: chunk_size: 4KiB, total: 16368MiB, free: 15306MiB, clear_free: 1734MiB [...] order- 5 free: 2 MiB, blocks: 17 order- 4 free: 2 MiB, blocks: 35 order- 3 free: 1 MiB, blocks: 41 order- 2 free: 656 KiB, blocks: 41 order- 1 free: 256 KiB, blocks: 32 The runtime is better (2 sec) and stable across multiple runs, and we also see that the reported "clear_free" amount is larger than without the patch. Fixes: 96950929eb23 ("drm/buddy: Implement tracking clear page feature") Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-pra...@amd.com> --- drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 8 ++++++++ drivers/gpu/drm/drm_buddy.c | 1 + include/drm/drm_buddy.h | 1 + 3 files changed, 10 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c index abdc52b0895a..dbbaa15a973e 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c @@ -499,6 +499,14 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager *man, INIT_LIST_HEAD(&vres->blocks); + /* Trimming create smaller blocks that may never be given to the driver. + * Such blocks won't be cleared until being seen by the driver, which might + * never occur (for instance UMD might request large alignment) => in such + * case, upon release of the block, the drm_buddy allocator won't merge them + * back, because their clear status is different. + */ + vres->flags = DRM_BUDDY_TRIM_IF_CLEAR; + if (place->flags & TTM_PL_FLAG_TOPDOWN) vres->flags |= DRM_BUDDY_TOPDOWN_ALLOCATION; diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c index a1e652b7631d..555c72abce4c 100644 --- a/drivers/gpu/drm/drm_buddy.c +++ b/drivers/gpu/drm/drm_buddy.c @@ -1092,6 +1092,7 @@ int drm_buddy_alloc_blocks(struct drm_buddy *mm, /* Trim the allocated block to the required size */ if (!(flags & DRM_BUDDY_TRIM_DISABLE) && + (!(flags & DRM_BUDDY_TRIM_IF_CLEAR) || drm_buddy_block_is_clear(block)) && original_size != size) { struct list_head *trim_list; LIST_HEAD(temp); diff --git a/include/drm/drm_buddy.h b/include/drm/drm_buddy.h index 9689a7c5dd36..c338d03028c3 100644 --- a/include/drm/drm_buddy.h +++ b/include/drm/drm_buddy.h @@ -28,6 +28,7 @@ #define DRM_BUDDY_CLEAR_ALLOCATION BIT(3) #define DRM_BUDDY_CLEARED BIT(4) #define DRM_BUDDY_TRIM_DISABLE BIT(5) +#define DRM_BUDDY_TRIM_IF_CLEAR BIT(6) struct drm_buddy_block { #define DRM_BUDDY_HEADER_OFFSET GENMASK_ULL(63, 12) -- 2.43.0