drmm_cgroup_register_region() is called before INIT_LIST_HEAD() and
gpu_buddy_init() in amdgpu_vram_mgr_init(). If it fails, the function
returns early and bypasses those initializations.

Since adev->mman.initialized is set to true before amdgpu_vram_mgr_init()
is called, a failure triggers amdgpu_ttm_fini(), which calls
amdgpu_vram_mgr_fini(), which then:

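 - Calls list_for_each_entry_safe() on reservations_pending and
   reserved_pages, whose list_head::next pointers are still zero-initialized
   (NULL) rather than pointing back at the list head. The loop therefore does
   not see the lists as empty: it derives its first entry from the NULL next
   pointer and dereferences it.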

 - Calls gpu_buddy_fini(), which iterates free_trees[] unconditionally
   via for_each_free_tree(). Since mm->free_trees is NULL
   (never allocated), this dereferences NULL.

Both result in a kernel panic on the module load error path.
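To see why a zero-filled list_head is not an empty list, here is a small user-space sketch. The struct and helpers are simplified re-implementations written for illustration; they are not the kernel's <linux/list.h>. count_entries() is a hypothetical helper. It walks the list in the same order as list_for_each_entry_safe(), but returns -1 instead of faulting:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal user-space stand-in for the kernel's struct list_head, for
 * illustration only; this is not <linux/list.h>. */
struct list_head {
	struct list_head *next, *prev;
};

/* Mirrors INIT_LIST_HEAD(): an empty list points back at itself. */
static void INIT_LIST_HEAD(struct list_head *list)
{
	list->next = list;
	list->prev = list;
}

/* Mirrors the list_empty() test: empty means head->next == head. */
static bool list_empty(const struct list_head *head)
{
	return head->next == head;
}

/*
 * Hypothetical helper: walks the list in the same order as
 * list_for_each_entry_safe() (start at head->next, stop when the cursor is
 * back at head) and counts the nodes. Returns -1 at the point where the
 * real macro would read through a NULL next pointer and oops.
 */
static int count_entries(const struct list_head *head)
{
	const struct list_head *pos;
	int n = 0;

	assert(head != NULL);
	for (pos = head->next; pos != head; pos = pos->next) {
		if (pos == NULL)
			return -1;
		n++;
	}
	return n;
}
```

For a head that went through INIT_LIST_HEAD(), list_empty() is true and count_entries() returns 0. For a zero-filled head, as in a kzalloc'd structure, list_empty() is false and the walk stops at the NULL pointer (returns -1).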

Fix by moving drmm_cgroup_register_region() to after the list and buddy
allocator are fully initialized, so the teardown path is safe to run.
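The ordering rule behind the fix can be modeled in user space. This is a hypothetical sketch, and none of its names are amdgpu or DRM APIs:

- struct model_mgr stands in for the VRAM manager.
- register_region() stands in for drmm_cgroup_register_region().
- init_old_order() and init_new_order() stand in for amdgpu_vram_mgr_init() before and after the patch.
- fini_would_be_safe() stands in for amdgpu_vram_mgr_fini().

The model ignores the drmm-managed cleanup of the cgroup region. It shows that when the fallible registration runs last, an unconditional teardown only ever sees initialized state:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model; none of these names are amdgpu or DRM APIs. */
struct node {
	struct node *next, *prev;
};

struct model_mgr {
	struct node reserved_pages;	/* stands in for a list_head */
	int *free_trees;		/* stands in for the buddy free_trees[] */
	bool cg_registered;		/* stands in for man->cg */
};

static void init_list(struct node *n)
{
	n->next = n;
	n->prev = n;
}

/* Fallible step, playing the role of the cgroup registration. */
static int register_region(struct model_mgr *m, bool fail)
{
	if (fail)
		return -ENOMEM;
	m->cg_registered = true;
	return 0;
}

/* Old order: registration runs first, so an early return leaves the list
 * and free_trees zeroed. */
static int init_old_order(struct model_mgr *m, bool fail_register)
{
	int err = register_region(m, fail_register);

	if (err)
		return err;
	init_list(&m->reserved_pages);
	m->free_trees = calloc(4, sizeof(*m->free_trees));
	return m->free_trees ? 0 : -ENOMEM;
}

/* New order: list and free_trees are set up before registration. */
static int init_new_order(struct model_mgr *m, bool fail_register)
{
	init_list(&m->reserved_pages);
	m->free_trees = calloc(4, sizeof(*m->free_trees));
	if (!m->free_trees)
		return -ENOMEM;
	return register_region(m, fail_register);
}

/* Unconditional teardown. Returns false where the real code would
 * dereference NULL, true after a clean teardown. */
static bool fini_would_be_safe(struct model_mgr *m)
{
	if (m->reserved_pages.next == NULL || m->free_trees == NULL)
		return false;
	free(m->free_trees);
	m->free_trees = NULL;
	return true;
}
```

With the old order, a failed registration leaves fini_would_be_safe() returning false. With the new order, the same failure still leaves the teardown with valid state, so it returns true.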

Reported-by: Sashiko-bot <[email protected]>
Closes: https://sashiko.dev/#/patchset/[email protected]?part=4
Fixes: 2b624a2c1865 ("drm/ttm: Handle cgroup based eviction in TTM")
Cc: Friedrich Vock <[email protected]>
Cc: Maarten Lankhorst <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Maxime Ripard <[email protected]>
Cc: Christian König <[email protected]>
Cc: Alex Deucher <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: <[email protected]> # v6.14+
Assisted-by: GitHub_Copilot:claude-sonnet-4.6
Signed-off-by: Thomas Hellström <[email protected]>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
index 2a241a5b12c4..ac3f71d77140 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
@@ -918,9 +918,6 @@ int amdgpu_vram_mgr_init(struct amdgpu_device *adev)
        struct ttm_resource_manager *man = &mgr->manager;
        int err;
 
-       man->cg = drmm_cgroup_register_region(adev_to_drm(adev), "vram", adev->gmc.real_vram_size);
-       if (IS_ERR(man->cg))
-               return PTR_ERR(man->cg);
        ttm_resource_manager_init(man, &adev->mman.bdev,
                                  adev->gmc.real_vram_size);
 
@@ -935,6 +932,10 @@ int amdgpu_vram_mgr_init(struct amdgpu_device *adev)
        if (err)
                return err;
 
+       man->cg = drmm_cgroup_register_region(adev_to_drm(adev), "vram", adev->gmc.real_vram_size);
+       if (IS_ERR(man->cg))
+               return PTR_ERR(man->cg);
+
        ttm_set_driver_manager(&adev->mman.bdev, TTM_PL_VRAM, &mgr->manager);
        ttm_resource_manager_set_used(man, true);
        return 0;
-- 
2.54.0
