drmm_cgroup_register_region() is called before INIT_LIST_HEAD() and
gpu_buddy_init() in amdgpu_vram_mgr_init(). If it fails, the function
returns early and bypasses those initializations.

Since adev->mman.initialized is set to true before
amdgpu_vram_mgr_init() is called, a failure triggers amdgpu_ttm_fini(),
which calls amdgpu_vram_mgr_fini(), which then:

- Calls list_for_each_entry_safe() on reservations_pending and
  reserved_pages, whose list_head::next pointers are zero-initialized
  (NULL). The loop does not recognize them as empty and dereferences
  NULL.

- Calls gpu_buddy_fini(), which iterates free_trees[] unconditionally
  via for_each_free_tree(). Since mm->free_trees is NULL (never
  allocated), this dereferences NULL.

Both result in a kernel panic on the module load error path.

Fix by moving drmm_cgroup_register_region() to after the list and buddy
allocator are fully initialized, so the teardown path is safe to run.

Reported-by: Sashiko-bot <[email protected]>
Closes: https://sashiko.dev/#/patchset/[email protected]?part=4
Fixes: 2b624a2c1865 ("drm/ttm: Handle cgroup based eviction in TTM")
Cc: Friedrich Vock <[email protected]>
Cc: Maarten Lankhorst <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Maxime Ripard <[email protected]>
Cc: Christian König <[email protected]>
Cc: Alex Deucher <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: <[email protected]> # v6.14+
Assisted-by: GitHub_Copilot:claude-sonnet-4.6
Signed-off-by: Thomas Hellström <[email protected]>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
index 2a241a5b12c4..ac3f71d77140 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
@@ -918,9 +918,6 @@ int amdgpu_vram_mgr_init(struct amdgpu_device *adev)
 	struct ttm_resource_manager *man = &mgr->manager;
 	int err;
 
-	man->cg = drmm_cgroup_register_region(adev_to_drm(adev), "vram", adev->gmc.real_vram_size);
-	if (IS_ERR(man->cg))
-		return PTR_ERR(man->cg);
 	ttm_resource_manager_init(man, &adev->mman.bdev,
 				  adev->gmc.real_vram_size);
 
@@ -935,6 +932,10 @@ int amdgpu_vram_mgr_init(struct amdgpu_device *adev)
 	if (err)
 		return err;
 
+	man->cg = drmm_cgroup_register_region(adev_to_drm(adev), "vram", adev->gmc.real_vram_size);
+	if (IS_ERR(man->cg))
+		return PTR_ERR(man->cg);
+
 	ttm_set_driver_manager(&adev->mman.bdev, TTM_PL_VRAM, &mgr->manager);
 	ttm_resource_manager_set_used(man, true);
 	return 0;
-- 
2.54.0
