On 6/11/26 16:37, [email protected] wrote:
Thank you for your contribution! Sashiko AI review found 1 potential issue(s)
to consider:
Pre-existing issues:
- [High] Incomplete Fix / Kernel panic on module load error path: If `gpu_buddy_init()` fails,
it cleans up its internal allocations but leaves `mm->n_roots > 0` and
`mm->free_trees` and `mm->roots` as dangling or NULL pointers. Because
`adev->mman.initialized` is set to `true` prior to calling `amdgpu_vram_mgr_init()`, a
failure in `gpu_buddy_init()` unconditionally triggers the driver teardown path.
`amdgpu_vram_mgr_fini()` then invokes `gpu_buddy_fini()`, which dereferences the invalid
pointers and causes a kernel panic.
Pre-existing issues are out of scope for this series, except when they
are needed for the new functionality.
/Thomas
--
commit 88a159aad4fc080127cdbd3bc127dfaae0ae9c17
Author: Thomas Hellström <[email protected]>
drm/amdgpu: Fix init ordering in amdgpu_vram_mgr_init()
This commit fixes kernel panics on the module load error path by moving
drmm_cgroup_register_region() after the list and buddy allocator are fully
initialized.
Closes:
https://sashiko.dev/#/patchset/[email protected]?part=4
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
index 2a241a5b12c42..ac3f71d77140a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
[ ... ]
@@ -935,6 +932,10 @@ int amdgpu_vram_mgr_init(struct amdgpu_device *adev)
if (err)
return err;
[Severity: High]
This is a pre-existing issue, but if gpu_buddy_init() fails, does it still
leave the system vulnerable to the same module load error path panic?
Looking at drivers/gpu/buddy.c, if gpu_buddy_init() fails after setting
mm->n_roots, the error path cleans up allocations but leaves mm->n_roots
populated and mm->roots as a dangling or NULL pointer:
drivers/gpu/buddy.c:gpu_buddy_init() {
...
mm->n_roots = hweight64(size);
...
out_free_tree:
while (i--)
kfree(mm->free_trees[i]);
kfree(mm->free_trees);
return -ENOMEM;
}
Since adev->mman.initialized is set to true before amdgpu_vram_mgr_init() is
called, returning this error will trigger the teardown path where
amdgpu_vram_mgr_fini() unconditionally calls gpu_buddy_fini():
drivers/gpu/buddy.c:gpu_buddy_fini() {
...
for (i = 0; i < mm->n_roots; ++i) {
start = gpu_buddy_block_offset(mm->roots[i]);
...
}
Because mm->n_roots is non-zero, will this unconditionally dereference the
invalid mm->roots pointer and cause a panic?
+ man->cg = drmm_cgroup_register_region(adev_to_drm(adev), "vram",
adev->gmc.real_vram_size);
+ if (IS_ERR(man->cg))
+ return PTR_ERR(man->cg);
+
ttm_set_driver_manager(&adev->mman.bdev, TTM_PL_VRAM, &mgr->manager);
ttm_resource_manager_set_used(man, true);
return 0;