Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider: - [Critical] The newly registered dmem cgroup region is never assigned to `mgr->cg_region`, causing a memory leak and leaving dangling reclaim callbacks that lead to a Use-After-Free. - [High] Calling `dmem_cgroup_unregister_region()` before `ttm_resource_manager_set_used(man, false)` creates a race window where concurrent allocations can dereference a freed cgroup region. --
commit 02aedad5c82131466a2fb9b6bba851f053487de8 Author: Thomas Hellström <[email protected]> drm/amdgpu: Wire up dmem cgroup reclaim for VRAM manager Register the VRAM manager with the dmem cgroup reclaim infrastructure so that lowering dmem.max below current VRAM usage triggers TTM eviction rather than failing with -EBUSY. > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c > index 08f05c3aed1d5..ee98b963e84a0 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c > @@ -933,12 +938,15 @@ int amdgpu_vram_mgr_init(struct amdgpu_device *adev) > if (err) > return err; > > - man->cg = drmm_cgroup_register_region(adev_to_drm(adev), "vram", > - &(struct dmem_cgroup_init){ > - .size = > adev->gmc.real_vram_size, > - }); > - if (IS_ERR(man->cg)) > - return PTR_ERR(man->cg); > + cg = dmem_cgroup_register_region(&(struct dmem_cgroup_init){ > + .size = adev->gmc.real_vram_size, > + .ops = &amdgpu_vram_mgr_dmem_ops, > + .reclaim_priv = man, > + }, "vram"); > + if (IS_ERR(cg)) > + return PTR_ERR(cg); > + > + ttm_resource_manager_set_dmem_region(man, cg); [Severity: Critical] Is cg missing an assignment to mgr->cg_region here? Because mgr->cg_region is zero-initialized and never updated, amdgpu_vram_mgr_fini() will later pass NULL to dmem_cgroup_unregister_region(). This causes the cgroup region to leak and remain on the global list. If cgroup reclaim is triggered after the driver unbinds and the man pointer is freed, will the callback execute and dereference reclaim_priv, resulting in a use-after-free? > ttm_set_driver_manager(&adev->mman.bdev, TTM_PL_VRAM, &mgr->manager); > ttm_resource_manager_set_used(man, true); [ ... ] > @@ -960,6 +968,15 @@ void amdgpu_vram_mgr_fini(struct amdgpu_device *adev) > int ret; > struct amdgpu_vram_reservation *rsv, *temp; > > + /* > + * Drain any in-flight dmem cgroup reclaim callbacks and remove the > + * region from the global list before tearing down the manager. > + * This must happen first so no reclaim callback can access the > + * manager after this point. > + */ > + dmem_cgroup_unregister_region(mgr->cg_region); > + mgr->cg_region = NULL; > + > ttm_resource_manager_set_used(man, false); [Severity: High] Does unregistering the cgroup region before calling ttm_resource_manager_set_used(man, false) create a race condition with concurrent BO allocations? Assuming mgr->cg_region was properly set, dmem_cgroup_unregister_region() will remove the region from global lists and drop its references. Meanwhile, the manager remains active (man->use_type == true). Because amdgpu_vram_mgr_fini() runs during driver unbind where concurrent ioctls may still be executing, could a concurrent BO allocation pass the check in ttm_bo_mem_space() and call dmem_cgroup_try_charge(man->cg, ...)? This would access the unregistered region and cause a use-after-free. Should ttm_resource_manager_set_used(man, false) be executed before unregistering the region? -- Sashiko AI review · https://sashiko.dev/#/patchset/[email protected]?part=6
