On 2/14/2025 4:08 PM, Christian König wrote:
Adding Arvind, please make sure to keep him in the loop.

Am 14.02.25 um 11:07 schrieb Le Ma:
On systems with CONFIG_SLUB_DEBUG enabled, the memleak like below
will show up explicitly during driver unloading if created bo without
drm_timeline object before.

     BUG drm_sched_fence (Tainted: G           OE     ): Objects remaining in 
drm_sched_fence on __kmem_cache_shutdown()
     
-----------------------------------------------------------------------------
     Call Trace:
     <TASK>
     dump_stack_lvl+0x4c/0x70
     dump_stack+0x14/0x20
     slab_err+0xb0/0xf0
     ? srso_alias_return_thunk+0x5/0xfbef5
     ? flush_work+0x12/0x20
     ? srso_alias_return_thunk+0x5/0xfbef5
     __kmem_cache_shutdown+0x163/0x2e0
     kmem_cache_destroy+0x61/0x170
     drm_sched_fence_slab_fini+0x19/0x900

Thus call dma_fence_put properly to avoid the memleak.

v2: call dma_fence_put in amdgpu_gem_va_update_vm

Signed-off-by: Le Ma <[email protected]>
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 9 +++++++--
  1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index 8b67aae6c2fe..00f1f34705c0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -759,7 +759,8 @@ static struct dma_fence *
  amdgpu_gem_va_update_vm(struct amdgpu_device *adev,
                        struct amdgpu_vm *vm,
                        struct amdgpu_bo_va *bo_va,
-                       uint32_t operation)
+                       uint32_t operation,
+                       uint32_t syncobj_handle)
  {
        struct dma_fence *fence = dma_fence_get_stub();
        int r;
@@ -771,6 +772,9 @@ amdgpu_gem_va_update_vm(struct amdgpu_device *adev,
        if (r)
                goto error;
+ if (!syncobj_handle)
+               dma_fence_put(fence);
+
Having that check inside amdgpu_gem_update_bo_mapping() was actually correct. 
Here it doesn't make much sense.

Agreed,

Regards,
~Arvind


        if (operation == AMDGPU_VA_OP_MAP ||
            operation == AMDGPU_VA_OP_REPLACE) {
                r = amdgpu_vm_bo_update(adev, bo_va, false);
@@ -965,7 +969,8 @@ int amdgpu_gem_va_ioctl(struct drm_device *dev, void *data,
                                                    &timeline_chain);
Right before this here is a call to amdgpu_gem_update_timeline_node() which is 
incorrectly placed.

That needs to come much earlier, above the switch (args->operation)....

Regards,
Christian.

fence = amdgpu_gem_va_update_vm(adev, &fpriv->vm, bo_va,
-                                               args->operation);
+                                               args->operation,
+                                               args->vm_timeline_syncobj_out);
if (!r)
                        amdgpu_gem_update_bo_mapping(filp, bo_va,

Reply via email to