On Mon, Nov 3, 2025 at 5:51 AM Christian König <[email protected]> wrote:
>
> On 10/31/25 16:28, Alex Deucher wrote:
> > On Fri, Oct 31, 2025 at 10:01 AM Christian König
> > <[email protected]> wrote:
> >>
> >> On 10/31/25 14:53, Alex Deucher wrote:
> >>> On Fri, Oct 31, 2025 at 4:40 AM Christian König
> >>> <[email protected]> wrote:
> >>>>
> >>>> On 10/27/25 23:02, Alex Deucher wrote:
> >>>>> If we don't end up initializing the fences, free them when
> >>>>> we free the job.
> >>>>>
> >>>>> v2: take a reference to the fences if we emit them
> >>>>>
> >>>>> Fixes: db36632ea51e ("drm/amdgpu: clean up and unify hw fence handling")
> >>>>> Reviewed-by: Jesse Zhang <[email protected]> (v1)
> >>>>> Signed-off-by: Alex Deucher <[email protected]>
> >>>>> ---
> >>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c  |  2 ++
> >>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 18 ++++++++++++++++++
> >>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c  |  2 ++
> >>>>>  3 files changed, 22 insertions(+)
> >>>>>
> >>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
> >>>>> index 39229ece83f83..0596114377600 100644
> >>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
> >>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
> >>>>> @@ -302,6 +302,8 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned int num_ibs,
> >>>>>               return r;
> >>>>>       }
> >>>>>       *f = &af->base;
> >>>>> +     /* get a ref for the job */
> >>>>> +     dma_fence_get(*f);
> >>>>
> >>>> I think it would be better to set the fence inside the job to NULL as 
> >>>> soon as it is consumed/initialized.
> >>>
> >>> We need the pointer for the job timed out handling.
> >>
> >> I don't think that is true. During a timeout we should have 
> >> job->s_fence->parent for the HW fence.
> >
> > We also need to keep it around for job_submit_direct() so we can free
> > the IBs used for that.
>
> Good point, but that handling here is really not straightforward.
>
> Anyway, feel free to add my rb for now, but we need to revisit that at some
> point.

Thanks.  I found a leak of the non-job fence.  Please see the latest
revision of the patch.

Alex

>
> Regards,
> Christian.
>
> >
> > Alex
> >
> >>
> >> But even when we go down that route here, you only grab a reference to the 
> >> hw_fence but not the hw_vm_fence.
> >>
> >> That looks broken to me.
> >>
> >> Christian.
> >>
> >>>
> >>> Alex
> >>>
> >>>>
> >>>>>
> >>>>>       if (ring->funcs->insert_end)
> >>>>>               ring->funcs->insert_end(ring);
> >>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> >>>>> index 55c7e104d5ca0..dc970f5fe601b 100644
> >>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> >>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> >>>>> @@ -295,6 +295,15 @@ static void amdgpu_job_free_cb(struct drm_sched_job *s_job)
> >>>>>
> >>>>>       amdgpu_sync_free(&job->explicit_sync);
> >>>>>
> >>>>> +     if (job->hw_fence->base.ops)
> >>>>> +             dma_fence_put(&job->hw_fence->base);
> >>>>> +     else
> >>>>> +             kfree(job->hw_fence);
> >>>>> +     if (job->hw_vm_fence->base.ops)
> >>>>> +             dma_fence_put(&job->hw_vm_fence->base);
> >>>>> +     else
> >>>>> +             kfree(job->hw_vm_fence);
> >>>>> +
> >>>>
> >>>> That way, this can simply be a kfree(..).
> >>>>
> >>>> Regards,
> >>>> Christian.
> >>>>
> >>>>>       kfree(job);
> >>>>>  }
> >>>>>
> >>>>> @@ -324,6 +333,15 @@ void amdgpu_job_free(struct amdgpu_job *job)
> >>>>>       if (job->gang_submit != &job->base.s_fence->scheduled)
> >>>>>               dma_fence_put(job->gang_submit);
> >>>>>
> >>>>> +     if (job->hw_fence->base.ops)
> >>>>> +             dma_fence_put(&job->hw_fence->base);
> >>>>> +     else
> >>>>> +             kfree(job->hw_fence);
> >>>>> +     if (job->hw_vm_fence->base.ops)
> >>>>> +             dma_fence_put(&job->hw_vm_fence->base);
> >>>>> +     else
> >>>>> +             kfree(job->hw_vm_fence);
> >>>>> +
> >>>>>       kfree(job);
> >>>>>  }
> >>>>>
> >>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> >>>>> index db66b4232de02..f8c67840f446f 100644
> >>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> >>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> >>>>> @@ -845,6 +845,8 @@ int amdgpu_vm_flush(struct amdgpu_ring *ring, struct amdgpu_job *job,
> >>>>>               if (r)
> >>>>>                       return r;
> >>>>>               fence = &job->hw_vm_fence->base;
> >>>>> +             /* get a ref for the job */
> >>>>> +             dma_fence_get(fence);
> >>>>>       }
> >>>>>
> >>>>>       if (vm_flush_needed) {
> >>>>
> >>
>
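
For readers following the thread, the put-or-free cleanup in the
amdgpu_job.c hunks can be modelled in user space. This is only a sketch
under stated assumptions, not the driver code: struct toy_fence, the
toy_fence_*() helpers and toy_live_fences are hypothetical stand-ins for
struct dma_fence, dma_fence_get()/dma_fence_put() and kfree(), and a plain
int replaces the kernel's kref.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-ins; NOT the kernel's struct dma_fence or its API. */
struct toy_fence_ops {
	const char *name;
};

struct toy_fence {
	const struct toy_fence_ops *ops;	/* NULL until initialized */
	int refcount;				/* models the kref */
};

static int toy_live_fences;	/* allocations that have not been freed */

static const struct toy_fence_ops toy_ops = { "toy" };

/* Models the zeroed allocation made when the job is created. */
static struct toy_fence *toy_fence_alloc(void)
{
	struct toy_fence *f = calloc(1, sizeof(*f));

	if (f)
		toy_live_fences++;
	return f;
}

/* Models fence emission: ops gets set and the caller owns one reference. */
static void toy_fence_init(struct toy_fence *f)
{
	f->ops = &toy_ops;
	f->refcount = 1;
}

/* Models the extra "ref for the job" taken in the patch. */
static void toy_fence_get(struct toy_fence *f)
{
	f->refcount++;
}

static void toy_fence_put(struct toy_fence *f)
{
	assert(f->refcount > 0);
	if (--f->refcount == 0) {
		free(f);
		toy_live_fences--;
	}
}

/*
 * Models the branch added to the job free paths: an initialized fence is
 * refcounted and must be put; an uninitialized one is plain memory.
 */
static void toy_job_release_fence(struct toy_fence *f)
{
	if (f->ops) {
		toy_fence_put(f);
	} else {
		free(f);
		toy_live_fences--;
	}
}
```

In this model, a fence that was never emitted is freed directly when the
job is released, while an emitted fence survives the job's release until
the last remaining holder drops its reference.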
