On 3/23/26 10:44, Liang, Prike wrote:
> [Public]
> 
> Regards,
>       Prike
> 
>> -----Original Message-----
>> From: Koenig, Christian <[email protected]>
>> Sent: Friday, March 20, 2026 5:45 PM
>> To: Liang, Prike <[email protected]>; [email protected]
>> Cc: Deucher, Alexander <[email protected]>
>> Subject: Re: [PATCH] drm/amdgpu/userq: fix the old userq fence dereference
>>
>>
>>
>> On 3/20/26 09:41, Prike Liang wrote:
>>> It needs to drop the old userq fence dereference, otherwise it can't
>>> be walked during freeing the queue.
>>>
>>> Signed-off-by: Prike Liang <[email protected]>
>>> ---
>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c | 16 ++++++++++++++--
>>>  1 file changed, 14 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
>>> index 9df62f71f27c..f1acd785c27d 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
>>> @@ -105,9 +105,21 @@ int amdgpu_userq_fence_driver_alloc(struct
>> amdgpu_device *adev,
>>>     get_task_comm(fence_drv->timeline_name, current);
>>>
>>>     xa_lock_irqsave(&adev->userq_xa, flags);
>>> -   r = xa_err(__xa_store(&adev->userq_xa, userq->doorbell_index,
>>> -                         fence_drv, GFP_KERNEL));
>>> +   struct amdgpu_userq_fence_driver *old;
>>> +
>>> +   old = __xa_store(&adev->userq_xa, userq->doorbell_index,
>>> +                         fence_drv, GFP_KERNEL);
>>>     xa_unlock_irqrestore(&adev->userq_xa, flags);
>>> +
>>> +   if (xa_is_err(old)) {
>>> +           r = xa_err(old);
>>> +   } else if (old) {
>>> +           /* Doorbell index was reused: drop the replaced driver's ref */
>>> +           amdgpu_userq_fence_driver_put(old);
>>
>> What? Why is a doorbell index re-used while there is still an userq fence 
>> driver for it
>> around?
> As to the current userq doorbell index calculation there sometimes can have a 
> same doorbell index for the two different userq during each queue creation 
> process. There may need to handle the old fence which hasn't been signaled 
> when a new fence stored at the same doorbell index.

Ah, I see the problem.

The issue is that amdgpu_userq_fence_driver_alloc() and 
amdgpu_userq_fence_driver_destroy() registers with the adev->userq_xa.

That is not only really inefficient, but also leads to this bug here.

Instead amdgpu_userq_create() and amdgpu_userq_destroy() needs to do that.

Additional to those problems the reference counting for userq->fence_drv seems 
to be completely messed up. It should be inside amdgpu_userq_destroy(), but I 
can't see were that one is currently dropped.

Thanks,
Christian.

> 
>> That doesn't make to much sense.
>>
>> Regards,
>> Christian.
>>
>>> +           r = 0;
>>> +   } else {
>>> +           r = 0;
>>> +   }
>>>     if (r)
>>>             goto free_seq64;
>>>
> 

Reply via email to