On 3/23/26 10:44, Liang, Prike wrote:
> [Public]
>
> Regards,
> Prike
>
>> -----Original Message-----
>> From: Koenig, Christian <[email protected]>
>> Sent: Friday, March 20, 2026 5:45 PM
>> To: Liang, Prike <[email protected]>; [email protected]
>> Cc: Deucher, Alexander <[email protected]>
>> Subject: Re: [PATCH] drm/amdgpu/userq: fix the old userq fence dereference
>>
>>
>>
>> On 3/20/26 09:41, Prike Liang wrote:
>>> It needs to drop the old userq fence dereference, otherwise it can't
>>> be walked during freeing the queue.
>>>
>>> Signed-off-by: Prike Liang <[email protected]>
>>> ---
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c | 16 ++++++++++++++--
>>> 1 file changed, 14 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
>>> index 9df62f71f27c..f1acd785c27d 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
>>> @@ -105,9 +105,21 @@ int amdgpu_userq_fence_driver_alloc(struct
>> amdgpu_device *adev,
>>> get_task_comm(fence_drv->timeline_name, current);
>>>
>>> xa_lock_irqsave(&adev->userq_xa, flags);
>>> - r = xa_err(__xa_store(&adev->userq_xa, userq->doorbell_index,
>>> - fence_drv, GFP_KERNEL));
>>> + struct amdgpu_userq_fence_driver *old;
>>> +
>>> + old = __xa_store(&adev->userq_xa, userq->doorbell_index,
>>> + fence_drv, GFP_KERNEL);
>>> xa_unlock_irqrestore(&adev->userq_xa, flags);
>>> +
>>> + if (xa_is_err(old)) {
>>> + r = xa_err(old);
>>> + } else if (old) {
>>> + /* Doorbell index was reused: drop the replaced driver's ref */
>>> + amdgpu_userq_fence_driver_put(old);
>>
>> What? Why is a doorbell index re-used while there is still an userq fence
>> driver for it
>> around?
> As to the current userq doorbell index calculation there sometimes can have a
> same doorbell index for the two different userq during each queue creation
> process. There may need to handle the old fence which hasn't been signaled
> when a new fence stored at the same doorbell index.
Ah, I see the problem.
The issue is that amdgpu_userq_fence_driver_alloc() and
amdgpu_userq_fence_driver_destroy() registers with the adev->userq_xa.
That is not only really inefficient, but also leads to this bug here.
Instead amdgpu_userq_create() and amdgpu_userq_destroy() needs to do that.
Additional to those problems the reference counting for userq->fence_drv seems
to be completely messed up. It should be inside amdgpu_userq_destroy(), but I
can't see were that one is currently dropped.
Thanks,
Christian.
>
>> That doesn't make to much sense.
>>
>> Regards,
>> Christian.
>>
>>> + r = 0;
>>> + } else {
>>> + r = 0;
>>> + }
>>> if (r)
>>> goto free_seq64;
>>>
>