On 5/27/26 09:55, Liang, Prike wrote:
> AMD General
>
> Regards,
> Prike
>
>> -----Original Message-----
>> From: Koenig, Christian <[email protected]>
>> Sent: Tuesday, May 26, 2026 6:48 PM
>> To: Liang, Prike <[email protected]>; [email protected]
>> Cc: Deucher, Alexander <[email protected]>
>> Subject: Re: [PATCH 1/3] drm/amdgpu: avoid extracting fence_drv_array for
>> empty
>> wait fences
>>
>>
>>
>> On 5/26/26 11:32, Prike Liang wrote:
>>> Avoid xarray extraction and temporary array allocation in
>>> amdgpu_userq_fence_alloc() when there are no pending wait-side fence
>>> driver references. This keeps the common fence emit path cheaper and
>>> efficient.
>>
>> That's an absolute corner case we clearly don't need to optimize for.
>>
>> In almost all cases we should have at least one remote fence driver here.
>
> When only the desktop compositor is running, there're many no-wait fences are
> generated while emitting userq fences.
That sounds like a bug to me. In almost all cases we should have always at
least one wait fence in here.
Otherwise the synchronization between X/Wayland and rendering client isn't
working properly.
Can you investigate why we don't have a fence dependency here?
What could be is that we filter out that dependency in the wait IOCTL because
it is already signaled.
Regards,
Christian.
> Repeatedly attempting to extract the wait fence array takes more than 10µs
> (with a maximum cost of around 30µs). Additionally, zero-initializing the
> userq fence allocation can help reduce overhead in the userq fence put
> routine.
>
> This patch can return a userq fence driver even when falling back from an
> empty fence_drv_xa, benefiting on reducing the latency of userq fence driver
> extraction and free operations when there is no pending wait-side fence.
>
>> Regards,
>> Christian.
>>
>>>
>>> Signed-off-by: Prike Liang <[email protected]>
>>> ---
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c | 6 ++++--
>>> 1 file changed, 4 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
>>> index 008330a0d852..2a2bf13a513d 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
>>> @@ -226,7 +226,7 @@ static int amdgpu_userq_fence_alloc(struct
>> amdgpu_usermode_queue *userq,
>>> struct amdgpu_userq_fence *userq_fence;
>>> void *entry;
>>>
>>> - userq_fence = kmalloc(sizeof(*userq_fence), GFP_KERNEL);
>>> + userq_fence = kzalloc(sizeof(*userq_fence), GFP_KERNEL);
>>> if (!userq_fence)
>>> return -ENOMEM;
>>>
>>> @@ -235,6 +235,8 @@ static int amdgpu_userq_fence_alloc(struct
>> amdgpu_usermode_queue *userq,
>>> * used as size to allocate the array.
>>> */
>>> mutex_lock(&userq->fence_drv_lock);
>>> + if (xa_empty(&userq->fence_drv_xa))
>>> + goto unlock;
>>> XA_STATE(xas, &userq->fence_drv_xa, 0);
>>>
>>> rcu_read_lock();
>>> @@ -256,7 +258,7 @@ static int amdgpu_userq_fence_alloc(struct
>> amdgpu_usermode_queue *userq,
>>> xa_extract(&userq->fence_drv_xa, (void **)userq_fence->fence_drv_array,
>>> 0, ULONG_MAX, xas.xa_index, XA_PRESENT);
>>> xa_destroy(&userq->fence_drv_xa);
>>> -
>>> +unlock:
>>> mutex_unlock(&userq->fence_drv_lock);
>>>
>>> amdgpu_userq_fence_driver_get(fence_drv);
>