On 5/27/26 09:55, Liang, Prike wrote:
> AMD General
> 
> Regards,
>       Prike
> 
>> -----Original Message-----
>> From: Koenig, Christian <[email protected]>
>> Sent: Tuesday, May 26, 2026 6:48 PM
>> To: Liang, Prike <[email protected]>; [email protected]
>> Cc: Deucher, Alexander <[email protected]>
>> Subject: Re: [PATCH 1/3] drm/amdgpu: avoid extracting fence_drv_array for empty wait fences
>>
>>
>>
>> On 5/26/26 11:32, Prike Liang wrote:
>>> Avoid xarray extraction and temporary array allocation in
>>> amdgpu_userq_fence_alloc() when there are no pending wait-side fence
>>> driver references. This keeps the common fence emit path cheaper and
>>> more efficient.
>>
>> That's an absolute corner case we clearly don't need to optimize for.
>>
>> In almost all cases we should have at least one remote fence driver here.
> 
> When only the desktop compositor is running, many fences with no wait-side
> dependencies are generated while emitting userq fences.

That sounds like a bug to me. In almost all cases we should always have at
least one wait fence here.

Otherwise the synchronization between X/Wayland and the rendering client isn't
working properly.

Can you investigate why we don't have a fence dependency here?

One possibility is that we filter out that dependency in the wait IOCTL because
it is already signaled.
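
To make that suspicion concrete, here is a rough userspace sketch of the
effect I have in mind. It is not the amdgpu code: struct demo_fence and
demo_filter_unsignaled() are hypothetical stand-ins, assuming the wait path
drops every dependency that has already completed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a fence; not the kernel's struct dma_fence. */
struct demo_fence {
	unsigned long seqno;
	bool signaled;
};

/*
 * Copy only the unsignaled fences from in[] to out[] and return how many
 * were kept. This models a wait path that filters out dependencies which
 * have already completed. out[] must have room for n entries.
 */
static size_t demo_filter_unsignaled(const struct demo_fence *in, size_t n,
				     struct demo_fence *out)
{
	size_t kept = 0;

	for (size_t i = 0; i < n; i++) {
		if (!in[i].signaled)
			out[kept++] = in[i];
	}

	return kept;
}
```

If every dependency has already signaled by the time the wait is processed,
the helper returns 0, so the signal side would see no wait fence even though
the client did depend on other work.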

Regards,
Christian.

> Repeatedly attempting to extract the wait fence array takes more than 10µs 
> (with a maximum cost of around 30µs). Additionally, zero-initializing the 
> userq fence allocation can help reduce overhead in the userq fence put 
> routine.
> 
> This patch can still return a userq fence driver when it bails out early on
> an empty fence_drv_xa, which reduces the latency of userq fence driver
> extraction and free operations when there is no pending wait-side fence.
> 
>> Regards,
>> Christian.
>>
>>>
>>> Signed-off-by: Prike Liang <[email protected]>
>>> ---
>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c | 6 ++++--
>>>  1 file changed, 4 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
>>> index 008330a0d852..2a2bf13a513d 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
>>> @@ -226,7 +226,7 @@ static int amdgpu_userq_fence_alloc(struct amdgpu_usermode_queue *userq,
>>>     struct amdgpu_userq_fence *userq_fence;
>>>     void *entry;
>>>
>>> -   userq_fence = kmalloc(sizeof(*userq_fence), GFP_KERNEL);
>>> +   userq_fence = kzalloc(sizeof(*userq_fence), GFP_KERNEL);
>>>     if (!userq_fence)
>>>             return -ENOMEM;
>>>
>>> @@ -235,6 +235,8 @@ static int amdgpu_userq_fence_alloc(struct amdgpu_usermode_queue *userq,
>>>      * used as size to allocate the array.
>>>      */
>>>     mutex_lock(&userq->fence_drv_lock);
>>> +   if (xa_empty(&userq->fence_drv_xa))
>>> +           goto unlock;
>>>     XA_STATE(xas, &userq->fence_drv_xa, 0);
>>>
>>>     rcu_read_lock();
>>> @@ -256,7 +258,7 @@ static int amdgpu_userq_fence_alloc(struct amdgpu_usermode_queue *userq,
>>>     xa_extract(&userq->fence_drv_xa, (void **)userq_fence->fence_drv_array,
>>>                0, ULONG_MAX, xas.xa_index, XA_PRESENT);
>>>     xa_destroy(&userq->fence_drv_xa);
>>> -
>>> +unlock:
>>>     mutex_unlock(&userq->fence_drv_lock);
>>>
>>>     amdgpu_userq_fence_driver_get(fence_drv);
> 
