AMD General

Regards,
      Prike

> -----Original Message-----
> From: Koenig, Christian <[email protected]>
> Sent: Wednesday, May 27, 2026 3:59 PM
> To: Liang, Prike <[email protected]>; [email protected]
> Cc: Deucher, Alexander <[email protected]>
> Subject: Re: [PATCH 1/3] drm/amdgpu: avoid extracting fence_drv_array for 
> empty
> wait fences
>
>
>
> On 5/27/26 09:55, Liang, Prike wrote:
> > AMD General
> >
> > Regards,
> >       Prike
> >
> >> -----Original Message-----
> >> From: Koenig, Christian <[email protected]>
> >> Sent: Tuesday, May 26, 2026 6:48 PM
> >> To: Liang, Prike <[email protected]>; [email protected]
> >> Cc: Deucher, Alexander <[email protected]>
> >> Subject: Re: [PATCH 1/3] drm/amdgpu: avoid extracting fence_drv_array
> >> for empty wait fences
> >>
> >>
> >>
> >> On 5/26/26 11:32, Prike Liang wrote:
> >>> Avoid xarray extraction and temporary array allocation in
> >>> amdgpu_userq_fence_alloc() when there are no pending wait-side fence
> >>> driver references. This keeps the common fence emit path cheaper and
> >>> efficient.
> >>
> >> That's an absolute corner case we clearly don't need to optimize for.
> >>
> >> In almost all cases we should have at least one remote fence driver here.
> >
> > When only the desktop compositor is running, there're many no-wait fences 
> > are
> generated while emitting userq fences.
>
> That sounds like a bug to me. In almost all cases we should have always at 
> least
> one wait fence in here.
>
> Otherwise the synchronization between X/Wayland and rendering client isn't 
> working
> properly.
>
> Can you investigate why we don't have a fence dependency here?
>
> What could be is that we filter out that dependency in the wait IOCTL because 
> it is
> already signaled.

When only the desktop compositor is running, the wait fence ioctl gathers only 
the VM timeline
fence. No userq fences from generic syncobjs or BO implicit sync are present, 
so no userq fence
driver is referenced during the wait path. In this case, skipping the 
allocation and extraction of the
userq fence driver would reduce the per submission overhead of emitting userq 
fences.

>
> Regards,
> Christian.
>
> > Repeatedly attempting to extract the wait fence array takes more than 10µs 
> > (with a
> maximum cost of around 30µs). Additionally, zero-initializing the userq fence
> allocation can help reduce overhead in the userq fence put routine.
> >
> > This patch can return a userq fence driver even when falling back from an 
> > empty
> fence_drv_xa, benefiting on reducing the latency of userq fence driver 
> extraction and
> free operations when there is no pending wait-side fence.
> >
> >> Regards,
> >> Christian.
> >>
> >>>
> >>> Signed-off-by: Prike Liang <[email protected]>
> >>> ---
> >>>  drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c | 6 ++++--
> >>>  1 file changed, 4 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
> >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
> >>> index 008330a0d852..2a2bf13a513d 100644
> >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
> >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
> >>> @@ -226,7 +226,7 @@ static int amdgpu_userq_fence_alloc(struct
> >> amdgpu_usermode_queue *userq,
> >>>     struct amdgpu_userq_fence *userq_fence;
> >>>     void *entry;
> >>>
> >>> -   userq_fence = kmalloc(sizeof(*userq_fence), GFP_KERNEL);
> >>> +   userq_fence = kzalloc(sizeof(*userq_fence), GFP_KERNEL);
> >>>     if (!userq_fence)
> >>>             return -ENOMEM;
> >>>
> >>> @@ -235,6 +235,8 @@ static int amdgpu_userq_fence_alloc(struct
> >> amdgpu_usermode_queue *userq,
> >>>      * used as size to allocate the array.
> >>>      */
> >>>     mutex_lock(&userq->fence_drv_lock);
> >>> +   if (xa_empty(&userq->fence_drv_xa))
> >>> +           goto unlock;
> >>>     XA_STATE(xas, &userq->fence_drv_xa, 0);
> >>>
> >>>     rcu_read_lock();
> >>> @@ -256,7 +258,7 @@ static int amdgpu_userq_fence_alloc(struct
> >> amdgpu_usermode_queue *userq,
> >>>     xa_extract(&userq->fence_drv_xa, (void 
> >>> **)userq_fence->fence_drv_array,
> >>>                0, ULONG_MAX, xas.xa_index, XA_PRESENT);
> >>>     xa_destroy(&userq->fence_drv_xa);
> >>> -
> >>> +unlock:
> >>>     mutex_unlock(&userq->fence_drv_lock);
> >>>
> >>>     amdgpu_userq_fence_driver_get(fence_drv);
> >

Reply via email to