AMD General

Regards,
      Prike

> -----Original Message-----
> From: amd-gfx <[email protected]> On Behalf Of Liang, 
> Prike
> Sent: Friday, May 29, 2026 4:20 PM
> To: Koenig, Christian <[email protected]>; 
> [email protected]
> Cc: Deucher, Alexander <[email protected]>
> Subject: RE: [PATCH 1/3] drm/amdgpu: avoid extracting fence_drv_array for 
> empty
> wait fences
>
> AMD General
>
> Regards,
>       Prike
>
> > -----Original Message-----
> > From: Koenig, Christian <[email protected]>
> > Sent: Wednesday, May 27, 2026 3:59 PM
> > To: Liang, Prike <[email protected]>; [email protected]
> > Cc: Deucher, Alexander <[email protected]>
> > Subject: Re: [PATCH 1/3] drm/amdgpu: avoid extracting fence_drv_array
> > for empty wait fences
> >
> >
> >
> > On 5/27/26 09:55, Liang, Prike wrote:
> > > AMD General
> > >
> > > Regards,
> > >       Prike
> > >
> > >> -----Original Message-----
> > >> From: Koenig, Christian <[email protected]>
> > >> Sent: Tuesday, May 26, 2026 6:48 PM
> > >> To: Liang, Prike <[email protected]>;
> > >> [email protected]
> > >> Cc: Deucher, Alexander <[email protected]>
> > >> Subject: Re: [PATCH 1/3] drm/amdgpu: avoid extracting
> > >> fence_drv_array for empty wait fences
> > >>
> > >>
> > >>
> > >> On 5/26/26 11:32, Prike Liang wrote:
> > >>> Avoid xarray extraction and temporary array allocation in
> > >>> amdgpu_userq_fence_alloc() when there are no pending wait-side
> > >>> fence driver references. This keeps the common fence emit path
> > >>> cheaper and efficient.
> > >>
> > >> That's an absolute corner case we clearly don't need to optimize for.
> > >>
> > >> In almost all cases we should have at least one remote fence driver here.
> > >
> > > When only the desktop compositor is running, there're many no-wait
> > > fences are
> > generated while emitting userq fences.
> >
> > That sounds like a bug to me. In almost all cases we should have
> > always at least one wait fence in here.
> >
> > Otherwise the synchronization between X/Wayland and rendering client
> > isn't working properly.
> >
> > Can you investigate why we don't have a fence dependency here?
> >
> > What could be is that we filter out that dependency in the wait IOCTL
> > because it is already signaled.
>
> When only the desktop compositor is running, the wait fence ioctl gathers 
> only the
> VM timeline fence. No userq fences from generic syncobjs or BO implicit sync 
> are
> present, so no userq fence driver is referenced during the wait path. In this 
> case,
> skipping the allocation and extraction of the userq fence driver would reduce 
> the per
> submission overhead of emitting userq fences.

Hi Christian, there is a case where no userq wait fences are present except the 
VM timeline fence.
Since the VM timeline fence is a kernel side fence and is not passed back to 
userspace as a FWM packet,
would it be worthwhile to reduce the overhead by skipping the allocation of the 
userq fence storage array in this case?

> >
> > Regards,
> > Christian.
> >
> > > Repeatedly attempting to extract the wait fence array takes more
> > > than 10µs (with a
> > maximum cost of around 30µs). Additionally, zero-initializing the
> > userq fence allocation can help reduce overhead in the userq fence put 
> > routine.
> > >
> > > This patch can return a userq fence driver even when falling back
> > > from an empty
> > fence_drv_xa, benefiting on reducing the latency of userq fence driver
> > extraction and free operations when there is no pending wait-side fence.
> > >
> > >> Regards,
> > >> Christian.
> > >>
> > >>>
> > >>> Signed-off-by: Prike Liang <[email protected]>
> > >>> ---
> > >>>  drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c | 6 ++++--
> > >>>  1 file changed, 4 insertions(+), 2 deletions(-)
> > >>>
> > >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
> > >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
> > >>> index 008330a0d852..2a2bf13a513d 100644
> > >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
> > >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
> > >>> @@ -226,7 +226,7 @@ static int amdgpu_userq_fence_alloc(struct
> > >> amdgpu_usermode_queue *userq,
> > >>>     struct amdgpu_userq_fence *userq_fence;
> > >>>     void *entry;
> > >>>
> > >>> -   userq_fence = kmalloc(sizeof(*userq_fence), GFP_KERNEL);
> > >>> +   userq_fence = kzalloc(sizeof(*userq_fence), GFP_KERNEL);
> > >>>     if (!userq_fence)
> > >>>             return -ENOMEM;
> > >>>
> > >>> @@ -235,6 +235,8 @@ static int amdgpu_userq_fence_alloc(struct
> > >> amdgpu_usermode_queue *userq,
> > >>>      * used as size to allocate the array.
> > >>>      */
> > >>>     mutex_lock(&userq->fence_drv_lock);
> > >>> +   if (xa_empty(&userq->fence_drv_xa))
> > >>> +           goto unlock;
> > >>>     XA_STATE(xas, &userq->fence_drv_xa, 0);
> > >>>
> > >>>     rcu_read_lock();
> > >>> @@ -256,7 +258,7 @@ static int amdgpu_userq_fence_alloc(struct
> > >> amdgpu_usermode_queue *userq,
> > >>>     xa_extract(&userq->fence_drv_xa, (void 
> > >>> **)userq_fence->fence_drv_array,
> > >>>                0, ULONG_MAX, xas.xa_index, XA_PRESENT);
> > >>>     xa_destroy(&userq->fence_drv_xa);
> > >>> -
> > >>> +unlock:
> > >>>     mutex_unlock(&userq->fence_drv_lock);
> > >>>
> > >>>     amdgpu_userq_fence_driver_get(fence_drv);
> > >

Reply via email to