AMD General
Regards,
Prike
> -----Original Message-----
> From: Koenig, Christian <[email protected]>
> Sent: Tuesday, May 26, 2026 6:48 PM
> To: Liang, Prike <[email protected]>; [email protected]
> Cc: Deucher, Alexander <[email protected]>
> Subject: Re: [PATCH 1/3] drm/amdgpu: avoid extracting fence_drv_array for
> empty
> wait fences
>
>
>
> On 5/26/26 11:32, Prike Liang wrote:
> > Avoid xarray extraction and temporary array allocation in
> > amdgpu_userq_fence_alloc() when there are no pending wait-side fence
> > driver references. This keeps the common fence emit path cheaper and
> > efficient.
>
> That's an absolute corner case we clearly don't need to optimize for.
>
> In almost all cases we should have at least one remote fence driver here.
When only the desktop compositor is running, there're many no-wait fences are
generated while emitting userq fences. Repeatedly attempting to extract the
wait fence array takes more than 10µs (with a maximum cost of around 30µs).
Additionally, zero-initializing the userq fence allocation can help reduce
overhead in the userq fence put routine.
This patch can return a userq fence driver even when falling back from an empty
fence_drv_xa, benefiting on reducing the latency of userq fence driver
extraction and free operations when there is no pending wait-side fence.
> Regards,
> Christian.
>
> >
> > Signed-off-by: Prike Liang <[email protected]>
> > ---
> > drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c | 6 ++++--
> > 1 file changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
> > index 008330a0d852..2a2bf13a513d 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
> > @@ -226,7 +226,7 @@ static int amdgpu_userq_fence_alloc(struct
> amdgpu_usermode_queue *userq,
> > struct amdgpu_userq_fence *userq_fence;
> > void *entry;
> >
> > - userq_fence = kmalloc(sizeof(*userq_fence), GFP_KERNEL);
> > + userq_fence = kzalloc(sizeof(*userq_fence), GFP_KERNEL);
> > if (!userq_fence)
> > return -ENOMEM;
> >
> > @@ -235,6 +235,8 @@ static int amdgpu_userq_fence_alloc(struct
> amdgpu_usermode_queue *userq,
> > * used as size to allocate the array.
> > */
> > mutex_lock(&userq->fence_drv_lock);
> > + if (xa_empty(&userq->fence_drv_xa))
> > + goto unlock;
> > XA_STATE(xas, &userq->fence_drv_xa, 0);
> >
> > rcu_read_lock();
> > @@ -256,7 +258,7 @@ static int amdgpu_userq_fence_alloc(struct
> amdgpu_usermode_queue *userq,
> > xa_extract(&userq->fence_drv_xa, (void **)userq_fence->fence_drv_array,
> > 0, ULONG_MAX, xas.xa_index, XA_PRESENT);
> > xa_destroy(&userq->fence_drv_xa);
> > -
> > +unlock:
> > mutex_unlock(&userq->fence_drv_lock);
> >
> > amdgpu_userq_fence_driver_get(fence_drv);