AMD General

> -----Original Message-----
> From: Koenig, Christian <[email protected]>
> Sent: Monday, May 4, 2026 5:01 PM
> To: Alex Deucher <[email protected]>; Zhang, Jesse(Jie)
> <[email protected]>
> Cc: [email protected]; Deucher, Alexander
> <[email protected]>
> Subject: Re: [PATCH v4 10/10] drm/amdgpu/userq_fence: NOTIFY MES on SDMA
> UMQ submit
>
> On 5/1/26 15:30, Alex Deucher wrote:
> > On Thu, Apr 30, 2026 at 12:29 PM Jesse Zhang <[email protected]>
> wrote:
> >>
> >> From: "Jesse.zhang" <[email protected]>
> >>
> >> Pair the userspace aggregated-doorbell ring (added by the
> >> AMDGPU_INFO_DOORBELL /
> AMDGPU_GEM_GLOBAL_AGGREGATED_DOORBELL ABI in
> >> the previous patches) with a kernel-side
> >> MES_MISC_OP_NOTIFY_WORK_ON_UNMAPPED_QUEUE in
> >> amdgpu_userq_signal_ioctl for SDMA UMQs.
> >>
> >> Signed-off-by: Jesse Zhang <[email protected]>
> >
> > How will this work if the user doesn't use this IOCTL?  protected
> > fences are optional.  An application can create a user queue and never
> > use a protected fence.  Why don't KFD SDMA queues need this special
> > treatment?
>
> Yeah agree that whole approach doesn't work.
>
> What we could do is similar to the MM queues that userspace need to signal 
> both a
> per queue doorbell and an aggregated one for the queue type.
>
> Regards,
> Christian.
Hi Christian, Alex,

Agreed, I will drop this patch.

The MM-style userspace ABI is already in place: David's agdb_bo
(AMDGPU_GEM_GLOBAL_AGGREGATED_DOORBELL + GEM_OP_OPEN_GLOBAL) plus patch 9
(AMDGPU_INFO_DOORBELL reports the SDMA agdb slot).  The IGT test rings both
the per-queue and the aggregated doorbell on every submit.

The remaining gap: on MES12, a bare agg_db ring does NOT wake an
unmapped SDMA UMQ.  MES needs hasReadyQueues set, which today only
NOTIFY_WORK_ON_UNMAPPED_QUEUE flips.  This is by design, not Linux-only:
the Windows UMQ path uses the same contract.  MES writes 1 to
*unmap_flag_addr on preempt; the UMD checks the flag and calls NOTIFY
before ringing doorbells on the next submit.
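
To make that contract concrete, here is a rough sketch of the submit-side
flow as a pure user-space model. It is not driver or firmware code: the
struct, its fields, and every function name below are hypothetical, and the
assumption that the flag is cleared when the queue is mapped again is mine,
not something taken from the MES interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one SDMA user-mode queue; all names are illustrative. */
struct umq_model {
	uint32_t unmap_flag;      /* stands in for *unmap_flag_addr */
	uint32_t queue_doorbell;  /* per-queue doorbell value */
	uint32_t agg_doorbell;    /* aggregated doorbell value for the queue type */
	bool has_ready_queues;    /* models the MES hasReadyQueues flag */
	bool mapped;              /* true while the queue occupies a HW slot */
	unsigned int notify_calls;
};

/* Models MES preempting the queue: it unmaps it and writes 1 to the flag. */
void mes_model_preempt(struct umq_model *q)
{
	q->mapped = false;
	q->has_ready_queues = false;
	q->unmap_flag = 1;
}

/* Stand-in for NOTIFY_WORK_ON_UNMAPPED_QUEUE: only sets hasReadyQueues. */
void umq_model_notify_work(struct umq_model *q)
{
	q->has_ready_queues = true;
	q->notify_calls++;
}

/* Models MES seeing an aggregated doorbell ring. */
void mes_model_on_agg_doorbell(struct umq_model *q)
{
	/* A bare ring does nothing unless hasReadyQueues was set first. */
	if (!q->mapped && q->has_ready_queues) {
		q->mapped = true;
		q->unmap_flag = 0; /* assumption: flag is cleared on re-map */
	}
}

/* UMD submit path: check the flag, NOTIFY if needed, then ring both doorbells. */
bool umq_model_submit(struct umq_model *q, uint32_t wptr)
{
	assert(q != NULL);

	if (q->unmap_flag)
		umq_model_notify_work(q);

	q->queue_doorbell = wptr;
	q->agg_doorbell = wptr;
	mes_model_on_agg_doorbell(q);

	return q->mapped;
}
```

In this model a queue that was never preempted submits without any NOTIFY
call, so the extra call is only paid after MES has unmapped the queue.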

Plan for v5 (matching the Windows flow):

  - Drop patch 10.
  - Keep David's ABI + INFO_DOORBELL.
  - Add a small standalone NOTIFY ioctl (e.g. AMDGPU_USERQ_OP_NOTIFY_WORK)
    so UMQ apps can call it on demand.

Is this the right direction?

Attached test results (current v4, with the to-be-dropped signal_ioctl NOTIFY):

  HW / fw : gfx12
  Test    : IGT amd_userq_sdma stress, 100 iters
  Result  : 100/100 PASS

So the agg_db + NOTIFY mechanism works on hardware.

Thanks,
Jesse
>
> >
> > Alex
> >
> >> ---
> >>  .../gpu/drm/amd/amdgpu/amdgpu_userq_fence.c   | 29
> +++++++++++++++++++
> >>  1 file changed, 29 insertions(+)
> >>
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
> >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
> >> index a58342c2ac44..50e275b51c9e 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
> >> @@ -598,6 +598,35 @@ int amdgpu_userq_signal_ioctl(struct drm_device
> *dev, void *data,
> >>         /* drop the reference acquired in fence creation function */
> >>         dma_fence_put(fence);
> >>
> >> +       /*
> >> +        * SDMA UMQ wake: SDMA has no CP_UNMAPPED_DOORBELL HW
> intercept, so
> >> +        * once MES gangs the queue out (after the first IB's
> PROTECTED_FENCE
> >> +        * idles the queue), per-queue doorbell rings hit a mapped-out HW
> >> +        * slot and are silently dropped — FENCE IRQ never fires.
> >> +        *
> >> +        * Userspace rings the priority's MES aggregated doorbell directly
> >> +        * via the agdb_bo mmap (see AMDGPU_INFO_DOORBELL +
> >> +        * AMDGPU_GEM_GLOBAL_AGGREGATED_DOORBELL).  That alone,
> however, is
> >> +        * not enough on current MES12 firmware — MES will not scan the
> >> +        * priority's queue list unless its hasReadyQueues flag is set.
> >> +        * NOTIFY_WORK_ON_UNMAPPED_QUEUE flips that flag, so MES
> then
> >> +        * processes the doorbell ring and re-MAP_QUEUEs the SDMA UMQ.
> >> +        *
> >> +        * This is a kernel-side companion to the userspace agg doorbell
> >> +        * ring; remove once firmware learns to wake on bare aggregated
> >> +        * doorbell.
> >> +        */
> >> +       if (queue && queue->queue_type == AMDGPU_HW_IP_DMA &&
> >> +           adev->enable_mes && adev->mes.funcs->misc_op) {
> >> +               struct mes_misc_op_input op = { 0 };
> >> +
> >> +               op.op =
> MES_MISC_OP_NOTIFY_WORK_ON_UNMAPPED_QUEUE;
> >> +               op.notify_work.priority_level =
> AMDGPU_MES_PRIORITY_LEVEL_NORMAL;
> >> +               amdgpu_mes_lock(&adev->mes);
> >> +               (void)adev->mes.funcs->misc_op(&adev->mes, &op);
> >> +               amdgpu_mes_unlock(&adev->mes);
> >> +       }
> >> +
> >>  exec_fini:
> >>         drm_exec_fini(&exec);
> >>  put_gobj_write:
> >> --
> >> 2.49.0
> >>
