> -----Original Message-----
> From: Koenig, Christian <[email protected]>
> Sent: Monday, May 4, 2026 5:01 PM
> To: Alex Deucher <[email protected]>; Zhang, Jesse(Jie)
> <[email protected]>
> Cc: [email protected]; Deucher, Alexander
> <[email protected]>
> Subject: Re: [PATCH v4 10/10] drm/amdgpu/userq_fence: NOTIFY MES on SDMA
> UMQ submit
>
> On 5/1/26 15:30, Alex Deucher wrote:
> > On Thu, Apr 30, 2026 at 12:29 PM Jesse Zhang <[email protected]>
> wrote:
> >>
> >> From: "Jesse.zhang" <[email protected]>
> >>
> >> Pair the userspace aggregated-doorbell ring (added by the
> >> AMDGPU_INFO_DOORBELL /
> AMDGPU_GEM_GLOBAL_AGGREGATED_DOORBELL ABI in
> >> the previous patches) with a kernel-side
> >> MES_MISC_OP_NOTIFY_WORK_ON_UNMAPPED_QUEUE in
> >> amdgpu_userq_signal_ioctl for SDMA UMQs.
> >>
> >> Signed-off-by: Jesse Zhang <[email protected]>
> >
> > How will this work if the user doesn't use this IOCTL? protected
> > fences are optional. An application can create a user queue and never
> > use a protected fence. Why don't KFD SDMA queues need this special
> > treatment?
>
> Yeah agree that whole approach doesn't work.
>
> What we could do is similar to the MM queues that userspace need to signal
> both a
> per queue doorbell and an aggregated one for the queue type.
>
> Regards,
> Christian.
Hi Christian, Alex,
Agreed, I will drop this patch.
The MM-style userspace ABI is already in place: David's agdb_bo
(AMDGPU_GEM_GLOBAL_AGGREGATED_DOORBELL + GEM_OP_OPEN_GLOBAL) plus patch 9
(AMDGPU_INFO_DOORBELL reports the SDMA agdb slot). The IGT test rings both
the per-queue and the aggregated doorbell on every submit.
The remaining gap: on MES12, a bare agg_db ring does NOT wake an
unmapped SDMA UMQ. MES needs hasReadyQueues set, and today only
NOTIFY_WORK_ON_UNMAPPED_QUEUE flips it. This is by design and is not
Linux-specific: the Windows UMQ path uses the same contract. MES writes 1
to *unmap_flag_addr on preempt; the UMD checks the flag and calls NOTIFY
before ringing doorbells on the next submit.
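To make the contract concrete, here is a toy userspace model of it in C. It is only a sketch: every name in it (toy_mes, toy_umq, toy_umd_submit and so on) is hypothetical and none of it is driver or firmware code. It also assumes that the unmap flag is cleared when the queue is mapped again and that the ready flag is consumed by one scan; neither detail is stated above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy firmware-side state for one priority level (hypothetical). */
struct toy_mes {
	bool has_ready_queues;   /* models hasReadyQueues */
	unsigned notify_calls;   /* number of NOTIFY requests seen */
};

/* Toy SDMA user-mode queue (hypothetical layout). */
struct toy_umq {
	bool mapped;             /* queue currently occupies a HW slot */
	uint32_t unmap_flag;     /* models *unmap_flag_addr, written by MES */
	uint64_t wptr;           /* last value written to the per-queue doorbell */
	uint64_t hw_seen_wptr;   /* last write pointer the engine observed */
};

/* MES preempts the queue and reports that to userspace via the flag. */
static void toy_mes_preempt(struct toy_mes *mes, struct toy_umq *q)
{
	q->mapped = false;
	q->unmap_flag = 1;
	mes->has_ready_queues = false;
}

/* Stand-in for NOTIFY_WORK_ON_UNMAPPED_QUEUE: sets the ready flag. */
static void toy_mes_notify(struct toy_mes *mes)
{
	mes->has_ready_queues = true;
	mes->notify_calls++;
}

/* Per-queue doorbell: reaches the engine only while the queue is mapped. */
static void toy_ring_queue_doorbell(struct toy_umq *q, uint64_t wptr)
{
	q->wptr = wptr;
	if (q->mapped)
		q->hw_seen_wptr = wptr;
}

/* Aggregated doorbell: MES scans its queue list only if the flag is set. */
static void toy_ring_agg_doorbell(struct toy_mes *mes, struct toy_umq *q)
{
	if (!mes->has_ready_queues)
		return;                      /* bare ring: nothing happens */
	if (!q->mapped) {
		q->mapped = true;            /* re-map the queue */
		q->unmap_flag = 0;           /* assumption: cleared on re-map */
		q->hw_seen_wptr = q->wptr;   /* engine picks up pending work */
	}
	mes->has_ready_queues = false;   /* assumption: one scan consumes it */
}

/* Submit path: check the flag, NOTIFY if needed, then ring both doorbells. */
static void toy_umd_submit(struct toy_mes *mes, struct toy_umq *q,
			   uint64_t wptr)
{
	assert(mes != NULL && q != NULL);
	if (q->unmap_flag)
		toy_mes_notify(mes);
	toy_ring_queue_doorbell(q, wptr);
	toy_ring_agg_doorbell(mes, q);
}
```

In this model a submit on a mapped queue never issues NOTIFY, so the extra call is only paid after a preempt.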
Plan for v5 (matches the Windows flow):
- Drop patch 10.
- Keep David's ABI + INFO_DOORBELL.
- Add a small standalone NOTIFY ioctl (e.g. AMDGPU_USERQ_OP_NOTIFY_WORK)
so UMQ apps call it on demand.
Is this the right direction?
Test results (current v4, with the to-be-dropped signal_ioctl NOTIFY):
HW / fw : gfx12
Test : IGT amd_userq_sdma stress, 100 iters
Result : 100/100 PASS
So the agg_db + NOTIFY mechanism works on hardware.
Thanks,
Jesse
>
> >
> > Alex
> >
> >> ---
> >> .../gpu/drm/amd/amdgpu/amdgpu_userq_fence.c | 29
> +++++++++++++++++++
> >> 1 file changed, 29 insertions(+)
> >>
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
> >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
> >> index a58342c2ac44..50e275b51c9e 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
> >> @@ -598,6 +598,35 @@ int amdgpu_userq_signal_ioctl(struct drm_device
> *dev, void *data,
> >> /* drop the reference acquired in fence creation function */
> >> dma_fence_put(fence);
> >>
> >> + /*
> >> + * SDMA UMQ wake: SDMA has no CP_UNMAPPED_DOORBELL HW
> intercept, so
> >> + * once MES gangs the queue out (after the first IB's
> PROTECTED_FENCE
> >> + * idles the queue), per-queue doorbell rings hit a mapped-out HW
> >> + * slot and are silently dropped — FENCE IRQ never fires.
> >> + *
> >> + * Userspace rings the priority's MES aggregated doorbell directly
> >> + * via the agdb_bo mmap (see AMDGPU_INFO_DOORBELL +
> >> + * AMDGPU_GEM_GLOBAL_AGGREGATED_DOORBELL). That alone,
> however, is
> >> + * not enough on current MES12 firmware — MES will not scan the
> >> + * priority's queue list unless its hasReadyQueues flag is set.
> >> + * NOTIFY_WORK_ON_UNMAPPED_QUEUE flips that flag, so MES
> then
> >> + * processes the doorbell ring and re-MAP_QUEUEs the SDMA UMQ.
> >> + *
> >> + * This is a kernel-side companion to the userspace agg doorbell
> >> + * ring; remove once firmware learns to wake on bare aggregated
> >> + * doorbell.
> >> + */
> >> + if (queue && queue->queue_type == AMDGPU_HW_IP_DMA &&
> >> + adev->enable_mes && adev->mes.funcs->misc_op) {
> >> + struct mes_misc_op_input op = { 0 };
> >> +
> >> + op.op =
> MES_MISC_OP_NOTIFY_WORK_ON_UNMAPPED_QUEUE;
> >> + op.notify_work.priority_level =
> AMDGPU_MES_PRIORITY_LEVEL_NORMAL;
> >> + amdgpu_mes_lock(&adev->mes);
> >> + (void)adev->mes.funcs->misc_op(&adev->mes, &op);
> >> + amdgpu_mes_unlock(&adev->mes);
> >> + }
> >> +
> >> exec_fini:
> >> drm_exec_fini(&exec);
> >> put_gobj_write:
> >> --
> >> 2.49.0
> >>