[Public]
Regards,
Prike
> -----Original Message-----
> From: Alex Deucher <[email protected]>
> Sent: Monday, October 13, 2025 9:43 PM
> To: Liang, Prike <[email protected]>
> Cc: Deucher, Alexander <[email protected]>; amd-
> [email protected]
> Subject: Re: [PATCH 2/7] drm/amdgpu/userq: fix SDMA and compute validation
>
> On Mon, Oct 13, 2025 at 4:51 AM Liang, Prike <[email protected]> wrote:
> >
> > [Public]
> >
> > Regards,
> > Prike
> >
> > > -----Original Message-----
> > > From: amd-gfx <[email protected]> On Behalf Of
> > > Alex Deucher
> > > Sent: Saturday, October 11, 2025 5:15 AM
> > > To: [email protected]
> > > Cc: Deucher, Alexander <[email protected]>
> > > Subject: [PATCH 2/7] drm/amdgpu/userq: fix SDMA and compute
> > > validation
> > >
> > > The CSA and EOP buffers have different alignement requirements.
> > > Hardcode them for now as a bug fix. A proper query will be added in
> > > a subsequent patch.
> > >
> > > Fixes: 9e46b8bb0539 ("drm/amdgpu: validate userq buffer virtual
> > > address and size")
> > > Signed-off-by: Alex Deucher <[email protected]>
> > > ---
> > > drivers/gpu/drm/amd/amdgpu/mes_userqueue.c | 14 +++++++++-----
> > > 1 file changed, 9 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/mes_userqueue.c
> > > b/drivers/gpu/drm/amd/amdgpu/mes_userqueue.c
> > > index 0370ef719a6aa..ab392de2a2388 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/mes_userqueue.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/mes_userqueue.c
> > > @@ -254,7 +254,6 @@ static int mes_userq_mqd_create(struct
> > > amdgpu_userq_mgr *uq_mgr,
> > > struct amdgpu_mqd *mqd_hw_default = &adev->mqds[queue->queue_type];
> > > struct drm_amdgpu_userq_in *mqd_user = args_in;
> > > struct amdgpu_mqd_prop *userq_props;
> > > - struct amdgpu_gfx_shadow_info shadow_info;
> > > int r;
> > >
> > > /* Structure to initialize MQD for userqueue using generic MQD
> > > init function */ @@ -280,8 +279,6 @@ static int
> > > mes_userq_mqd_create(struct amdgpu_userq_mgr *uq_mgr,
> > > userq_props->doorbell_index = queue->doorbell_index;
> > > userq_props->fence_address = queue->fence_drv->gpu_addr;
> > >
> > > - if (adev->gfx.funcs->get_gfx_shadow_info)
> > > - adev->gfx.funcs->get_gfx_shadow_info(adev, &shadow_info,
> > > true);
> > > if (queue->queue_type == AMDGPU_HW_IP_COMPUTE) {
> > > struct drm_amdgpu_userq_mqd_compute_gfx11
> > > *compute_mqd;
> > >
> > > @@ -299,7 +296,7 @@ static int mes_userq_mqd_create(struct
> > > amdgpu_userq_mgr *uq_mgr,
> > > }
> > >
> > > r = amdgpu_userq_input_va_validate(queue,
> > > compute_mqd->eop_va,
> > > - max_t(u32, PAGE_SIZE,
> > > AMDGPU_GPU_PAGE_SIZE));
> > > + 2048);
> > [Prike] How did this issue occur? If we hardcode the value here, will
> > userspace
> driver EOP buffer requests still be compatible?
> > We may need to add a TODO comment to replace this with proper EOP-size
> queries in a future change.
>
> It's just temporary to make the patch easier to backport. It's fixed
> properly later in
> this series.
>
> >
> > > if (r)
> > > goto free_mqd;
> > >
> > > @@ -312,6 +309,9 @@ static int mes_userq_mqd_create(struct
> > > amdgpu_userq_mgr *uq_mgr,
> > > kfree(compute_mqd);
> > > } else if (queue->queue_type == AMDGPU_HW_IP_GFX) {
> > > struct drm_amdgpu_userq_mqd_gfx11 *mqd_gfx_v11;
> > > + struct amdgpu_gfx_shadow_info shadow_info;
> > > +
> > > + adev->gfx.funcs->get_gfx_shadow_info(adev,
> > > + &shadow_info, true);
> > [Prike] We may need to validate the get_gfx_shadow_info() callback before
> > using
> it.
>
> It should always be implemented for user queues. If it's not, we don't have
> a way to
> know what the user queue metadata sizes are.
Yes, there's no issue for the most common case. But for the NPI early phase the
callback may miss implementing anyhow, that will result in a null pointer issue.
> Alex
>
> >
> > > if (mqd_user->mqd_size != sizeof(*mqd_gfx_v11) ||
> > > !mqd_user->mqd) {
> > > DRM_ERROR("Invalid GFX MQD\n"); @@ -335,6
> > > +335,10 @@ static int mes_userq_mqd_create(struct amdgpu_userq_mgr
> > > *uq_mgr,
> > > shadow_info.shadow_size);
> > > if (r)
> > > goto free_mqd;
> > > + r = amdgpu_userq_input_va_validate(queue,
> > > mqd_gfx_v11->csa_va,
> > > + shadow_info.csa_size);
> > > + if (r)
> > > + goto free_mqd;
> > >
> > > kfree(mqd_gfx_v11);
> > > } else if (queue->queue_type == AMDGPU_HW_IP_DMA) { @@ -353,7
> > > +357,7 @@ static int mes_userq_mqd_create(struct amdgpu_userq_mgr
> > > +*uq_mgr,
> > > goto free_mqd;
> > > }
> > > r = amdgpu_userq_input_va_validate(queue,
> > > mqd_sdma_v11-
> > > >csa_va,
> > > - shadow_info.csa_size);
> > > + 32);
> > > if (r)
> > > goto free_mqd;
> > >
> > > --
> > > 2.51.0
> >