Public

Regards,
      Prike

> -----Original Message-----
> From: Deucher, Alexander <[email protected]>
> Sent: Thursday, May 21, 2026 6:01 AM
> To: Koenig, Christian <[email protected]>; Liang, Prike
> <[email protected]>; [email protected]
> Subject: RE: [PATCH 2/2] drm/amdgpu: unmap userq for evicting user queue
>
> Public
>
> > -----Original Message-----
> > From: Koenig, Christian <[email protected]>
> > Sent: Wednesday, May 20, 2026 7:26 AM
> > To: Liang, Prike <[email protected]>; [email protected]
> > Cc: Deucher, Alexander <[email protected]>
> > Subject: Re: [PATCH 2/2] drm/amdgpu: unmap userq for evicting user
> > queue
> >
> > On 5/14/26 14:42, Prike Liang wrote:
> > > If the driver only preempts queues, there can still be inflight
> > > waves, pending dispatch state, or resume/redispatch possibility tied
> > > to the same queue. Then the VM/TTM side may proceed to move/unmap
> > > queue related BOs during evicting the queue while shader TCP clients
> > > still need to access them.
> > >
> > > So for eviction, unmap is safer because it makes the queue
> > > nonrunnable before memory backing is invalidated. Meanwhile, for a
> > > idle queue it's more sutiable for unmapping it rather preempt and
> > > unmapping also safe more processing time than preempt.
> > >
> > > Signed-off-by: Prike Liang <[email protected]>
> >
> > I was already wondering when we switched to preempt if that is correct
> > or not.
> >
> > This patch here pretty much confirms that it was not correct. I need
> > to discuss with Alex what to do here, but for now the patch is clearly a 
> > bug fix:
> >
> > Reviewed-by: Christian König <[email protected]>
>
> What's the point of having a preempt operation if it doesn't actually preempt 
> the
> work?  Seems like a firmware bug.
Maybe the MES executes the preempt operation correctly, but the driver may need 
to further handle the following things.
1) remove the unbound waiting for the queue preempt.
2) need to validate the user queue internal BOs like as CSA/WPTR/EOP etc as 
well before restoring the preempted queue
3) use the unmap the queue at the first place, but for the long shader context 
better to fallback to the preempt operation for simplifying save/restore 
latency and avoiding the unbound wait timeout.

>
> Reviewed-by: Alex Deucher <[email protected]>
>
> Alex
>
>
> >
> > Thanks,
> > Christian.
> >
> > > ---
> > >  drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c | 4 ++--
> > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
> > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
> > > index 2e3edb6dd506..a63dfdfa4195 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
> > > @@ -1048,7 +1048,7 @@ amdgpu_userq_restore_all(struct
> > amdgpu_userq_mgr *uq_mgr)
> > >                     continue;
> > >             }
> > >
> > > -           r = amdgpu_userq_restore_helper(queue);
> > > +           r = amdgpu_userq_map_helper(queue);
> > >             if (r)
> > >                     ret = r;
> > >
> > > @@ -1285,7 +1285,7 @@ amdgpu_userq_evict_all(struct
> > amdgpu_userq_mgr
> > > *uq_mgr)
> > >
> > >     /* Try to unmap all the queues in this process ctx */
> > >     xa_for_each(&uq_mgr->userq_xa, queue_id, queue) {
> > > -           r = amdgpu_userq_preempt_helper(queue);
> > > +           r = amdgpu_userq_unmap_helper(queue);
> > >             if (r)
> > >                     ret = r;
> > >     }
>

Reply via email to