AMD General

Regards,
      Prike

> -----Original Message-----
> From: amd-gfx <[email protected]> On Behalf Of Liang, 
> Prike
> Sent: Friday, May 15, 2026 3:24 PM
> To: Khatri, Sunil <[email protected]>; [email protected]
> Cc: Deucher, Alexander <[email protected]>; Koenig, Christian
> <[email protected]>
> Subject: RE: [PATCH 2/2] drm/amdgpu: unmap userq for evicting user queue
>
> AMD General
>
> Regards,
>       Prike
>
> > -----Original Message-----
> > From: Khatri, Sunil <[email protected]>
> > Sent: Friday, May 15, 2026 12:53 PM
> > To: Liang, Prike <[email protected]>; [email protected]
> > Cc: Deucher, Alexander <[email protected]>; Koenig, Christian
> > <[email protected]>
> > Subject: Re: [PATCH 2/2] drm/amdgpu: unmap userq for evicting user
> > queue
> >
> >
> > On 15-05-2026 08:00 am, Liang, Prike wrote:
> > > AMD General
> > >
> > > Will resent a new version for updating the commit log.
> > >
> > > Regards,
> > >        Prike
> > >
> > >> -----Original Message-----
> > >> From: Liang, Prike <[email protected]>
> > >> Sent: Thursday, May 14, 2026 8:43 PM
> > >> To: [email protected]
> > >> Cc: Deucher, Alexander <[email protected]>; Koenig,
> > >> Christian <[email protected]>; Liang, Prike
> > >> <[email protected]>
> > >> Subject: [PATCH 2/2] drm/amdgpu: unmap userq for evicting user
> > >> queue
> > >>
> > >> If the driver only preempts queues, there can still be inflight
> > >> waves, pending dispatch state, or resume/redispatch possibility
> > >> tied to the same queue. Then the VM/TTM side may proceed to
> > >> move/unmap queue related BOs during evicting the queue while shader
> > >> TCP clients still need
> > to access them.
> > >>
> > >> So for eviction, unmap is safer because it makes the queue
> > >> nonrunnable before memory backing is invalidated. Meanwhile, for a
> > >> idle queue it's more sutiable for unmapping it rather preempt and
> > >> unmapping also safe more processing time than preempt.
> > >>
> > >> Signed-off-by: Prike Liang <[email protected]>
> > >> ---
> > >>   drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c | 4 ++--
> > >>   1 file changed, 2 insertions(+), 2 deletions(-)
> > >>
> > >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
> > >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
> > >> index 2e3edb6dd506..a63dfdfa4195 100644
> > >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
> > >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
> > >> @@ -1048,7 +1048,7 @@ amdgpu_userq_restore_all(struct
> > >> amdgpu_userq_mgr
> > >> *uq_mgr)
> > >>                        continue;
> > >>                }
> > >>
> > >> -             r = amdgpu_userq_restore_helper(queue);
> > >> +             r = amdgpu_userq_map_helper(queue);
> > As per my understanding, we arent done for the queue and all the
> > resources of the should remain intact. A queue ideally should only need "
> mapping/add new queue"
> > when the queue is created and once that is done only the light process
> > or suspend/resume should be good enough. We dont want to tear it down
> > or rebuild again for anything transient.
> >
> > Restore/evict in most of the cases is a transient stage and only a
> > tear down should be unmapping it or during a GPU reset which reset all the 
> > hw
> states.
> > So the way it is seems logical to me but i leave that to Christian to 
> > confirm.
>
> For the eviction case, it will be a problem when the userq BOs migrated 
> during the
> MES trying to save or access the resource for preempting the queue. Also, the
> preempt operation doesn't need to wait the queue to be idle, so here choose 
> to wait
> the queue to be idle then it's more suitable for unmapping.


Except for most cases the unbound wait + unmap should be fine, and this should 
avoid the SQ error, but for the long shader run the preempt should better and 
that need to draft solution for handling the userq wave context validation 
before restoring the userq.


Hi @Deucher, Alexander  Can we use userq unmap to handle the SQ error issue for 
now? And leave the userq internal object validation during restore as a TODO 
item?

> > Regards
> > Sunil Khatri
> > >>                if (r)
> > >>                        ret = r;
> > >>
> > >> @@ -1285,7 +1285,7 @@ amdgpu_userq_evict_all(struct
> > >> amdgpu_userq_mgr
> > >> *uq_mgr)
> > >>
> > >>        /* Try to unmap all the queues in this process ctx */
> > >>        xa_for_each(&uq_mgr->userq_xa, queue_id, queue) {
> > >> -             r = amdgpu_userq_preempt_helper(queue);
> > >> +             r = amdgpu_userq_unmap_helper(queue);
> > >>                if (r)
> > >>                        ret = r;
> > >>        }
> > >> --
> > >> 2.34.1

Reply via email to