AMD General

Regards,
      Prike

> -----Original Message-----
> From: Khatri, Sunil <[email protected]>
> Sent: Friday, May 15, 2026 12:53 PM
> To: Liang, Prike <[email protected]>; [email protected]
> Cc: Deucher, Alexander <[email protected]>; Koenig, Christian
> <[email protected]>
> Subject: Re: [PATCH 2/2] drm/amdgpu: unmap userq for evicting user queue
>
>
> On 15-05-2026 08:00 am, Liang, Prike wrote:
> > AMD General
> >
> > Will resent a new version for updating the commit log.
> >
> > Regards,
> >        Prike
> >
> >> -----Original Message-----
> >> From: Liang, Prike <[email protected]>
> >> Sent: Thursday, May 14, 2026 8:43 PM
> >> To: [email protected]
> >> Cc: Deucher, Alexander <[email protected]>; Koenig, Christian
> >> <[email protected]>; Liang, Prike <[email protected]>
> >> Subject: [PATCH 2/2] drm/amdgpu: unmap userq for evicting user queue
> >>
> >> If the driver only preempts queues, there can still be inflight
> >> waves, pending dispatch state, or resume/redispatch possibility tied
> >> to the same queue. Then the VM/TTM side may proceed to move/unmap
> >> queue related BOs during evicting the queue while shader TCP clients still 
> >> need
> to access them.
> >>
> >> So for eviction, unmap is safer because it makes the queue
> >> nonrunnable before memory backing is invalidated. Meanwhile, for a
> >> idle queue it's more sutiable for unmapping it rather preempt and
> >> unmapping also safe more processing time than preempt.
> >>
> >> Signed-off-by: Prike Liang <[email protected]>
> >> ---
> >>   drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c | 4 ++--
> >>   1 file changed, 2 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
> >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
> >> index 2e3edb6dd506..a63dfdfa4195 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
> >> @@ -1048,7 +1048,7 @@ amdgpu_userq_restore_all(struct
> >> amdgpu_userq_mgr
> >> *uq_mgr)
> >>                        continue;
> >>                }
> >>
> >> -             r = amdgpu_userq_restore_helper(queue);
> >> +             r = amdgpu_userq_map_helper(queue);
> As per my understanding, we arent done for the queue and all the resources of 
> the
> should remain intact. A queue ideally should only need " mapping/add new 
> queue"
> when the queue is created and once that is done only the light process or
> suspend/resume should be good enough. We dont want to tear it down or rebuild
> again for anything transient.
>
> Restore/evict in most of the cases is a transient stage and only a tear down 
> should
> be unmapping it or during a GPU reset which reset all the hw states.
> So the way it is seems logical to me but i leave that to Christian to confirm.

For the eviction case, it will be a problem when the userq BOs migrated during 
the MES trying to
save or access the resource for preempting the queue. Also, the preempt 
operation doesn't need to
wait the queue to be idle, so here choose to wait the queue to be idle then 
it's more suitable for unmapping.

> Regards
> Sunil Khatri
> >>                if (r)
> >>                        ret = r;
> >>
> >> @@ -1285,7 +1285,7 @@ amdgpu_userq_evict_all(struct amdgpu_userq_mgr
> >> *uq_mgr)
> >>
> >>        /* Try to unmap all the queues in this process ctx */
> >>        xa_for_each(&uq_mgr->userq_xa, queue_id, queue) {
> >> -             r = amdgpu_userq_preempt_helper(queue);
> >> +             r = amdgpu_userq_unmap_helper(queue);
> >>                if (r)
> >>                        ret = r;
> >>        }
> >> --
> >> 2.34.1

Reply via email to