Re: [V2] drm/amdgpu: drop retry loop in amdgpu_hmm_range_get_pages

Huang, Honglei1 Fri, 29 May 2026 00:34:10 -0700



On 5/29/2026 3:04 PM, Christian König wrote:

On 5/29/26 04:27, Honglei Huang wrote:

Since commit 144ba981783f ("drm/amdgpu: fix amdgpu_hmm_range_get_pages")
moved mmu_interval_read_begin() out of the per-chunk loop, the
captured notifier_seq is no longer refreshed across retries. As a
result, the existing -EBUSY retry path can never make progress:

   hmm_range_fault() returns -EBUSY only when
   mmu_interval_check_retry(notifier, notifier_seq) reports that the
   sequence is stale. Once the sequence has advanced, the captured
   notifier_seq can never match again, so every subsequent
   hmm_range_fault() call within the same invocation returns -EBUSY
   immediately.
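To make the failure mode concrete, here is a minimal userspace model of the sequence-count contract. The names (struct model_notifier, model_read_begin(), model_check_retry(), model_invalidate(), model_range_fault()) are hypothetical stand-ins for mmu_interval_read_begin(), mmu_interval_check_retry() and hmm_range_fault(). The sketch ignores the sleeping and locking that the real mmu_interval_read_begin() performs and only models "stale snapshot means -EBUSY".

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model, not kernel code: only the sequence-count
 * contract is modelled, without the sleeping/locking of the real API.
 */
struct model_notifier {
	unsigned long invalidate_seq; /* advanced by every invalidation */
};

/* Stand-in for mmu_interval_read_begin(): snapshot the sequence. */
static unsigned long model_read_begin(const struct model_notifier *n)
{
	return n->invalidate_seq;
}

/* Stand-in for mmu_interval_check_retry(): true if the snapshot is stale. */
static bool model_check_retry(const struct model_notifier *n,
			      unsigned long seq)
{
	return n->invalidate_seq != seq;
}

/* Stand-in for an invalidation (e.g. the range being unmapped). */
static void model_invalidate(struct model_notifier *n)
{
	n->invalidate_seq++;
}

/* Stand-in for hmm_range_fault(): -EBUSY only when the snapshot is stale. */
static int model_range_fault(const struct model_notifier *n,
			     unsigned long seq)
{
	return model_check_retry(n, seq) ? -EBUSY : 0;
}
```

Once model_invalidate() has run, retrying model_range_fault() with the old seq fails no matter how many times it is called; only a new model_read_begin() snapshot can succeed.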

The "goto retry" therefore degenerates into a busy spin that simply
burns CPU for the full HMM_RANGE_DEFAULT_TIMEOUT (~1s) window before
finally bailing out with -EAGAIN. This is pure latency with no chance
of recovery, and it actively hurts the KFD userptr stack: the caller
ends up blocked for a second while holding mmap_lock, only to return
-EAGAIN to the restore worker (or to userspace), either of which would
have re-driven the operation immediately anyway.
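The wasted work can be counted with a second hypothetical model, in which an attempt budget stands in for the jiffies-based HMM_RANGE_DEFAULT_TIMEOUT deadline. get_pages_old() mirrors the pre-patch "goto retry" shape and get_pages_new() the post-patch fail-fast shape; both translate -EBUSY to -EAGAIN on exit, as described for out_free_pfns. None of these names exist in the driver.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model, not kernel code. */
struct spin_model {
	unsigned long cur_seq;  /* notifier's current sequence */
	unsigned long attempts; /* number of "fault" calls made */
};

/* Stand-in for hmm_range_fault(): -EBUSY whenever seq is stale. */
static int spin_model_fault(struct spin_model *m, unsigned long seq)
{
	m->attempts++;
	return m->cur_seq != seq ? -EBUSY : 0;
}

/* Pre-patch shape: retry with the SAME seq until the budget runs out. */
static int get_pages_old(struct spin_model *m, unsigned long seq,
			 unsigned long budget)
{
	int r;

	assert(budget > 0);
retry:
	r = spin_model_fault(m, seq);
	if (r) {
		if (r == -EBUSY && m->attempts < budget)
			goto retry; /* seq never changes, so this cannot succeed */
		return r == -EBUSY ? -EAGAIN : r;
	}
	return 0;
}

/* Post-patch shape: fail fast and let the caller decide. */
static int get_pages_new(struct spin_model *m, unsigned long seq)
{
	int r = spin_model_fault(m, seq);

	if (r)
		return r == -EBUSY ? -EAGAIN : r;
	return 0;
}
```

With a stale seq, get_pages_old(&m, 0, 1000) makes 1000 fault attempts before returning -EAGAIN, while get_pages_new() returns the same -EAGAIN after a single attempt.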

Drop the retry/timeout entirely and let -EBUSY propagate straight to
out_free_pfns, where it is already translated to -EAGAIN. Recovery is
handled at a higher level: the KFD restore_userptr_worker reschedules
itself, and the userptr ioctl path returns -EAGAIN to userspace.
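Recovery only works when a fresh sequence snapshot is taken before each attempt, which is what moving the retry up to the caller achieves. The sketch below models that shape with hypothetical names (struct retry_model, retry_model_read_begin(), retry_model_get_pages(), caller_with_retry()). A tight loop is used only for brevity; per the description above, the real restore_userptr_worker reschedules itself and the ioctl path hands -EAGAIN back to userspace instead.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical model, not kernel code. pending_invalidations simulates
 * invalidations that race with the next few attempts.
 */
struct retry_model {
	unsigned long cur_seq;              /* notifier's current sequence */
	unsigned int pending_invalidations; /* races still to happen */
};

/* Stand-in for mmu_interval_read_begin(): snapshot the sequence. */
static unsigned long retry_model_read_begin(const struct retry_model *m)
{
	return m->cur_seq;
}

/*
 * Stand-in for the post-patch get_pages path: one fault attempt, -EAGAIN
 * if the snapshot went stale. A pending race bumps the sequence after the
 * snapshot was taken.
 */
static int retry_model_get_pages(struct retry_model *m, unsigned long seq)
{
	if (m->pending_invalidations > 0) {
		m->pending_invalidations--;
		m->cur_seq++;
	}
	return m->cur_seq != seq ? -EAGAIN : 0;
}

/* Hypothetical caller: re-snapshot the sequence on every attempt. */
static int caller_with_retry(struct retry_model *m, unsigned int max_attempts,
			     unsigned int *attempts)
{
	int r = -EAGAIN;

	assert(max_attempts > 0);
	*attempts = 0;
	while (*attempts < max_attempts) {
		unsigned long seq = retry_model_read_begin(m); /* fresh snapshot */

		(*attempts)++;
		r = retry_model_get_pages(m, seq);
		if (r != -EAGAIN)
			break; /* success or a non-retryable error */
	}
	return r;
}
```

With two racing invalidations, caller_with_retry() succeeds on its third attempt, because each attempt compares against the sequence value that was current when that attempt started.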

No functional regression: the previous behaviour on -EBUSY was already
to fail with -EAGAIN after a 1s stall; we just skip the stall.

Signed-off-by: Honglei Huang <[email protected]>


Reviewed-by: Christian König <[email protected]>


Thanks a lot for the review, will respin with your R-b added.

Regards,
Honglei

---
 drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c | 9 +--------
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
index 5d72878c8..229c30867 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
@@ -172,7 +172,6 @@ int amdgpu_hmm_range_get_pages(struct mmu_interval_notifier *notifier,
        const u64 max_bytes = SZ_2G;

        struct hmm_range *hmm_range = &range->hmm_range;
-       unsigned long timeout;
        unsigned long *pfns;
        unsigned long end;
        int r;
@@ -199,15 +198,9 @@ int amdgpu_hmm_range_get_pages(struct mmu_interval_notifier *notifier,
                pr_debug("hmm range: start = 0x%lx, end = 0x%lx",
                        hmm_range->start, hmm_range->end);

-               timeout = jiffies + msecs_to_jiffies(HMM_RANGE_DEFAULT_TIMEOUT);
-
-retry:
                r = hmm_range_fault(hmm_range);
-               if (unlikely(r)) {
-                       if (r == -EBUSY && !time_after(jiffies, timeout))
-                               goto retry;
+               if (unlikely(r))
                        goto out_free_pfns;
-               }

                if (hmm_range->end == end)
                        break;
