On 6/1/26 16:20, Chen, Xiaogang wrote:
>
> On 5/31/2026 9:40 PM, Huang, Honglei1 wrote:
>>
>>
>> On 5/30/2026 2:35 AM, Chen, Xiaogang wrote:
>>>
>>> On 5/29/2026 2:35 AM, Honglei Huang wrote:
>>>> Since commit 144ba981783f ("drm/amdgpu: fix amdgpu_hmm_range_get_pages")
>>>> moved mmu_interval_read_begin() out of the per-chunk loop, the
>>>> captured notifier_seq is no longer refreshed across retries. As a
>>>> result, the existing -EBUSY retry path can never make progress:
>>> "retry" should come with mmu_interval_read_begin. The commit 144ba981783f
>>> move mmu_interval_read_begin out of loop, then "retry" should also be moved
>>> out loop with mmu_interval_read_begin.
>>>> hmm_range_fault() returns -EBUSY only when
>>>> mmu_interval_check_retry(notifier, notifier_seq) reports that the
>>>> sequence is stale. Once the sequence has advanced, the stored seq
>>>> will never match again, so every subsequent call within the same
>>>> invocation returns -EBUSY immediately.
>>>>
>>>> The "goto retry" therefore degenerates into a busy spin that simply
>>>> burns CPU for the full HMM_RANGE_DEFAULT_TIMEOUT (~1s) window before
>>>> finally bailing out with -EAGAIN. This is pure latency with no chance
>>>> of recovery, and it actively hurts the KFD userptr stack: the caller
>>>> ends up blocked for a second while holding mmap_lock, only to return
>>>> -EAGAIN to the restore worker (or to userspace) which would have
>>>> re-driven the operation immediately anyway.
>>>>
>>>> Drop the retry/timeout entirely and let -EBUSY propagate straight to
>>>> out_free_pfns, where it is already translated to -EAGAIN. Recovery is
>>>> handled at a higher level: the KFD restore_userptr_worker reschedules
>>>> itself, and the userptr ioctl path returns -EAGAIN to userspace.
>>>>
>>>> No functional regression: the previous behaviour on -EBUSY was already
>>>> to fail with -EAGAIN after a 1s stall; we just skip the stall.
>>>>
>>>> Reviewed-by: Christian König<[email protected]>
>>>> Signed-off-by: Honglei Huang<[email protected]>
>>>> ---
>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c | 9 +--------
>>>> 1 file changed, 1 insertion(+), 8 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
>>>> index 5d72878c8..229c30867 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
>>>> @@ -172,7 +172,6 @@ int amdgpu_hmm_range_get_pages(struct
>>>> mmu_interval_notifier *notifier,
>>>> const u64 max_bytes = SZ_2G;
>>>> struct hmm_range *hmm_range = &range->hmm_range;
>>>> - unsigned long timeout;
>>>> unsigned long *pfns;
>>>> unsigned long end;
>>>> int r;
>>>> @@ -199,15 +198,9 @@ int amdgpu_hmm_range_get_pages(struct
>>>> mmu_interval_notifier *notifier,
>>>> pr_debug("hmm range: start = 0x%lx, end = 0x%lx",
>>>> hmm_range->start, hmm_range->end);
>>>> - timeout = jiffies + msecs_to_jiffies(HMM_RANGE_DEFAULT_TIMEOUT);
>>>> -
>>>> -retry:
>>> If you remove "retry" here the callers including user space need to redo
>>> the thing. This function's work is memory validation. It should do that in
>>> its best before return to caller.
>>>
>>> If user space is the caller there would be a lot of user space from/to
>>> kernel space context switches. That will make the procedure even slower.
>>>
>>> I think we need keep "retry" inside this function, but move "retry" out of
>>> loop at mmu_interval_read_begin.
>>
>> This option has been discussed in
>> previous:https://lore.kernel.org/amd-gfx/[email protected]/#:~:text=%3E%3E%20What%20probably%20needs%20to%20happen%20is%20that%20we%20need%20to%20move%20the%20retry%20label%20or%20just%20completely%20stop%20retrying%20at%20all.
>>
>> Why drop the retry here is because to keep the behavior unchanged, the
>> current code logic will always return -EBUSY when goto retry path (pages
>> change when get pages).
> It is a regression introduced by commit 144ba981783f. Why not do "the retry
> need to be moved to the start of do while loop, the entire fault path need to
> be done again"? It is same as I said here.
Retrying here is fundamentally wrong approach. Ideally a page fault retry
should go all the way to userspace again so that we can eventually even handle
signals and spend a couple of extra cycles doing something else than trying to
hammer on the page tables.
>>
>> And there is a scenario of extreme conditions where a huge system buffer
>> attempts to use this function, causing the function to retry indefinitely,
>> for example, with a space of 60G, it will always gets retry again and again.
>
> Having callers decide to do "retry" does not fix this problem. The callers
> have no idea what is going on inside low level code. This
> function(amdgpu_hmm_range_get_pages) knows more than callers, then better to
> let it make "retry" decision, ex: if amdgpu_hmm_range_get_pages has retried
> more than preset number of times or lasted more than preset time it can abort
> and return error.
>
> If user space is the caller there will be a lot of user/kernel space context
> switches in the scenario you mentioned. That people want to avoid.
No, that overhead is wanted. It is actually intentional to go all the way back
to userspace and re-try.
Regards,
Christian.
>
> Removing "retry" from amdgpu_hmm_range_get_pages does not help since callers
> have to decide whether do "retry", but callers have no idea what happened,
> also it has performance penalty.
>
> Regards
>
> Xiaogang
>
>>
>> Maybe it is about the overall architecture, the set of hmm function in
>> amdgpu needs to be size limited or it needs to be cut into smaller pieces if
>> the interval specified by the user is too large.
>>
>> Regards,
>> Honglei
>>
>>
>>>
>>> Regards
>>> Xiaogang
>>>
>>>> r = hmm_range_fault(hmm_range);
>>>> - if (unlikely(r)) {
>>>> - if (r == -EBUSY && !time_after(jiffies, timeout))
>>>> - goto retry;
>>>> + if (unlikely(r))
>>>> goto out_free_pfns;
>>>> - }
>>>> if (hmm_range->end == end)
>>>> break;
>>