On 6/1/26 16:20, Chen, Xiaogang wrote:
> 
> On 5/31/2026 9:40 PM, Huang, Honglei1 wrote:
>>
>>
>> On 5/30/2026 2:35 AM, Chen, Xiaogang wrote:
>>>
>>> On 5/29/2026 2:35 AM, Honglei Huang wrote:
>>>> Since commit 144ba981783f ("drm/amdgpu: fix amdgpu_hmm_range_get_pages")
>>>> moved mmu_interval_read_begin() out of the per-chunk loop, the
>>>> captured notifier_seq is no longer refreshed across retries. As a
>>>> result, the existing -EBUSY retry path can never make progress:
>>> "retry" should come with mmu_interval_read_begin. The commit 144ba981783f 
>>> move mmu_interval_read_begin out of loop, then "retry" should also be moved 
>>> out loop with mmu_interval_read_begin.
>>>>    hmm_range_fault() returns -EBUSY only when
>>>>    mmu_interval_check_retry(notifier, notifier_seq) reports that the
>>>>    sequence is stale. Once the sequence has advanced, the stored seq
>>>>    will never match again, so every subsequent call within the same
>>>>    invocation returns -EBUSY immediately.
>>>>
>>>> The "goto retry" therefore degenerates into a busy spin that simply
>>>> burns CPU for the full HMM_RANGE_DEFAULT_TIMEOUT (~1s) window before
>>>> finally bailing out with -EAGAIN. This is pure latency with no chance
>>>> of recovery, and it actively hurts the KFD userptr stack: the caller
>>>> ends up blocked for a second while holding mmap_lock, only to return
>>>> -EAGAIN to the restore worker (or to userspace) which would have
>>>> re-driven the operation immediately anyway.
>>>>
>>>> Drop the retry/timeout entirely and let -EBUSY propagate straight to
>>>> out_free_pfns, where it is already translated to -EAGAIN. Recovery is
>>>> handled at a higher level: the KFD restore_userptr_worker reschedules
>>>> itself, and the userptr ioctl path returns -EAGAIN to userspace.
>>>>
>>>> No functional regression: the previous behaviour on -EBUSY was already
>>>> to fail with -EAGAIN after a 1s stall; we just skip the stall.
>>>>
>>>> Reviewed-by: Christian König<[email protected]>
>>>> Signed-off-by: Honglei Huang<[email protected]>
>>>> ---
>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c | 9 +--------
>>>>   1 file changed, 1 insertion(+), 8 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c 
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
>>>> index 5d72878c8..229c30867 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
>>>> @@ -172,7 +172,6 @@ int amdgpu_hmm_range_get_pages(struct 
>>>> mmu_interval_notifier *notifier,
>>>>       const u64 max_bytes = SZ_2G;
>>>>         struct hmm_range *hmm_range = &range->hmm_range;
>>>> -    unsigned long timeout;
>>>>       unsigned long *pfns;
>>>>       unsigned long end;
>>>>       int r;
>>>> @@ -199,15 +198,9 @@ int amdgpu_hmm_range_get_pages(struct 
>>>> mmu_interval_notifier *notifier,
>>>>           pr_debug("hmm range: start = 0x%lx, end = 0x%lx",
>>>>               hmm_range->start, hmm_range->end);
>>>>   -        timeout = jiffies + msecs_to_jiffies(HMM_RANGE_DEFAULT_TIMEOUT);
>>>> -
>>>> -retry:
>>> If you remove "retry" here the callers including user space need to redo 
>>> the thing. This function's work is memory validation. It should do that in 
>>> its best before return to caller.
>>>
>>> If user space is the caller there would be a lot of user space from/to 
>>> kernel space context switches. That will make the procedure even slower.
>>>
>>> I think we need keep "retry" inside this function, but move "retry" out of 
>>> loop at mmu_interval_read_begin.
>>
>> This option has been discussed in 
>> previous:https://lore.kernel.org/amd-gfx/[email protected]/#:~:text=%3E%3E%20What%20probably%20needs%20to%20happen%20is%20that%20we%20need%20to%20move%20the%20retry%20label%20or%20just%20completely%20stop%20retrying%20at%20all.
>>
>> Why drop the retry here is because to keep the behavior unchanged, the 
>> current code logic will always return -EBUSY when goto retry path (pages 
>> change when get pages).
> It is a regression introduced by  commit 144ba981783f. Why not do "the retry 
> need to be moved to the start of do while loop, the entire fault path need to 
> be done again"? It is same as I said here.

Retrying here is fundamentally wrong approach. Ideally a page fault retry 
should go all the way to userspace again so that we can eventually even handle 
signals and spend a couple of extra cycles doing something else than trying to 
hammer on the page tables.

>>
>> And there is a scenario of extreme conditions where a huge system buffer 
>> attempts to use this function, causing the function to retry indefinitely, 
>> for example, with a space of 60G, it will always gets retry again and again.
> 
> Having callers decide to do "retry" does not fix this problem. The callers 
> have no idea what is going on inside low level code. This 
> function(amdgpu_hmm_range_get_pages) knows more than callers, then better to 
> let it make "retry" decision, ex: if amdgpu_hmm_range_get_pages has retried 
> more than preset number of times or lasted more than preset time it can abort 
> and return error.
> 
> If user space is the caller there will be a lot of user/kernel space context 
> switches in the scenario you mentioned. That people want to avoid.

No, that overhead is wanted. It is actually intentional to go all the way back 
to userspace and re-try.

Regards,
Christian.

> 
> Removing "retry" from amdgpu_hmm_range_get_pages does not help since callers 
> have to decide whether do "retry", but callers have no idea what happened, 
> also it has performance penalty.
> 
> Regards
> 
> Xiaogang
> 
>>
>> Maybe it is about the overall architecture, the set of hmm function in 
>> amdgpu needs to be size limited or it needs to be cut into smaller pieces if 
>> the interval specified by the user is too large.
>>
>> Regards,
>> Honglei
>>
>>
>>>
>>> Regards
>>> Xiaogang
>>>
>>>>           r = hmm_range_fault(hmm_range);
>>>> -        if (unlikely(r)) {
>>>> -            if (r == -EBUSY && !time_after(jiffies, timeout))
>>>> -                goto retry;
>>>> +        if (unlikely(r))
>>>>               goto out_free_pfns;
>>>> -        }
>>>>             if (hmm_range->end == end)
>>>>               break;
>>

Reply via email to