[Public]

> -----Original Message-----
> From: Khatri, Sunil <[email protected]>
> Sent: Friday, October 24, 2025 10:10 AM
> To: SHANMUGAM, SRINIVASAN <[email protected]>;
> Chen, Xiaogang <[email protected]>; Koenig, Christian
> <[email protected]>; Deucher, Alexander
> <[email protected]>
> Cc: [email protected]; Yang, Philip <[email protected]>; Khatri,
> Sunil <[email protected]>
> Subject: Re: [PATCH v3] drm/amdkfd: Fix use-after-free of HMM range in
> svm_range_validate_and_map()
>
>
> On 24-10-2025 09:20 am, SHANMUGAM, SRINIVASAN wrote:
> > [Public]
> >
> >> -----Original Message-----
> >> From: Chen, Xiaogang <[email protected]>
> >> Sent: Friday, October 24, 2025 3:15 AM
> >> To: SHANMUGAM, SRINIVASAN <[email protected]>;
> Koenig,
> >> Christian <[email protected]>; Deucher, Alexander
> >> <[email protected]>
> >> Cc: [email protected]; Yang, Philip
> >> <[email protected]>; Khatri, Sunil <[email protected]>
> >> Subject: Re: [PATCH v3] drm/amdkfd: Fix use-after-free of HMM range
> >> in
> >> svm_range_validate_and_map()
> >>
> >>
> >> On 10/23/2025 9:34 AM, Srinivasan Shanmugam wrote:
> >>> The function svm_range_validate_and_map() was freeing `range` when
> >>> amdgpu_hmm_range_get_pages() failed. But later, the code still used
> >>> the same `range` pointer and freed it again. This could cause a
> >>> use-after-free and double-free issue.
> >>>
> >>> The fix sets `range = NULL` right after it is freed and checks for
> >>> `range` before using or freeing it again.
> >>>
> >>> v2: Removed duplicate !r check in the condition for clarity.
> >>>
> >>> v3: In amdgpu_hmm_range_get_pages(), when hmm_range_fault() fails,
> >>> we
> >>> kvfree(pfns) but leave the pointer in hmm_range->hmm_pfns still
> >>> pointing to freed memory. The caller (or
> >>> amdgpu_hmm_range_free(range)) may try to free
> >>> range->hmm_range.hmm_pfns again, causing a double free, Setting
> >>> hmm_range->hmm_pfns = NULL immediately after
> >>> kvfree(pfns) prevents both double free. (Philip)
> >> what you fix is not "use-after-free", it is preventing double free, right?
> >>> In svm_range_validate_and_map(), When r == 0, it means success →
> >>> range is not NULL.  When r != 0, it means failure → already made range =
> NULL.
> >>> So checking both (!r && range) is unnecessary because the moment r
> >>> == 0, we automatically know range exists and is safe to use.
> >>> (Philip)
> >>>
> >>> Fixes: c5e357c924e5 ("drm/amdgpu: update the functions to use amdgpu
> >>> version of hmm") Reported by: Dan Carpenter
> >>> <[email protected]>
> >>> Cc: Philip Yang <[email protected]>
> >>> Cc: Sunil Khatri <[email protected]>
> >>> Cc: Christian König <[email protected]>
> >>> Cc: Alex Deucher <[email protected]>
> >>> Signed-off-by: Srinivasan Shanmugam <[email protected]>
> >>> ---
> >>>    drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c | 1 +
> >>>    drivers/gpu/drm/amd/amdkfd/kfd_svm.c    | 6 ++++--
> >>>    2 files changed, 5 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
> >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
> >>> index d6f903a2d573..90d26d820bac 100644
> >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
> >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
> >>> @@ -221,6 +221,7 @@ int amdgpu_hmm_range_get_pages(struct
> >>> mmu_interval_notifier *notifier,
> >>>
> >>>    out_free_pfns:
> >>>      kvfree(pfns);
> >>> +   hmm_range->hmm_pfns = NULL;
> hmm_range->hmm_pfns isnt set till a goto out_free_pfns is called, hence not
> needed.

Why?

pfns = kvmalloc_array(npages, sizeof(*pfns), GFP_KERNEL);
hmm_range->hmm_pfns = pfns;

for example, hmm_range_fault() fails), the code goes to the error path:

out_free_pfns:
    kvfree(pfns);   // free the buffer

But after freeing, the pointer hmm_range->hmm_pfns is still pointing to the 
same (now freed) memory.
It’s a “dangling pointer” — it points to memory that no longer belongs to us.

Best,
Srini

> >>>    out_free_range:
> >>>      if (r == -EBUSY)
> >>>              r = -EAGAIN;
> >>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> >>> b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> >>> index f041643308ca..103acb925c2b 100644
> >>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> >>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> >>> @@ -1744,6 +1744,7 @@ static int svm_range_validate_and_map(struct
> >> mm_struct *mm,
> >>>                      WRITE_ONCE(p->svms.faulting_task, NULL);
> >>>                      if (r) {
> >>>                              amdgpu_hmm_range_free(range);
> >>> +                           range = NULL;
> Range is a local pointer and if it has been freed it should not be used again 
> in same
> function. The error condition should handle that.
> >>>                              pr_debug("failed %d to get svm range 
> >>> pages\n", r);
> >>>                      }
> >>>              } else {
> >>> @@ -1761,7 +1762,7 @@ static int svm_range_validate_and_map(struct
> >> mm_struct *mm,
> >>>              svm_range_lock(prange);
> >>>
> >>>              /* Free backing memory of hmm_range if it was initialized
> >>> -            * Overrride return value to TRY AGAIN only if prior returns
> >>> +            * Override return value to TRY AGAIN only if prior
> >>> + returns
> >>>               * were successful
> >>>               */
> >>>              if (range && !amdgpu_hmm_range_valid(range) && !r) { @@
> >>> -1769,7
> >>> +1770,8 @@ static int svm_range_validate_and_map(struct mm_struct
> >>> +*mm,
> >>>                      r = -EAGAIN;
> >>>              }
> >>>              /* Free the hmm range */
> >>> -           amdgpu_hmm_range_free(range);
>
> I guess we are setting up the error and that should be the criterion to call 
> this
> function. If there is already an error condition before we should not be 
> calling this
> again.
>
> regards
> Sunil khatri
>
> >>> +           if (range)
> >> Can just check if(!r) here and remove "range=NULL" above? if r is not
> >> 0 range has been freed, if r is 0 free range here.
> >
> > But there are later spots where r becomes non-zero after get_pages() 
> > succeeded
> and range is valid:
> > svm_range_dma_map(...) can fail → sets r != 0.
> > !amdgpu_hmm_range_valid(range) → you set r = -EAGAIN.
> > !list_empty(&prange->child_list) → you set r = -EAGAIN.
> > In all three cases, your new tail logic if (!r) 
> > amdgpu_hmm_range_free(range);
> skips freeing because r != 0 now. Since you also removed range = NULL; after 
> the
> early free, nothing else frees it → memory leak.
> >
> > Regards,
> > Srini
> >
> >> Regards
> >>
> >> Xiaogang
> >>
> >>> +                   amdgpu_hmm_range_free(range);
> >>>
> >>>
> >>>              if (!r && !list_empty(&prange->child_list)) {

Reply via email to