On 24-10-2025 09:20 am, SHANMUGAM, SRINIVASAN wrote:
[Public]
-----Original Message-----
From: Chen, Xiaogang <[email protected]>
Sent: Friday, October 24, 2025 3:15 AM
To: SHANMUGAM, SRINIVASAN <[email protected]>;
Koenig, Christian <[email protected]>; Deucher, Alexander
<[email protected]>
Cc: [email protected]; Yang, Philip <[email protected]>; Khatri,
Sunil <[email protected]>
Subject: Re: [PATCH v3] drm/amdkfd: Fix use-after-free of HMM range in
svm_range_validate_and_map()
On 10/23/2025 9:34 AM, Srinivasan Shanmugam wrote:
The function svm_range_validate_and_map() was freeing `range` when
amdgpu_hmm_range_get_pages() failed. But later, the code still used
the same `range` pointer and freed it again. This could cause a
use-after-free and double-free issue.
The fix sets `range = NULL` right after it is freed and checks for
`range` before using or freeing it again.
v2: Removed duplicate !r check in the condition for clarity.
v3: In amdgpu_hmm_range_get_pages(), when hmm_range_fault() fails, we
kvfree(pfns) but leave the pointer in hmm_range->hmm_pfns still
pointing to freed memory. The caller (or amdgpu_hmm_range_free(range))
may try to free range->hmm_range.hmm_pfns again, causing a double
free, Setting hmm_range->hmm_pfns = NULL immediately after
kvfree(pfns) prevents both double free. (Philip)
what you fix is not "use-after-free", it is preventing double free, right?
In svm_range_validate_and_map(), When r == 0, it means success → range
is not NULL. When r != 0, it means failure → already made range = NULL.
So checking both (!r && range) is unnecessary because the moment r ==
0, we automatically know range exists and is safe to use. (Philip)
Fixes: c5e357c924e5 ("drm/amdgpu: update the functions to use amdgpu
version of hmm") Reported by: Dan Carpenter <[email protected]>
Cc: Philip Yang <[email protected]>
Cc: Sunil Khatri <[email protected]>
Cc: Christian König <[email protected]>
Cc: Alex Deucher <[email protected]>
Signed-off-by: Srinivasan Shanmugam <[email protected]>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c | 1 +
drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 6 ++++--
2 files changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
index d6f903a2d573..90d26d820bac 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
@@ -221,6 +221,7 @@ int amdgpu_hmm_range_get_pages(struct
mmu_interval_notifier *notifier,
out_free_pfns:
kvfree(pfns);
+ hmm_range->hmm_pfns = NULL;
hmm_range->hmm_pfns isnt set till a goto out_free_pfns is called, hence
not needed.
out_free_range:
if (r == -EBUSY)
r = -EAGAIN;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index f041643308ca..103acb925c2b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -1744,6 +1744,7 @@ static int svm_range_validate_and_map(struct
mm_struct *mm,
WRITE_ONCE(p->svms.faulting_task, NULL);
if (r) {
amdgpu_hmm_range_free(range);
+ range = NULL;
Range is a local pointer and if it has been freed it should not be used
again in same function. The error condition should handle that.
pr_debug("failed %d to get svm range pages\n", r);
}
} else {
@@ -1761,7 +1762,7 @@ static int svm_range_validate_and_map(struct
mm_struct *mm,
svm_range_lock(prange);
/* Free backing memory of hmm_range if it was initialized
- * Overrride return value to TRY AGAIN only if prior returns
+ * Override return value to TRY AGAIN only if prior returns
* were successful
*/
if (range && !amdgpu_hmm_range_valid(range) && !r) { @@ -1769,7
+1770,8 @@ static int svm_range_validate_and_map(struct mm_struct *mm,
r = -EAGAIN;
}
/* Free the hmm range */
- amdgpu_hmm_range_free(range);
I guess we are setting up the error and that should be the criterion to
call this function. If there is already an error condition before we
should not be calling this again.
regards
Sunil khatri
+ if (range)
Can just check if(!r) here and remove "range=NULL" above? if r is not 0 range
has
been freed, if r is 0 free range here.
But there are later spots where r becomes non-zero after get_pages() succeeded
and range is valid:
svm_range_dma_map(...) can fail → sets r != 0.
!amdgpu_hmm_range_valid(range) → you set r = -EAGAIN.
!list_empty(&prange->child_list) → you set r = -EAGAIN.
In all three cases, your new tail logic if (!r) amdgpu_hmm_range_free(range);
skips freeing because r != 0 now. Since you also removed range = NULL; after
the early free, nothing else frees it → memory leak.
Regards,
Srini
Regards
Xiaogang
+ amdgpu_hmm_range_free(range);
if (!r && !list_empty(&prange->child_list)) {