amdkfd: block THP for non-replayable SVM ranges

sashiko-bot Thu, 25 Jun 2026 04:11:44 -0700

Thank you for your contribution! Sashiko AI review found 5 potential issue(s) 
to consider:


New issues:
- [High] Error handling in `svm_range_set_attr` aborts the transaction 
mid-flight without rollback, leaving overlapping ranges in the SVM interval 
tree and leaking memory.
- [High] `svm_range_add_notifier_locked` silently ignores failures when 
registering the MMU interval notifier, leading to WARN_ON_ONCE and interval 
tree corruption.
- [High] Clearing the `KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED` attribute 
permanently leaks a THP block node and permanently disables THP for the virtual 
address range.
- [High] Dynamically disabling XNACK via ioctl does not update SVM range 
notifier flags, leaving THP unblocked and risking GPU page faults.

Pre-existing issues:
- [High] If a user maps SVM memory at virtual address 0, its MMU interval 
notifier is never removed, causing a Use-After-Free.
--

commit f4df270ac94ba46190c602a81dedff1aa60cc86a
Author: Yitao Jiang <[email protected]>

drm/amdkfd: block THP for non-replayable SVM ranges

KFD SVM ranges on processes without XNACK, and ranges requested as
GPU_ALWAYS_MAPPED, cannot rely on replayable GPU faults after a CPU THP
remap of the registered VA range. Keep those ranges backed by base
pages while their interval notifier is active.

Opt those SVM interval notifiers into MMU_INTERVAL_NOTIFIER_BLOCK_THP
and update the flag when SVM attributes change. XNACK-enabled ranges
that can handle remaps through replayable faults remain eligible for
THP unless GPU_ALWAYS_MAPPED is requested.

> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> index 3841943da5ece..0d0feba7b7852 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c

[ ... ]

> @@ -81,6 +82,26 @@ static const struct mmu_interval_notifier_ops 
> svm_range_mn_ops = {
>       .invalidate = svm_range_cpu_invalidate_pagetables,
>  };
>  
> +static unsigned int
> +svm_range_mn_flags(struct svm_range *prange)
> +{
> +     struct kfd_process *p = container_of(prange->svms, struct kfd_process,
> +                                          svms);
> +
> +     if (!p->xnack_enabled ||
> +         (prange->flags & KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED))
> +             return MMU_INTERVAL_NOTIFIER_BLOCK_THP;
> +
> +     return 0;
> +}

[Severity: High]
When a process dynamically disables XNACK using the KFD_IOC_SMI_EVENTS ioctl,
does this leave THP unblocked on existing ranges?

In svm_range_switch_xnack_reserve_mem(), p->xnack_enabled is updated
dynamically:

        p->xnack_enabled = xnack_enabled;

However, it does not seem to update the interval notifier flags for existing
SVM ranges. Because the existing ranges remain with THP unblocked, could
khugepaged collapse THPs and generate unrecoverable GPU page faults since
XNACK is disabled?

> +
> +static int
> +svm_range_update_mn_flags_locked(struct svm_range *prange)
> +{
> +     return mmu_interval_notifier_set_flags_locked(&prange->notifier,
> +                                                   
> svm_range_mn_flags(prange));
> +}

[Severity: High]
If userspace clears the KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED flag or XNACK
is enabled, does this permanently leak a THP block node?

When svm_range_update_mn_flags_locked() calls
mmu_interval_notifier_set_flags_locked() with flags = 0, the
interval_sub->flags are zeroed out locally. However, since there is no
teardown path to call mmu_interval_notifier_unblock_thp_locked(), the THP
block node remains.

When the SVM range is eventually freed, mmu_interval_notifier_remove() will
also skip the unblock path because the flag was cleared, resulting in a
permanent memory leak and permanently disabling THP for the virtual address
range.

[ ... ]

> @@ -112,10 +133,11 @@ svm_range_add_notifier_locked(struct mm_struct *mm, 
> struct svm_range *prange)
>       pr_debug("svms 0x%p prange 0x%p [0x%lx 0x%lx]\n", prange->svms,
>                prange, prange->start, prange->last);
>  
> -     mmu_interval_notifier_insert_locked(&prange->notifier, mm,
> -                                  prange->start << PAGE_SHIFT,
> -                                  prange->npages << PAGE_SHIFT,
> -                                  &svm_range_mn_ops);
> +     mmu_interval_notifier_insert_locked_flags(&prange->notifier, mm,
> +                                               prange->start << PAGE_SHIFT,
> +                                               prange->npages << PAGE_SHIFT,
> +                                               &svm_range_mn_ops,
> +                                               svm_range_mn_flags(prange));
>  }

[Severity: High]
Does this silently ignore failures during interval notifier registration?

If mmu_interval_notifier_insert_locked_flags() fails, it returns an error
code and prange->notifier.mm remains NULL. Because the return value is
discarded here, subsequent code in svm_range_set_attr() will call
svm_range_update_mn_flags_locked() on the same transaction. This passes the
uninitialized notifier to mmu_interval_notifier_set_flags_locked(), which
will hit a WARN_ON_ONCE(!mm), return -EINVAL, and cause the transaction to
abort, potentially corrupting the interval tree.

[ ... ]

> @@ -3763,6 +3785,12 @@ svm_range_set_attr(struct kfd_process *p, struct 
> mm_struct *mm,
>       }
>       list_for_each_entry(prange, &update_list, update_list) {
>               svm_range_apply_attrs(p, prange, nattr, attrs, &update_mapping);
> +             r = svm_range_update_mn_flags_locked(prange);
> +             if (r) {
> +                     mutex_unlock(&svms->lock);
> +                     mmap_write_unlock(mm);
> +                     goto out;
> +             }

[Severity: High]
Does aborting the transaction here leak memory and corrupt the interval tree?

If svm_range_update_mn_flags_locked() fails, the goto out path completely
bypasses the remove_list processing located further down in the function.
This leaves old cloned ranges permanently in the interval tree alongside
their newly inserted replacements that were added during the insert_list
loop. Additionally, because remove_list is a stack variable, skipping its
teardown leaks the ranges and leaves dangling pointers in the update_list
links.

>               /* TODO: unmap ranges from GPU that lost access */
>       }
>       update_mapping |= !p->xnack_enabled && !list_empty(&remap_list);

[Severity: High]
This is a pre-existing issue, but does svm_range_remove_notifier() fail to
remove the notifier if the SVM range starts at virtual address 0?

In drivers/gpu/drm/amd/amdkfd/kfd_svm.c:svm_range_remove_notifier():

        if (prange->notifier.interval_tree.start != 0 &&
            prange->notifier.interval_tree.last != 0)
                mmu_interval_notifier_remove(&prange->notifier);

If an SVM range is mapped starting at virtual address 0 (which is permitted
if vm.mmap_min_addr is 0 or if the user has CAP_SYS_RAWIO), the check
evaluates to false and incorrectly skips mmu_interval_notifier_remove().
The prange is subsequently freed, but its embedded notifier structure
remains linked in the mm->notifier_subscriptions->itree, which will cause a
use-after-free the next time the tree is modified.

-- 
Sashiko AI review · 
https://sashiko.dev/#/patchset/sy1pr01mb10596eb75463208a8e1ebba0fc0...@sy1pr01mb10596.ausprd01.prod.outlook.com?part=3

Re: [PATCH 3/3] drm/amdkfd: block THP for non-replayable SVM ranges

Reply via email to