Thank you for your contribution! Sashiko AI review found 1 potential issue(s) 
to consider:

Pre-existing issues:
- [High] Unbounded accumulation of `xe_madvise_notifier` structs leading to 
memory leak and Denial of Service (OOM).
--

commit 01cdfed4cb38d71774f4ae69b625419325962b05
Author: Arvind Yadav <[email protected]>
drm/xe/vm: Disable madvise notifier on GPU touch

This commit adds an active flag to xe_madvise_notifier to stop queueing
autoreset work once a CPU-only VMA has been GPU-faulted. It modifies the
pagefault handler to touch the VMA and deactivate exact-match notifiers
in the maple tree.

> diff --git a/drivers/gpu/drm/xe/xe_vm_madvise.c 
> b/drivers/gpu/drm/xe/xe_vm_madvise.c
> index c2abe712598a1..6c42ce8e3f525 100644
> --- a/drivers/gpu/drm/xe/xe_vm_madvise.c
> +++ b/drivers/gpu/drm/xe/xe_vm_madvise.c
[ ... ]
> @@ -1236,3 +1238,46 @@ int xe_vm_madvise_register_notifier_range(struct xe_vm 
> *vm, u64 start, u64 end)
>  
>       return err;
>  }
> +
> +/**
> + * xe_vm_deactivate_madvise_notifier_for_range - Disable callbacks for a VMA
> + * @vm: VM
> + * @start: VMA start
> + * @end: VMA end
> + */
> +static void xe_vm_deactivate_madvise_notifier_for_range(struct xe_vm *vm, 
> u64 start, u64 end)
> +{
> +     struct xe_madvise_notifier *notifier;
> +     unsigned long index = start;
> +
> +     lockdep_assert_held_write(&vm->lock);
> +
> +     if (!vm->svm.madvise_work.wq)
> +             return;
> +
> +     /*
> +      * Only exact-match notifiers are disabled. Broader notifiers may still
> +      * cover CPU-only split siblings.
> +      */
> +     mt_for_each(&vm->svm.madvise_notifiers, notifier, index, end - 1)
> +             if (notifier->vma_start == start && notifier->vma_end == end)
> +                     WRITE_ONCE(notifier->active, false);

[Severity: High]
This is a pre-existing issue, but does leaving these deactivated notifiers in
memory without freeing them contribute to an unbounded accumulation?

Looking at xe_vm_madvise_register_notifier_range(), notifiers are allocated
and linked to the VM indefinitely:

xe_vm_madvise_register_notifier_range() {
    notifier = xe_madvise_notifier_alloc(vm, start, end);
    if (!notifier)
        return -ENOMEM;
    ...
    list_add_tail(&notifier->link, &vm->svm.madvise_notifier_list);
}

Since older notifiers are only freed if they are fully covered by a new range
or during VM teardown, could userspace intentionally map, bind with AUTORESET,
madvise, and unbind at different addresses across the virtual address space?

This sequence seems like it would bypass the displacement logic and permanently
leak the notifier objects, potentially consuming kernel memory and causing a
denial of service.

Should xe_vm_deactivate_madvise_notifier_for_range() or the unbind path
actively clean up these deactivated objects?

> +}
> +
> +/**
> + * xe_vm_madvise_gpu_touch - Disable madvise notifier after GPU touch
> + * @vm: VM
> + * @vma: GPU-touched VMA
> + */
> +void xe_vm_madvise_gpu_touch(struct xe_vm *vm, struct xe_vma *vma)
> +{
> +     lockdep_assert_held_write(&vm->lock);
> +
> +     /* Only AUTORESET VMAs have madvise notifiers. */
> +     if (!(vma->gpuva.flags & XE_VMA_MADV_AUTORESET))
> +             return;
> +
> +     xe_vm_deactivate_madvise_notifier_for_range(vm,
> +                                             xe_vma_start(vma),
> +                                             xe_vma_end(vma));
> +}

-- 
Sashiko AI review ยท 
https://sashiko.dev/#/patchset/[email protected]?part=5

Reply via email to