Thank you for your contribution! Sashiko AI review found 2 potential issue(s) 
to consider:

New issues:
- [Low] Compilation error due to missing `pages` member in `struct 
xe_svm_range`.

Pre-existing issues:
- [High] Critical error paths swallow error codes by jumping to `out`, causing 
an infinite GPU page fault livelock.
--

commit c7d50cf1df054885b8dfd4565b263894bfd104b3
Author: Honglei Huang <[email protected]>
Subject: drm/gpusvm: split MM state flags out of drm_gpusvm_pages_flags

This commit splits the MM state flags out of drm_gpusvm_pages_flags and moves
them to a new drm_gpusvm_range_flags structure. It keeps the mapping state
flags in drm_gpusvm_pages_flags and updates usage in the amd and xe drivers.

Link: 
https://gitlab.freedesktop.org/mbrost/xe-kernel-driver-svn-perf-6-15-2025/-/commit/623f6a50c037d9e44f6c9fbe6859a0ba7ad50177

> diff --git a/drivers/gpu/drm/xe/xe_svm.c b/drivers/gpu/drm/xe/xe_svm.c
> index e1651e70c8f01..05578e187d204 100644
> --- a/drivers/gpu/drm/xe/xe_svm.c
> +++ b/drivers/gpu/drm/xe/xe_svm.c
> @@ -134,7 +134,8 @@ xe_svm_garbage_collector_add_range(struct xe_vm *vm, 
> struct xe_svm_range *range,
>  
>       range_debug(range, "GARBAGE COLLECTOR ADD");
>  
> -     drm_gpusvm_range_set_unmapped(&range->base, mmu_range);
> +     drm_gpusvm_range_set_unmapped(&range->base, &range->pages, 1,
> +                                   mmu_range);

[Severity: Low]
Does this cause a compilation error due to pages not being a direct member
of struct xe_svm_range? It looks like the pages struct is located inside the
embedded base field (range->base.pages).

While this might be fixed in a later patch in the series, should each commit
compile successfully on its own to preserve bisection?

[ ... ]

> @@ -1248,7 +1249,7 @@ static int __xe_svm_handle_pagefault(struct xe_vm *vm, 
> struct xe_vma *vma,
>  
>       xe_svm_range_fault_count_stats_incr(gt, range);
>  
> -     if (ctx.devmem_only && !range->base.pages.flags.migrate_devmem) {
> +     if (ctx.devmem_only && !range->base.flags.migrate_devmem) {
>               err = -EACCES;
>               goto out;
>       }

[Severity: High]
This is a pre-existing issue, but can this error path swallow the error code by
jumping to the out label?

Looking at __xe_svm_handle_pagefault(), the out label appears to
unconditionally return 0:

out:
        xe_svm_range_fault_us_stats_incr(gt, range, start);
        return 0;

If we set err = -EACCES and jump to out, or if xe_svm_range_get_pages()
fails and jumps to out later in the function, does returning 0 cause the
caller to believe the page fault was handled successfully?

Since the pages weren't mapped, wouldn't the GPU immediately retry the
instruction and fault again, potentially causing an infinite livelock? Should
these error paths jump to err_out instead?

-- 
Sashiko AI review ยท 
https://sashiko.dev/#/patchset/[email protected]?part=1

Reply via email to