From: Honglei Huang <[email protected]>

This series adds xnack-off mode support to amdgpu SVM.
The design follows the invalidate-and-restore pattern in xe_userptr,
extending it from per-VMA userptr scope to drm_gpusvm ranges and adding
the requirements specific to amdgpu's xnack-off mode.
Like xe_userptr, the implementation is built on top of the drm_gpusvm
framework and centers on MMU-notifier-driven invalidation.

Using xe_userptr.c as the reference, the series implements the following
design:
  - The notifier invalidate callback moves ranges onto a
    spinlock-protected invalidated list, like
    __vma_userptr_invalidate() in xe_userptr.

  - A restore worker iterates the invalidated list and calls
    drm_gpusvm_get_pages() to re-acquire pages, then re-establishes the
    GPU mappings. This is the same get_pages + rebind flow used by
    xe_vm_userptr_pin(). On transient failure, ranges are re-enqueued,
    following xe_userptr's retry-on-EAGAIN pattern.

  - Lifecycle follows the same init/fini/flush structure as
    xe_userptr_setup/remove/destroy, with flush ensuring all pending
    work completes before teardown.
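
For readers unfamiliar with the pattern, the invalidate/restore flow
above can be sketched in plain userspace C. This is only an
illustration under stated assumptions, not code from the series: every
demo_* name is hypothetical, the spinlock is reduced to a comment, and
demo_get_pages_and_bind() is a stand-in for the real get_pages + rebind
step that simply simulates a number of -EAGAIN results.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an SVM range; not the amdgpu structure. */
struct demo_range {
	struct demo_range *next;	/* link on the invalidated list */
	bool queued;			/* already on the invalidated list? */
	bool mapped;			/* GPU mapping currently valid? */
	int transient_failures;		/* simulated -EAGAIN results left */
};

/* Hypothetical per-VM state; a driver would guard the list with a lock. */
struct demo_svm {
	struct demo_range *invalidated;
};

/* Notifier side: drop the mapping and queue the range at most once. */
static void demo_invalidate(struct demo_svm *svm, struct demo_range *r)
{
	r->mapped = false;
	if (r->queued)
		return;
	r->queued = true;
	r->next = svm->invalidated;
	svm->invalidated = r;
}

/* Stand-in for get_pages + rebind; returns -EAGAIN while failures remain. */
static int demo_get_pages_and_bind(struct demo_range *r)
{
	if (r->transient_failures > 0) {
		r->transient_failures--;
		return -EAGAIN;
	}
	r->mapped = true;
	return 0;
}

/* Worker side: one pass over the list; returns the number re-enqueued. */
static int demo_restore_pass(struct demo_svm *svm)
{
	struct demo_range *list = svm->invalidated;
	int requeued = 0;

	svm->invalidated = NULL;	/* detach the whole list first */
	while (list) {
		struct demo_range *r = list;

		list = r->next;
		r->next = NULL;
		r->queued = false;
		if (demo_get_pages_and_bind(r) == -EAGAIN) {
			demo_invalidate(svm, r);	/* retry on a later pass */
			requeued++;
		}
	}
	return requeued;
}
```

In this sketch a caller would invoke demo_restore_pass() again while it
returns non-zero. Detaching the list before walking it means ranges
invalidated during a pass land on a fresh list and are handled by the
next pass.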

Related work:
This series depends on the base amdgpu SVM series:
  https://lore.kernel.org/amd-gfx/[email protected]/

Test results:
  Tested on gfx943 (MI300X) and gfx1100 (W7900) with XNACK on:
  - KFD test: 99% passed.
  - ROCR test: all passed.
  - HIP catch test: gfx943 (MI300X): 99% passed.
                    gfx1100 (W7900): 99% passed.

Patch overview:
  Patch 1-2: Define restore types/states and integrate into core headers.
  Patch 3:   Invalidate callback - dispatch ranges to restore or GC list.
  Patch 4:   Restore worker - get_pages + rebind loop with retry.
  Patch 5:   GC worker - remove unmapped ranges, rebuild partial intervals.
  Patch 6:   Compute queue quiesce/resume helpers.
  Patch 7:   Attr change boundary realign helper.
  Patch 8:   Wire restore into SVM lifecycle and attr set path.

Honglei Huang (8):
  drm/amdgpu: add xnack-off restore types header
  drm/amdgpu: integrate xnack-off restore types into core headers
  drm/amdgpu: implement xnack-off restore core and invalidate callback
  drm/amdgpu: implement xnack-off restore worker
  drm/amdgpu: implement xnack-off GC work function
  drm/amdgpu: add xnack-off compute queue quiesce and resume helpers
  drm/amdgpu: add xnack-off attr change boundary realign helper
  drm/amdgpu: wire xnack-off restore into lifecycle and attr set

 drivers/gpu/drm/amd/amdgpu/Makefile           |   6 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_svm.c       |  44 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_svm.h       |   3 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_svm_range.c |   8 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_svm_range.h |   2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_userptr.c   | 890 ++++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_userptr.h   |  68 ++
 7 files changed, 1013 insertions(+), 8 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_userptr.c
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_userptr.h

-- 
2.34.1
