From: Honglei Huang <[email protected]>
This series adds xnack-off mode support to the amdgpu SVM.
The design follows the invalidate-and-restore pattern in xe_userptr,
extending it from per-VMA userptr scope to drm_gpusvm ranges and adding
the requirements specific to amdgpu's xnack-off mode.
Like xe_userptr, the implementation is built on top of the drm_gpusvm
framework and centers around an MMU notifier driven invalidation.
This series implements xnack-off SVM support with the following design,
using xe_userptr.c as the reference:
- The notifier invalidate callback moves ranges onto a
spinlock-protected invalidated list, as __vma_userptr_invalidate()
does in xe_userptr.
- A restore worker iterates the invalidated list, calls
drm_gpusvm_get_pages() to re-acquire pages, and rebinds the GPU
mappings: the same get_pages + rebind flow used by
xe_vm_userptr_pin(). On transient failure, ranges are re-enqueued,
following xe_userptr's retry-on-EAGAIN pattern.
- Lifecycle follows the same init/fini/flush structure as
xe_userptr_setup/remove/destroy, with flush ensuring all pending
work completes before teardown.
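
The invalidate and restore flow in the list above can be illustrated with
a small userspace sketch. It is not the amdgpu code: struct demo_range,
struct demo_svm and every demo_* function are hypothetical names, a C11
atomic_flag stands in for the kernel spinlock, and demo_get_pages() only
mimics a drm_gpusvm_get_pages() call that may fail with -EAGAIN.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an SVM range; not the amdgpu structure. */
struct demo_range {
	struct demo_range *next;	/* link on the invalidated list */
	bool on_list;			/* prevents double enqueue */
	bool mapped;			/* GPU mapping currently valid */
	int transient_failures;		/* times get_pages still returns -EAGAIN */
};

struct demo_svm {
	atomic_flag lock;		/* userspace spinlock, assumption */
	struct demo_range *invalidated;	/* head of the invalidated list */
};

#define DEMO_SVM_INIT { .lock = ATOMIC_FLAG_INIT, .invalidated = NULL }

static void demo_lock(struct demo_svm *svm)
{
	while (atomic_flag_test_and_set_explicit(&svm->lock, memory_order_acquire))
		; /* spin */
}

static void demo_unlock(struct demo_svm *svm)
{
	atomic_flag_clear_explicit(&svm->lock, memory_order_release);
}

/* Simulated notifier callback: drop the mapping and queue the range once. */
static void demo_invalidate(struct demo_svm *svm, struct demo_range *r)
{
	assert(r != NULL);
	demo_lock(svm);
	r->mapped = false;
	if (!r->on_list) {
		r->next = svm->invalidated;
		svm->invalidated = r;
		r->on_list = true;
	}
	demo_unlock(svm);
}

/* Mimics a page lookup that fails transiently a fixed number of times. */
static int demo_get_pages(struct demo_range *r)
{
	if (r->transient_failures > 0) {
		r->transient_failures--;
		return -EAGAIN;
	}
	return 0;
}

/* One pass of the restore worker; returns how many ranges were re-queued. */
static int demo_restore_pass(struct demo_svm *svm)
{
	struct demo_range *todo, *r;
	int requeued = 0;

	/* Detach the whole list under the lock, then process it unlocked. */
	demo_lock(svm);
	todo = svm->invalidated;
	svm->invalidated = NULL;
	demo_unlock(svm);

	while ((r = todo) != NULL) {
		todo = r->next;

		demo_lock(svm);
		r->next = NULL;
		r->on_list = false;
		demo_unlock(svm);

		if (demo_get_pages(r) == -EAGAIN) {
			demo_invalidate(svm, r);	/* retry on a later pass */
			requeued++;
			continue;
		}
		r->mapped = true;			/* stands in for the rebind */
	}
	return requeued;
}
```

Calling demo_restore_pass() again while it returns a non-zero count plays
the role of the worker being rescheduled. The init/fini/flush lifecycle
is not modelled here.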
Related work:
This series depends on the base amdgpu SVM series:
https://lore.kernel.org/amd-gfx/[email protected]/
Test results:
Tested on gfx943 (MI300X) and gfx1100 (W7900) with XNACK on:
- KFD test: 99% passed.
- ROCR test: all passed.
- HIP catch test:
  gfx943 (MI300X): 99% passed.
  gfx1100 (W7900): 99% passed.
Patch overview:
Patch 1-2: Define restore types/states and integrate into core headers.
Patch 3: Invalidate callback - dispatch ranges to restore or GC list.
Patch 4: Restore worker - get_pages + rebind loop with retry.
Patch 5: GC worker - remove unmapped ranges, rebuild partial intervals.
Patch 6: Compute queue quiesce/resume helpers.
Patch 7: Attr change boundary realign helper.
Patch 8: Wire restore into SVM lifecycle and attr set path.
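
As a rough illustration of the dispatch step named for patch 3, the
sketch below routes an invalidated range to either a restore queue or a
GC queue. The rule it uses is an assumption, not taken from the patches:
unmap events send the range to GC, any other event sends it to restore,
and a range already queued for GC stays there. All disp_* names and enum
values are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical event and queue types, for illustration only. */
enum disp_event { DISP_EVENT_UNMAP, DISP_EVENT_OTHER };
enum disp_queue { DISP_QUEUE_NONE, DISP_QUEUE_RESTORE, DISP_QUEUE_GC };

struct disp_range {
	unsigned long start;	/* inclusive */
	unsigned long end;	/* exclusive */
	enum disp_queue queue;	/* where the range is currently queued */
};

/* True when [start, end) intersects the range. */
static bool disp_overlaps(const struct disp_range *r,
			  unsigned long start, unsigned long end)
{
	return r->start < end && start < r->end;
}

/*
 * Decide which queue a range belongs on after an invalidation of
 * [start, end). Assumed policy: unmap goes to GC, everything else goes
 * to restore, and a pending GC is never downgraded to restore.
 */
static enum disp_queue disp_invalidate(struct disp_range *r,
				       enum disp_event ev,
				       unsigned long start, unsigned long end)
{
	assert(start < end);

	if (!disp_overlaps(r, start, end))
		return r->queue;	/* range untouched by this event */

	if (ev == DISP_EVENT_UNMAP)
		r->queue = DISP_QUEUE_GC;
	else if (r->queue != DISP_QUEUE_GC)
		r->queue = DISP_QUEUE_RESTORE;

	return r->queue;
}
```

Keeping a pending GC sticky avoids restoring a range whose backing
address space is already gone; whether the series makes the same choice
should be checked against patch 3 itself.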
Honglei Huang (8):
drm/amdgpu: add xnack-off restore types header
drm/amdgpu: integrate xnack-off restore types into core headers
drm/amdgpu: implement xnack-off restore core and invalidate callback
drm/amdgpu: implement xnack-off restore worker
drm/amdgpu: implement xnack-off GC work function
drm/amdgpu: add xnack-off compute queue quiesce and resume helpers
drm/amdgpu: add xnack-off attr change boundary realign helper
drm/amdgpu: wire xnack-off restore into lifecycle and attr set
drivers/gpu/drm/amd/amdgpu/Makefile | 6 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_svm.c | 44 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_svm.h | 3 +
drivers/gpu/drm/amd/amdgpu/amdgpu_svm_range.c | 8 +
drivers/gpu/drm/amd/amdgpu/amdgpu_svm_range.h | 2 +
drivers/gpu/drm/amd/amdgpu/amdgpu_userptr.c | 890 ++++++++++++++++++
drivers/gpu/drm/amd/amdgpu/amdgpu_userptr.h | 68 ++
7 files changed, 1013 insertions(+), 8 deletions(-)
create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_userptr.c
create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_userptr.h
--
2.34.1