From: Honglei Huang <[email protected]>
The intent of this series is to make drm_gpusvm more flexible and
give drivers more freedom over how they assemble the MM-related and
device-side operations.
This RFC implements the direction Matt suggested in [1]:
- Move struct drm_gpusvm_pages out of struct drm_gpusvm_range.
- Embed either a struct device or a struct drm_device in struct
drm_gpusvm_pages.
- Drop struct drm_device from struct drm_gpusvm.
- Have the driver's range structure embed one or more struct
drm_gpusvm_pages in addition to struct drm_gpusvm_range.
- Refactor a few range-based helpers (drm_gpusvm_range_pages_valid,
drm_gpusvm_range_get_pages, drm_gpusvm_range_unmap_pages), or
simply drop them entirely and update drivers to use the
drm_gpusvm_pages helpers instead.
In essence, the series introduces only two abstractions, plus the xe
adaptation that follows from them:
- range vs pages: split drm_gpusvm_range (MM / VA range state) from
drm_gpusvm_pages (device-side physical page state), so the two
sides can have independent lifetimes and ownership.
- drm_gpusvm vs drm_device: make drm_gpusvm purely MM-level and push
the device side down onto drm_gpusvm_pages, which is where DMA
actually happens.
- xe is updated to fit the modifications; no functional change intended.
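As a rough illustration of the layout described above, a driver range
that embeds the MM-side range plus one drm_gpusvm_pages-like object per
device could look like the sketch below. All types and names here
(the *_sketch structs, struct device as a stand-in, SKETCH_MAX_DEVS,
to_driver_range(), driver_range_add_pages()) are hypothetical
userspace stand-ins, not the actual drm_gpusvm or xe definitions.

```c
#include <stddef.h>

/* Stand-in for the kernel's struct device; illustration only. */
struct device {
	const char *name;
};

/* MM / VA range state only; no device-side state lives here. */
struct gpusvm_range_sketch {
	unsigned long start;
	unsigned long end;
};

/* Device-side state: the device used for DMA is carried by the pages. */
struct gpusvm_pages_sketch {
	struct device *dev;
	unsigned long notifier_seq;	/* initialised by the driver */
	int mapped;
};

#define SKETCH_MAX_DEVS 2		/* hypothetical per-range limit */

/* Driver range: embeds the range plus one or more pages objects. */
struct driver_range_sketch {
	struct gpusvm_range_sketch base;
	struct gpusvm_pages_sketch pages[SKETCH_MAX_DEVS];
	unsigned int num_pages;
};

#define sketch_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Recover the driver range from the embedded MM-side range. */
static struct driver_range_sketch *
to_driver_range(struct gpusvm_range_sketch *range)
{
	return sketch_container_of(range, struct driver_range_sketch, base);
}

/* Attach a pages object for @dev; returns its index, or -1 if full. */
static int driver_range_add_pages(struct driver_range_sketch *dr,
				  struct device *dev, unsigned long seq)
{
	struct gpusvm_pages_sketch *p;

	if (dr->num_pages >= SKETCH_MAX_DEVS)
		return -1;

	p = &dr->pages[dr->num_pages];
	p->dev = dev;
	p->notifier_seq = seq;
	p->mapped = 0;

	return (int)dr->num_pages++;
}
```

In this sketch the range never sees a device pointer, so a driver is
free to attach one pages object (as xe does in this series) or several.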
If this direction is acceptable, I have a few questions:
- Drivers now own drm_gpusvm_pages unmap / free and notifier_seq init.
Is it OK to push this fully to drivers, or should new mechanisms be
added so that the framework can complete these steps?
- This series drops the three drm_gpusvm_range_* helpers and changes
the drm_gpusvm_get_pages() / drm_gpusvm_init() signatures.
Do we need to keep thin wrappers for backward compatibility?
- drm_gpusvm_get_pages() mixes the HMM fault and the device DMA map.
With multiple devices under one SVM, each call would repeat the HMM
fault. Should it be split into an MM-level fault plus a per-pages
DMA map?
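To make the last question concrete, here is a minimal sketch of what
such a split could look like: one MM-level fault whose result is then
reused for each per-device DMA map. Every name below (the sketch_*
types and functions, the fixed 4096-byte page size) is hypothetical
and only models the control flow; it is not the drm_gpusvm API and
does not call HMM or the DMA API.

```c
#include <stddef.h>

struct sketch_device {
	int id;
};

/* Outcome of a single MM-level fault over a VA range. */
struct sketch_mm_result {
	unsigned long notifier_seq;
	unsigned long npages;
};

/* Per-device pages object; one per device sharing the SVM. */
struct sketch_pages {
	struct sketch_device *dev;
	unsigned long notifier_seq;
	unsigned long nr_mapped;
};

/* Counts how often the (expensive) MM-level fault ran. */
static unsigned int sketch_fault_calls;

/* MM-level step: stands in for the HMM fault; device-agnostic. */
static int sketch_mm_fault(unsigned long start, unsigned long end,
			   unsigned long page_size, unsigned long seq,
			   struct sketch_mm_result *res)
{
	if (end <= start || page_size == 0)
		return -1;

	sketch_fault_calls++;
	res->notifier_seq = seq;
	res->npages = (end - start) / page_size;
	return 0;
}

/* Device-level step: stands in for the per-pages DMA map. */
static int sketch_dma_map(struct sketch_pages *pages,
			  const struct sketch_mm_result *res)
{
	if (!pages->dev)
		return -1;

	pages->notifier_seq = res->notifier_seq;
	pages->nr_mapped = res->npages;
	return 0;
}

/* Fault once, then map for each of the @n devices. */
static int sketch_get_pages_multi(struct sketch_pages *pages, size_t n,
				  unsigned long start, unsigned long end,
				  unsigned long seq)
{
	struct sketch_mm_result res;
	size_t i;
	int err;

	err = sketch_mm_fault(start, end, 4096, seq, &res);
	if (err)
		return err;

	for (i = 0; i < n; i++) {
		err = sketch_dma_map(&pages[i], &res);
		if (err)
			return err;
	}

	return 0;
}
```

With two pages objects, sketch_mm_fault() runs once and
sketch_dma_map() runs twice, which is the saving the question is
about. Error unwinding for already-mapped devices is left out here.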
Patch overview:
1/5 gpusvm: split MM state flags onto drm_gpusvm_range_flags.
2/5 gpusvm: embed drm_device into drm_gpusvm_pages; DMA goes
through it.
3/5 xe: xe_svm_range owns its drm_gpusvm_pages and its lifecycle.
4/5 gpusvm: drop pages from drm_gpusvm_range and the range-level
wrappers.
5/5 gpusvm: drop drm_device from drm_gpusvm.
tests:
AMDGPU:
Based on the amdgpu adaptation patch in [2], but still with
SVM:DRM = 1:1; 1:n is ongoing and needs many more modifications and
testing.
Tested on gfx943 (MI300X) and gfx906 (MI60) with XNACK on/off:
- KFD test: 95%+ passed.
- ROCR test: all passed.
- HIP catch test: gfx943 (MI300X): 96% passed.
gfx906 (MI60): 99% passed.
INTEL XE:
TODO: We bought some Intel Arc A380 cards, but it seems these cards
don't support hardware fault / SVM. Waiting for the new
B580/B570 cards to arrive.
links:
[1]
https://lore.kernel.org/amd-gfx/acRgr7QwdULsn6G2@gsse-cloud1/#:~:text=I%20think%20roughly,drm_gpusvm_pages%0A%20%20helpers%20instead.
[2]
https://lore.kernel.org/amd-gfx/[email protected]/
Honglei Huang (5):
drm/gpusvm: split MM state flags out of drm_gpusvm_pages_flags
drm/gpusvm: embed struct drm_device into drm_gpusvm_pages
drm/xe: have xe_svm_range embed one drm_gpusvm_pages
drm/gpusvm: move struct drm_gpusvm_pages out of struct
drm_gpusvm_range
drm/gpusvm: let the drm_gpusvm core context purely MM level
drivers/gpu/drm/drm_gpusvm.c | 128 +++++++++-----------------------
drivers/gpu/drm/xe/xe_pt.c | 2 +-
drivers/gpu/drm/xe/xe_svm.c | 37 +++++----
drivers/gpu/drm/xe/xe_svm.h | 11 ++-
drivers/gpu/drm/xe/xe_userptr.c | 1 +
include/drm/drm_gpusvm.h | 49 ++++++------
6 files changed, 95 insertions(+), 133 deletions(-)
--
2.34.1