On Wed, Jun 03, 2026 at 02:56:15PM +0800, Honglei Huang wrote: > From: Honglei Huang <[email protected]> > > The intent of this series is to make drm_gpusvm more flexible and > give drivers more freedom over how they assemble the MM related and device > side operations. > > This RFC implements the direction Matt suggested in [1]: > > - Move struct drm_gpusvm_pages out of struct drm_gpusvm_range. > - Embed either a struct device or a struct drm_device in struct > drm_gpusvm_pages. > - Drop struct drm_device from struct drm_gpusvm. > - Have the driver's range structure embed one or more struct > drm_gpusvm_pages in addition to struct drm_gpusvm_range. > - Refactor a few range-based helpers (drm_gpusvm_range_pages_valid, > drm_gpusvm_range_get_pages, drm_gpusvm_range_unmap_pages), or > simply drop them entirely and update drivers to use the > drm_gpusvm_pages helpers instead. >
Overall this looks good - thanks doing this. > In essence the series does only two abstractions, plus the xe > adaptation that follows from them: > > - range vs pages: split drm_gpusvm_range (MM / VA range state) from > drm_gpusvm_pages (device physical related), so the two > sides can have independent lifetimes and ownership. > - drm_gpusvm vs drm_device: make drm_gpusvm pure MM level and push > the device side down onto drm_gpusvm_pages, which is where DMA > actually happens. > - xe is updated to fit the modifications, no functional change intended. > > If such changes are acceptable in terms of direction, I have a few questions: > > - Drivers now own drm_gpusvm_pages unmap / free and notifier_seq init. > OK to push this fully to drivers, or should some new mechanisms need to > add > to ensure functions can be completed by the framework? I'm looking at the diff of xe_svm.c before / after and I see drm_gpusvm_free_pages moved to xe_svm_range_free. That looks fine to me. I see in xe_svm_range_alloc() this: range->pages.notifier_seq = LONG_MAX; Can we make help like drm_gpusvm_init_pages which does this? I think it is better to encapsulate the pages init into normalized helper even though it is very simple. Maybe an inline since this just a single line of code? > - This series drops the three drm_gpusvm_range_* helpers and changes > drm_gpusvm_get_pages() / drm_gpusvm_init() signatures. > Do we need to keep thin wrappers for backward compatibility. It should be safe to drop these helpers. > - drm_gpusvm_get_pages() mixes HMM fault and device DMA map. Multi device > under > one SVM calls would repeat the HMM fault. Does it need to modified to Split > into MM level fault + per pages DMA map? > Hmm, this might get a little tricky because of how the allocation/retry loop is implemented in get_pages(). Maybe we could change the function to accept an array of pages plus a count? I’m not sure what the best approach is here, but I’m open to ideas. That said, I’d rather avoid having the driver open-code a retry loop if it could live in common code. Side note: another modification we need in get_pages() is to make the DMA-mapping step optional. I suggested that AMDXDNA use GPU SVM for userptr, and I don’t believe that device requires DMA mapping. > Patch overview: > > 1/5 gpusvm: split MM state flags onto drm_gpusvm_range_flags. > 2/5 gpusvm: embed drm_device into drm_gpusvm_pages; DMA goes > through it. > 3/5 xe: xe_svm_range owns its drm_gpusvm_pages and its lifecycle. > 4/5 gpusvm: drop pages from drm_gpusvm_range and the range-level > wrappers. > 5/5 gpusvm: drop drm_device from drm_gpusvm. > > tests: > AMDGPU: > based on amdgpu adaptation patch in [2], but still SVM:DRM = 1:1, > 1:n is on going needs many modifications and testings. > > Tested on gfx943 (MI300X) and gfx906 (MI60) with XNACK on/off: > - KFD test: 95%+ passed. > - ROCR test: all passed. > - HIP catch test: gfx943 (MI300X): 96% passed. > gfx906 (MI60): 99% passed. > INTEL XE: > TODO: We bought some Intel Arc A380, but it seems like this cards > don't support hardware fault / SVM, waiting for the new > cards B580/B570 to arrive. > Please send patches that modify GPU SVM or Xe to the Xe mailing list. We have public CI, which I believe can be triggered by any AMD email address. I just pulled the code, encountered a compile error, and noticed a bug around unmapping related to that error. I put together some quick fixes on top of the series here [3], and locally all of our tests seem to be passing. I’ll reply in detail to the patches shortly, explaining some of the reasoning behind these changes. Matt [3] https://gitlab.freedesktop.org/mbrost/xe-kernel-driver-svn-perf-6-15-2025/-/commit/623f6a50c037d9e44f6c9fbe6859a0ba7ad50177 > links: > [1] > https://lore.kernel.org/amd-gfx/acRgr7QwdULsn6G2@gsse-cloud1/#:~:text=I%20think%20roughly,drm_gpusvm_pages%0A%20%20helpers%20instead. > [2] > https://lore.kernel.org/amd-gfx/[email protected]/ > Honglei Huang (5): > drm/gpusvm: split MM state flags out of drm_gpusvm_pages_flags > drm/gpusvm: embed struct drm_device into drm_gpusvm_pages > drm/xe: have xe_svm_range embed one drm_gpusvm_pages > drm/gpusvm: move struct drm_gpusvm_pages out of struct > drm_gpusvm_range > drm/gpusvm: let the drm_gpusvm core context purely MM level > > drivers/gpu/drm/drm_gpusvm.c | 128 +++++++++----------------------- > drivers/gpu/drm/xe/xe_pt.c | 2 +- > drivers/gpu/drm/xe/xe_svm.c | 37 +++++---- > drivers/gpu/drm/xe/xe_svm.h | 11 ++- > drivers/gpu/drm/xe/xe_userptr.c | 1 + > include/drm/drm_gpusvm.h | 49 ++++++------ > 6 files changed, 95 insertions(+), 133 deletions(-) > > -- > 2.34.1 >
