On Tue, Jun 17, 2025 at 09:37:28PM +0200, Thomas Hellström wrote: > On Tue, 2025-06-17 at 10:04 -0700, Matthew Brost wrote: > > On Tue, Jun 17, 2025 at 04:55:26PM +0200, Thomas Hellström wrote: > > > On Tue, 2025-06-17 at 20:17 +0530, Ghimiray, Himal Prasad wrote: > > > > > > > > > > > > On 17-06-2025 18:41, Thomas Hellström wrote: > > > > > On Tue, 2025-06-17 at 18:25 +0530, Ghimiray, Himal Prasad > > > > > wrote: > > > > > > > > > > > > > > > > > > On 13-06-2025 19:32, Thomas Hellström wrote: > > > > > > > From: Matthew Brost <[email protected]> > > > > > > > > > > > > > > The migration functionality and track-keeping of per- > > > > > > > pagemap > > > > > > > VRAM > > > > > > > mapped to the CPU mm is not per GPU_vm, but rather per > > > > > > > pagemap. > > > > > > > This is also reflected by the functions not needing the > > > > > > > drm_gpusvm > > > > > > > structures. So move to drm_pagemap. > > > > > > > > > > > > > > With this, drm_gpusvm shouldn't really access the page > > > > > > > zone- > > > > > > > device- > > > > > > > data > > > > > > > since its meaning is internal to drm_pagemap. Currently > > > > > > > it's > > > > > > > used > > > > > > > to > > > > > > > reject mapping ranges backed by multiple drm_pagemap > > > > > > > allocations. > > > > > > > For now, make the zone-device-data a void pointer. > > > > > > > > > > > > > > Alter the interface of drm_gpusvm_migrate_to_devmem() to > > > > > > > ensure > > > > > > > we > > > > > > > don't > > > > > > > pass a gpusvm pointer. > > > > > > > > > > > > > > Rename CONFIG_DRM_XE_DEVMEM_MIRROR to > > > > > > > CONFIG_DRM_XE_PAGEMAP. > > > > > > > > > > > > > > Matt is listed as author of this commit since he wrote most > > > > > > > of > > > > > > > the > > > > > > > code, > > > > > > > and it makes sense to retain his git authorship. > > > > > > > Thomas mostly moved the code around. > > > > > > > > > > > > > > > > > > > > v3: > > > > > > > - Kerneldoc fixes (CI) > > > > > > > - Don't update documentation about how the drm_pagemap > > > > > > > migration should be interpreted until upcoming > > > > > > > patches where the functionality is implemented. > > > > > > > (Matt Brost) > > > > > > > > > > > > > > Co-developed-by: Thomas Hellström > > > > > > > <[email protected]> > > > > > > > Signed-off-by: Thomas Hellström > > > > > > > <[email protected]> > > > > > > > --- > > > > > > > Documentation/gpu/rfc/gpusvm.rst | 12 +- > > > > > > > drivers/gpu/drm/Makefile | 6 +- > > > > > > > drivers/gpu/drm/drm_gpusvm.c | 759 +------------ > > > > > > > ---- > > > > > > > ----- > > > > > > > ---- > > > > > > > drivers/gpu/drm/drm_pagemap.c | 788 > > > > > > > +++++++++++++++++++++++++++ > > > > > > > drivers/gpu/drm/xe/Kconfig | 10 +- > > > > > > > drivers/gpu/drm/xe/xe_bo_types.h | 2 +- > > > > > > > drivers/gpu/drm/xe/xe_device_types.h | 2 +- > > > > > > > drivers/gpu/drm/xe/xe_svm.c | 47 +- > > > > > > > include/drm/drm_gpusvm.h | 96 ---- > > > > > > > include/drm/drm_pagemap.h | 101 ++++ > > > > > > > 10 files changed, 950 insertions(+), 873 deletions(-) > > > > > > > create mode 100644 drivers/gpu/drm/drm_pagemap.c > > > > > > > > > > > > > > diff --git a/Documentation/gpu/rfc/gpusvm.rst > > > > > > > b/Documentation/gpu/rfc/gpusvm.rst > > > > > > > index bcf66a8137a6..469db1372f16 100644 > > > > > > > --- a/Documentation/gpu/rfc/gpusvm.rst > > > > > > > +++ b/Documentation/gpu/rfc/gpusvm.rst > > > > > > > @@ -73,15 +73,21 @@ Overview of baseline design > > > > > > > .. kernel-doc:: drivers/gpu/drm/drm_gpusvm.c > > > > > > > :doc: Locking > > > > > > > > > > > > > > -.. kernel-doc:: drivers/gpu/drm/drm_gpusvm.c > > > > > > > - :doc: Migration > > > > > > > - > > > > > > > .. kernel-doc:: drivers/gpu/drm/drm_gpusvm.c > > > > > > > :doc: Partial Unmapping of Ranges > > > > > > > > > > > > > > .. kernel-doc:: drivers/gpu/drm/drm_gpusvm.c > > > > > > > :doc: Examples > > > > > > > > > > > > > > +Overview of drm_pagemap design > > > > > > > +============================== > > > > > > > + > > > > > > > +.. kernel-doc:: drivers/gpu/drm/drm_pagemap.c > > > > > > > + :doc: Overview > > > > > > > + > > > > > > > +.. kernel-doc:: drivers/gpu/drm/drm_pagemap.c > > > > > > > + :doc: Migration > > > > > > > + > > > > > > > Possible future design features > > > > > > > =============================== > > > > > > > > > > > > > > diff --git a/drivers/gpu/drm/Makefile > > > > > > > b/drivers/gpu/drm/Makefile > > > > > > > index 5050ac32bba2..4dafbdc8f86a 100644 > > > > > > > --- a/drivers/gpu/drm/Makefile > > > > > > > +++ b/drivers/gpu/drm/Makefile > > > > > > > @@ -104,7 +104,11 @@ obj- > > > > > > > $(CONFIG_DRM_PANEL_BACKLIGHT_QUIRKS) > > > > > > > += > > > > > > > drm_panel_backlight_quirks.o > > > > > > > # > > > > > > > obj-$(CONFIG_DRM_EXEC) += drm_exec.o > > > > > > > obj-$(CONFIG_DRM_GPUVM) += drm_gpuvm.o > > > > > > > -obj-$(CONFIG_DRM_GPUSVM) += drm_gpusvm.o > > > > > > > + > > > > > > > +drm_gpusvm_helper-y := \ > > > > > > > + drm_gpusvm.o\ > > > > > > > + drm_pagemap.o > > > > > > > +obj-$(CONFIG_DRM_GPUSVM) += drm_gpusvm_helper.o > > > > > > > > > > > > > > obj-$(CONFIG_DRM_BUDDY) += drm_buddy.o > > > > > > > > > > > > > > diff --git a/drivers/gpu/drm/drm_gpusvm.c > > > > > > > b/drivers/gpu/drm/drm_gpusvm.c > > > > > > > index 7ff81aa0a1ca..ef81381609de 100644 > > > > > > > --- a/drivers/gpu/drm/drm_gpusvm.c > > > > > > > +++ b/drivers/gpu/drm/drm_gpusvm.c > > > > > > > @@ -8,10 +8,9 @@ > > > > > > > > > > > > > > #include <linux/dma-mapping.h> > > > > > > > #include <linux/hmm.h> > > > > > > > +#include <linux/hugetlb_inline.h> > > > > > > > #include <linux/memremap.h> > > > > > > > -#include <linux/migrate.h> > > > > > > > #include <linux/mm_types.h> > > > > > > > -#include <linux/pagemap.h> > > > > > > > #include <linux/slab.h> > > > > > > > > > > > > > > #include <drm/drm_device.h> > > > > > > > @@ -107,21 +106,6 @@ > > > > > > > * to add annotations to GPU SVM. > > > > > > > */ > > > > > > > > > > > > > > -/** > > > > > > > - * DOC: Migration > > > > > > > - * > > > > > > > - * The migration support is quite simple, allowing > > > > > > > migration > > > > > > > between RAM and > > > > > > > - * device memory at the range granularity. For example, > > > > > > > GPU > > > > > > > SVM > > > > > > > currently does > > > > > > > - * not support mixing RAM and device memory pages within a > > > > > > > range. > > > > > > > This means > > > > > > > - * that upon GPU fault, the entire range can be migrated > > > > > > > to > > > > > > > device > > > > > > > memory, and > > > > > > > - * upon CPU fault, the entire range is migrated to RAM. > > > > > > > Mixed > > > > > > > RAM > > > > > > > and device > > > > > > > - * memory storage within a range could be added in the > > > > > > > future > > > > > > > if > > > > > > > required. > > > > > > > - * > > > > > > > - * The reasoning for only supporting range granularity is > > > > > > > as > > > > > > > follows: it > > > > > > > - * simplifies the implementation, and range sizes are > > > > > > > driver- > > > > > > > defined and should > > > > > > > - * be relatively small. > > > > > > > - */ > > > > > > > - > > > > > > > /** > > > > > > > * DOC: Partial Unmapping of Ranges > > > > > > > * > > > > > > > @@ -193,10 +177,9 @@ > > > > > > > * if (driver_migration_policy(range)) { > > > > > > > * mmap_read_lock(mm); > > > > > > > * devmem = driver_alloc_devmem(); > > > > > > > - * err = > > > > > > > drm_gpusvm_migrate_to_devmem(gpusvm, > > > > > > > range, > > > > > > > - * > > > > > > > devmem_allocation, > > > > > > > - * > > > > > > > &ctx); > > > > > > > - * mmap_read_unlock(mm); > > > > > > > + * err = > > > > > > > drm_pagemap_migrate_to_devmem(devmem, gpusvm->mm, > > > > > > > gpuva_start, > > > > > > > + * > > > > > > > gpuva_end, driver_pgmap_owner()); > > > > > > > > > > > > > > > > > > > > > > > > fix doc passing timeslice as parameter. > > > > > > > > > > Will fix. > > > > > > > > > > > > > > > > > > > > > 8<------------------------------------------------------------- > > > > > ---- > > > > > ---- > > > > > > > +/** > > > > > > > + * drm_pagemap_migrate_to_devmem() - Migrate a struct > > > > > > > mm_struct > > > > > > > range to device memory > > > > > > > + * @devmem_allocation: The device memory allocation to > > > > > > > migrate > > > > > > > to. > > > > > > > + * The caller should hold a reference to the device memory > > > > > > > allocation, > > > > > > > + * and the reference is consumed by this function unless > > > > > > > it > > > > > > > returns with > > > > > > > + * an error. > > > > > > > + * @mm: Pointer to the struct mm_struct. > > > > > > > + * @start: Start of the virtual address range to migrate. > > > > > > > + * @end: End of the virtual address range to migrate. > > > > > > > + * @timeslice_ms: The time requested for the migrated > > > > > > > pages to > > > > > > > + * be present in the cpu memory map before migrated back. > > > > As Himal suggests and see below... > > > > s/cpu/gpu > > > > > > > > > > > > > > Shouldn't this be present in gpu or cpu memory map ? We are > > > > > > using > > > > > > this > > > > > > to ensure pagefault can be handled effectively by ensuring > > > > > > pages > > > > > > remain > > > > > > in vram here for prescribed time too. > > > > > > > > > > So with this split, drm_pagemap is responsible for migrating > > > > > memory > > > > > and > > > > > updating the CPU memory map only, whereas drm_gpusvm is > > > > > responsible > > > > > for > > > > > setting up the GPU memory maps. So if it remains in the CPU > > > > > memory > > > > > map, > > > > > then nothing will force the gpu vms to invalidate, unless the > > > > > gpu > > > > > driver decides to invalidate itself. > > > > > > > > Thats true. > > > > > > > > > > > > > > But looking at this i should probably rephrase "before migrated > > > > > back" > > > > > to "before being allowed to be migrated back". > > > > > > > > The confusion for me is that timeslice_ms does not represent the > > > > time > > > > pages are required to stay in the CPU memory map, but rather the > > > > time > > > > they must remain in the GPU memory map. We defer migrate_to_smem > > > > until > > > > this timeslice has expired. > > > > Ideally, we'd want the timeslice to ensure CPU residency as well. The > > issue arises because the timeslice is attached to the physical memory > > (drm_gpusvm_devmem), which is not readily available on the GPU fault > > handling side. UMD testing showed that ensuring CPU residency is not > > required for atomics to work, so this part of the implementation was > > left out. We can revisit and find a solution here if it turns out to > > be > > required. > > > > For now, I'd fix the kernel doc above (s/cpu/gpu) with my suggestion > > and > > perhaps put in a comment why we don't wait on the GPU fault side. > > > > > Matt > > Hi, Matt, Please see my second response to Himal. > > It might be that there is a confusion between "pages present in the cpu > memory map" and "memory present in system memory"? > > What I mean with "present in the cpu memory map" is that the pagemap > pages (device private memory) is present in the struct mm_struct for at > least xx ms. The struct mm is the authoritative information of which > pages currently hold the data. > > So "migrated pages are present in the cpu memory map" is therefore more > or less equivalent to "data is present in device private memory". >
Ah, ok. > So maybe I should rephrase this as > "The time requested for the pagemap pages to be present in @mm"? And Yea, that seems better. Matt > add a discussion like in Himal's reply on the typical use-case? > > the pagemap itself has no control of how and if the gpu vms choose to > expose this. It can only guarantee that the authoritative location of > data is the device private memory for timeslice_ms. > > /Thomas > > > > > > > > > > > Yeah, although drm_pagemap is not aware of any gpu memory map so it > > > would be incorrect to bring that up in the api docs. Could add some > > > discussion, though, that "this could be used to..." and give the > > > typical gpu use-case? > > > > > > Thanks, > > > Thomas > > > >
