On 2/2/26 12:36, Jordan Niethe wrote:
> The existing design of device private memory imposes limitations which
> render it non-functional for certain systems and configurations where
> the physical address space is limited.
>
> Device private memory is implemented by first reserving a region of the
> physical address space. This is a problem: the physical address space is
> not a resource that is directly under the kernel's control. Availability
> of suitable physical address space is constrained by the underlying
> hardware and firmware and may not always be available.
>
> Device private memory assumes that it will be able to reserve a device
> memory sized chunk of physical address space. However, there is nothing
> guaranteeing that this will succeed, and there are a number of factors
> that increase the likelihood of failure. We need to consider what else
> may exist in the physical address space. It is observed that certain VM
> configurations place very large PCI windows immediately after RAM, large
> enough that there is no physical address space available at all for
> device private memory. This is more likely to occur on 43-bit physical
> width systems, which have less physical address space.
>
> Instead of using the physical address space, introduce a device private
> address space and allocate device regions from there to represent the
> device private pages.
>
> Introduce a new interface memremap_device_private_pagemap() that
> allocates a requested amount of device private address space and creates
> the necessary device private pages.
>
> To support this new interface, struct dev_pagemap needs some changes:
>
> - Add a new dev_pagemap::nr_pages field as an input parameter.
> - Add a new dev_pagemap::pages array to store the device
>   private pages.
> When using memremap_device_private_pagemap(), rather than passing in
> dev_pagemap::ranges[dev_pagemap::nr_ranges] of physical address space to
> be remapped, dev_pagemap::nr_ranges will always be 1, and the device
> private range that is reserved is returned in dev_pagemap::range.
>
> Forbid calling memremap_pages() with dev_pagemap::ranges::type =
> MEMORY_DEVICE_PRIVATE.
>
> Represent this device private address space using a new
> device_private_pgmap_tree maple tree. This tree maps a given device
> private address to a struct dev_pagemap, where a specific device private
> page may then be looked up in that dev_pagemap::pages array.
>
> Device private address space can be reclaimed and the associated device
> private pages freed using the corresponding new
> memunmap_device_private_pagemap() interface.
>
> Because the device private pages now live outside the physical address
> space, they no longer have a normal PFN. This means that page_to_pfn(),
> et al. are no longer meaningful.
>
> Introduce helpers:
>
> - device_private_page_to_offset()
> - device_private_folio_to_offset()
>
> to take a given device private page / folio and return its offset within
> the device private address space.
>
> Update the places where we previously converted a device private page to
> a PFN to use these new helpers. When we encounter a device private
> offset, look up its page within the pagemap using
> device_private_offset_to_page() instead.
>
> Update the existing users:
>
> - lib/test_hmm.c
> - ppc ultravisor
> - drm/amd/amdkfd
> - gpu/drm/xe
> - gpu/drm/nouveau
>
> to use the new memremap_device_private_pagemap() interface.
> Acked-by: Felix Kuehling <[email protected]>
> Reviewed-by: Zi Yan <[email protected]> # for MM changes
> Signed-off-by: Jordan Niethe <[email protected]>
> Signed-off-by: Alistair Popple <[email protected]>
>
> ---
> v1:
> - Include NUMA node parameter for memremap_device_private_pagemap()
> - Add devm_memremap_device_private_pagemap() and friends
> - Update existing users of memremap_pages():
>   - ppc ultravisor
>   - drm/amd/amdkfd
>   - gpu/drm/xe
>   - gpu/drm/nouveau
> - Update for HMM huge page support
> - Guard device_private_offset_to_page() and friends with CONFIG_ZONE_DEVICE
>
> v2:
> - Make sure last member of struct dev_pagemap remains
>   DECLARE_FLEX_ARRAY(struct range, ranges);
>
> v3:
> - Use numa_mem_id() if memremap_device_private_pagemap() is called with
>   NUMA_NO_NODE. This fixes a NULL pointer deref in
>   lruvec_stat_mod_folio().
> - drm/xe: Remove call to devm_release_mem_region() in
>   xe_pagemap_destroy_work()
> - s/VM_BUG/VM_WARN/
>
> v4:
> - Use devm_memunmap_device_private_pagemap() in
>   xe_pagemap_destroy_work()
> - Replace ^ with != for PVMW_DEVICE_PRIVATE comparisons
> - Minor style changes
> - Remove discussion of aarch64 from commit message - not relevant post
>   eeb8fdfcf090 ("arm64: Expose the end of the linear map in PHYSMEM_END")
>
> v6:
> - Fix maybe-unused in kgd2kfd_init_zone_device()
> - Replace division by PAGE_SIZE with DIV_ROUND_UP() when setting
>   nr_pages. This mirrors the align-up that previously happened in
>   get_free_mem_region()
> ---
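For reference, the lookup scheme the commit message describes (a device private address range per pagemap, a tree lookup from offset to pagemap, then an index into dev_pagemap::pages) can be modelled in plain userspace C. This is only an illustrative sketch of the flow, not the actual patch: the struct layout, PAGE_SHIFT value, and the linear scan standing in for the maple tree are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

/* Stand-in for the real struct page. */
struct page { int id; };

/*
 * Per the commit message: nr_pages is an input parameter and pages
 * stores the device private pages backing this pagemap's range.
 */
struct dev_pagemap {
	unsigned long start;    /* start of the device private range */
	unsigned long nr_pages; /* input: number of pages to create */
	struct page *pages;     /* the device private pages */
};

/*
 * Stand-in for the device_private_pgmap_tree maple tree: resolve a
 * device private offset to the pagemap owning that range. A linear
 * scan replaces the tree purely for illustration.
 */
static struct dev_pagemap *lookup_pgmap(struct dev_pagemap *maps,
					size_t n, unsigned long offset)
{
	for (size_t i = 0; i < n; i++) {
		unsigned long end = maps[i].start +
				    maps[i].nr_pages * PAGE_SIZE;

		if (offset >= maps[i].start && offset < end)
			return &maps[i];
	}
	return NULL;
}

/* offset -> owning pagemap -> index into dev_pagemap::pages */
static struct page *device_private_offset_to_page(struct dev_pagemap *maps,
						  size_t n,
						  unsigned long offset)
{
	struct dev_pagemap *pgmap = lookup_pgmap(maps, n, offset);

	if (!pgmap)
		return NULL;
	return &pgmap->pages[(offset - pgmap->start) >> PAGE_SHIFT];
}
```

The point of the model is that a device private "PFN" is now just an offset into a kernel-managed address space, so the page lookup goes through the pagemap tree rather than the memmap.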
There is just too much in this patch to review it reasonably. You should
probably have a patch that just introduces the helpers and has them just
do what we do today. E.g., device_private_page_to_offset() would just do
a page_to_pfn(). Then you can convert the individual core-mm pieces so
that people can review them without making their brain hurt. Afterwards,
you can have a patch that does the real "mm: Remove device private pages
from the physical address space" and doesn't have to touch too many
core-mm pieces.

[...]

> diff --git a/mm/util.c b/mm/util.c
> index 65e3f1a97d76..8482ebc5c394 100644
> --- a/mm/util.c
> +++ b/mm/util.c
> @@ -1244,7 +1244,10 @@ void snapshot_page(struct page_snapshot *ps, const struct page *page)
>  	struct folio *foliop;
>  	int loops = 5;
>
> -	ps->pfn = page_to_pfn(page);
> +	if (is_device_private_page(page))
> +		ps->pfn = device_private_page_to_offset(page);
> +	else
> +		ps->pfn = page_to_pfn(page);
>  	ps->flags = PAGE_SNAPSHOT_FAITHFUL;

Why is that not done by the caller?

-- 
Cheers,

David
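For what it's worth, the first-stage helper suggested above could be as small as the following sketch. It is a self-contained userspace model, not kernel code: the struct page and page_to_pfn() stand-ins exist only so the example compiles (the real page_to_pfn() derives the PFN from the memmap, not a stored field).

```c
#include <assert.h>

/* Minimal stand-ins so this sketch compiles on its own. */
struct page { unsigned long pfn; };

static unsigned long page_to_pfn(const struct page *page)
{
	return page->pfn; /* model only; the kernel computes this */
}

/*
 * Hypothetical first-stage helper: introduce the new name but have it
 * do exactly what we do today (return the PFN), so each core-mm caller
 * can be converted and reviewed in isolation before the real
 * device private address space lands.
 */
static unsigned long device_private_page_to_offset(const struct page *page)
{
	return page_to_pfn(page);
}
```

With the name in place, later patches only need to change this one function body instead of touching every core-mm call site again.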
