On 2/2/26 12:36, Jordan Niethe wrote:
> The existing design of device private memory imposes limitations which
> render it non-functional for certain systems and configurations where
> the physical address space is limited.
> 
> Device private memory is implemented by first reserving a region of the
> physical address space. This is a problem. The physical address space is
> not a resource that is directly under the kernel's control. Availability
> of suitable physical address space is constrained by the underlying
> hardware and firmware and may not always be available.
> 
> Device private memory assumes that it will be able to reserve a device
> memory sized chunk of physical address space. However, there is nothing
> guaranteeing that this will succeed, and there are a number of factors that
> increase the likelihood of failure. We need to consider what else may
> exist in the physical address space. It is observed that certain VM
> configurations place very large PCI windows immediately after RAM. Large
> enough that there is no physical address space available at all for
> device private memory. This is more likely to occur on 43 bit physical
> width systems which have less physical address space.
> 
> Instead of using the physical address space, introduce a device private
> address space and allocate devices regions from there to represent the
> device private pages.
> 
> Introduce a new interface memremap_device_private_pagemap() that
> allocates a requested amount of device private address space and creates
> the necessary device private pages.
> 
> To support this new interface, struct dev_pagemap needs some changes:
> 
>   - Add a new dev_pagemap::nr_pages field as an input parameter.
>   - Add a new dev_pagemap::pages array to store the device
>     private pages.
> 
> When using memremap_device_private_pagemap(), rather than passing in
> dev_pagemap::ranges[dev_pagemap::nr_ranges] of physical address space to
> be remapped, dev_pagemap::nr_ranges will always be 1, and the device
> private range that is reserved is returned in dev_pagemap::range.
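[For illustration: a self-contained toy model of the interface described above. The field names (nr_pages, pages, range) come from the commit message, but the function signature, the struct layout, and the bump allocator here are my assumptions, not the real kernel code.]

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-ins; real definitions live in include/linux/memremap.h
 * and differ in detail. */
struct page { unsigned long long dp_offset; };
struct range { unsigned long long start, end; };

#define PAGE_SIZE 4096ULL

struct dev_pagemap {
	unsigned long nr_pages;	/* input: number of device private pages */
	struct page **pages;	/* filled in: the created device private pages */
	struct range range;	/* filled in: the reserved device private range */
};

/* The device private address space is purely kernel-managed, so a bump
 * allocator is enough to model the point of the patch: reservation can
 * never fail because of what the platform put in the physical address map. */
static unsigned long long dp_next;

static int memremap_device_private_pagemap(struct dev_pagemap *pgmap)
{
	unsigned long i;

	pgmap->range.start = dp_next;
	pgmap->range.end = dp_next + pgmap->nr_pages * PAGE_SIZE - 1;
	dp_next = pgmap->range.end + 1;

	pgmap->pages = calloc(pgmap->nr_pages, sizeof(*pgmap->pages));
	if (!pgmap->pages)
		return -1;
	for (i = 0; i < pgmap->nr_pages; i++) {
		pgmap->pages[i] = malloc(sizeof(**pgmap->pages));
		pgmap->pages[i]->dp_offset =
			pgmap->range.start + i * PAGE_SIZE;
	}
	return 0;
}
```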
> 
> Forbid calling memremap_pages() with dev_pagemap::ranges::type =
> MEMORY_DEVICE_PRIVATE.
> 
> Represent this device private address space using a new
> device_private_pgmap_tree maple tree. This tree maps a given device
> private address to a struct dev_pagemap, where a specific device private
> page may then be looked up in that dev_pagemap::pages array.
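[For illustration: the two-level lookup described above, as a toy model. A flat table stands in for the device_private_pgmap_tree maple tree (the real code would use the mtree_* API keyed by range); everything else here is assumed, not taken from the patch.]

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for the kernel structures. */
struct page { int id; };
struct range { unsigned long long start, end; };

#define PAGE_SIZE 4096ULL
#define MAX_PGMAPS 8

struct dev_pagemap {
	struct range range;
	struct page **pages;
};

/* Stand-in for the device_private_pgmap_tree maple tree. */
static struct dev_pagemap *pgmap_table[MAX_PGMAPS];

static struct dev_pagemap *device_private_offset_to_pgmap(unsigned long long off)
{
	int i;

	for (i = 0; i < MAX_PGMAPS; i++)
		if (pgmap_table[i] && off >= pgmap_table[i]->range.start &&
		    off <= pgmap_table[i]->range.end)
			return pgmap_table[i];
	return NULL;
}

/* Two-level lookup the commit message describes: offset -> pgmap via
 * the tree, then index into pgmap->pages[]. */
static struct page *device_private_offset_to_page(unsigned long long off)
{
	struct dev_pagemap *pgmap = device_private_offset_to_pgmap(off);

	if (!pgmap)
		return NULL;
	return pgmap->pages[(off - pgmap->range.start) / PAGE_SIZE];
}
```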
> 
> Device private address space can be reclaimed and the associated device
> private pages freed using the corresponding new
> memunmap_device_private_pagemap() interface.
> 
> Because the device private pages now live outside the physical address
> space, they no longer have a normal PFN. This means that page_to_pfn(),
> et al. are no longer meaningful.
> 
> Introduce helpers:
> 
>   - device_private_page_to_offset()
>   - device_private_folio_to_offset()
> 
> to take a given device private page / folio and return its offset within
> the device private address space.
> 
> Update the places where we previously converted a device private page to
> a PFN to use these new helpers. When we encounter a device private
> offset, instead of looking up its page within the pagemap use
> device_private_offset_to_page() instead.
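[For illustration: a toy model of the page-to-offset direction. How the real struct page finds its pgmap and index is not shown in the quoted text; the back-pointer and index fields here are purely my assumption.]

```c
#include <assert.h>

struct range { unsigned long long start, end; };

#define PAGE_SIZE 4096ULL

struct dev_pagemap { struct range range; };

/* Toy model: a device private page remembers its pgmap and its index in
 * pgmap->pages[], standing in for whatever the real struct page encodes. */
struct page {
	struct dev_pagemap *pgmap;
	unsigned long index;
};

/* Replacement for page_to_pfn() on device private pages: the result is
 * an offset into the device private address space, not a PFN. */
static unsigned long long device_private_page_to_offset(struct page *page)
{
	return page->pgmap->range.start + page->index * PAGE_SIZE;
}
```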
> 
> Update the existing users:
> 
>  - lib/test_hmm.c
>  - ppc ultravisor
>  - drm/amd/amdkfd
>  - gpu/drm/xe
>  - gpu/drm/nouveau
> 
> to use the new memremap_device_private_pagemap() interface.
> 
> Acked-by: Felix Kuehling <[email protected]>
> Reviewed-by: Zi Yan <[email protected]> # for MM changes
> Signed-off-by: Jordan Niethe <[email protected]>
> Signed-off-by: Alistair Popple <[email protected]>
> 
> ---
> v1:
> - Include NUMA node parameter for memremap_device_private_pagemap()
> - Add devm_memremap_device_private_pagemap() and friends
> - Update existing users of memremap_pages():
>     - ppc ultravisor
>     - drm/amd/amdkfd
>     - gpu/drm/xe
>     - gpu/drm/nouveau
> - Update for HMM huge page support
> - Guard device_private_offset_to_page and friends with CONFIG_ZONE_DEVICE
> 
> v2:
> - Make sure last member of struct dev_pagemap remains
>   DECLARE_FLEX_ARRAY(struct range, ranges);
> 
> v3:
> - Use numa_mem_id() if memremap_device_private_pagemap is called with
>   NUMA_NO_NODE. This fixes a null pointer deref in
>   lruvec_stat_mod_folio().
> - drm/xe: Remove call to devm_release_mem_region() in
>   xe_pagemap_destroy_work()
> - s/VM_BUG/VM_WARN/
> 
> v4:
> - Use devm_memunmap_device_private_pagemap() in
>   xe_pagemap_destroy_work()
> - Replace ^ with != for PVMW_DEVICE_PRIVATE comparisons
> - Minor style changes
> - remove discussion of aarch64 from commit message - not relevant post
>   eeb8fdfcf090 ("arm64: Expose the end of the linear map in PHYSMEM_END")
> 
> v6:
> - Fix maybe unused in kgd2kfd_init_zone_device()
> - Replace division by PAGE_SIZE with DIV_ROUND_UP() when setting
>   nr_pages. This mirrors the align up that previously happened in
>   get_free_mem_region()
> ---


There is just too much in this patch to review it reasonably.

You should probably have a patch that just introduces the helpers and
have them just do what we do today.

E.g., device_private_page_to_offset() would just do a page_to_pfn().
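[A minimal sketch of what such an interim helper could look like; the mem_map-based page_to_pfn() model below is a stand-in for the real kernel definition.]

```c
#include <assert.h>

/* Minimal model of today's mem_map-based page_to_pfn(). */
struct page { int _unused; };

static struct page mem_map[16];
#define page_to_pfn(p) ((unsigned long)((p) - mem_map))

/* Interim helper per the suggestion above: identical to page_to_pfn(),
 * so each core-mm caller can be converted as a behavioural no-op and
 * reviewed on its own, before the final patch changes only this body. */
static inline unsigned long device_private_page_to_offset(struct page *page)
{
	return page_to_pfn(page);
}
```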

Then you can convert the individual core-mm pieces so that people can
review them without making their brain hurt.

Afterwards, you can have a patch that does the real "mm: Remove device
private pages from the physical address space" and doesn't have to touch
too many core-mm pieces.

[...]

> diff --git a/mm/util.c b/mm/util.c
> index 65e3f1a97d76..8482ebc5c394 100644
> --- a/mm/util.c
> +++ b/mm/util.c
> @@ -1244,7 +1244,10 @@ void snapshot_page(struct page_snapshot *ps, const struct page *page)
>       struct folio *foliop;
>       int loops = 5;
>  
> -     ps->pfn = page_to_pfn(page);
> +     if (is_device_private_page(page))
> +             ps->pfn = device_private_page_to_offset(page);
> +     else
> +             ps->pfn = page_to_pfn(page);
>       ps->flags = PAGE_SNAPSHOT_FAITHFUL;

Why is that not done by the caller?

-- 
Cheers,

David
