On Tue, Sep 16, 2025 at 03:11:52PM +0100, Lorenzo Stoakes wrote:
> We need the ability to split PFN remap between updating the VMA and
> performing the actual remap, in order to do away with the legacy
> f_op->mmap hook.
>
> To do so, update the PFN remap code to provide shared logic, and also
> make remap_pfn_range_notrack() static, as its one user,
> io_mapping_map_user(), was removed in commit 9a4f90e24661 ("mm: remove
> mm/io-mapping.c").
>
> Then, introduce remap_pfn_range_prepare(), which accepts VMA descriptor
> and PFN parameters, and remap_pfn_range_complete(), which accepts the
> same parameters as remap_pfn_range().
>
> remap_pfn_range_prepare() will set the CoW vma->vm_pgoff if necessary,
> so it must be supplied with a correct PFN to do so. If the caller must
> hold locks to be able to do this, those locks should be held across the
> operation, and mmap_abort() should be provided to revoke the lock
> should an error arise.
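To make the intended flow concrete for anyone reading along, a driver
would use the split roughly as below. This is only a sketch inferred from
the description above; the hook names/signatures, the return conventions,
and the mydrv_get_pfn()/mydrv_get_pfn_vma() helpers are all assumptions:

	static int mydrv_mmap_prepare(struct vm_area_desc *desc)
	{
		/* mydrv_get_pfn() is hypothetical; the driver supplies the PFN. */
		unsigned long pfn = mydrv_get_pfn(desc);

		/*
		 * Stage 1: only the descriptor is updated (vm_pgoff for the
		 * CoW case); no page tables are touched yet.
		 */
		remap_pfn_range_prepare(desc, pfn);
		return 0;
	}

	static int mydrv_mmap_complete(struct file *file, struct vm_area_struct *vma)
	{
		/* mydrv_get_pfn_vma() is equally hypothetical. */
		unsigned long pfn = mydrv_get_pfn_vma(vma);

		/* Stage 2: same parameters as remap_pfn_range(), does the remap. */
		return remap_pfn_range_complete(vma, vma->vm_start, pfn,
						vma->vm_end - vma->vm_start,
						vma->vm_page_prot);
	}
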
It looks like the patches have changed to remove mmap_abort, so the
mmap_abort paragraph above can probably be dropped.

>  static int remap_pfn_range_internal(struct vm_area_struct *vma, unsigned long addr,
> -		unsigned long pfn, unsigned long size, pgprot_t prot)
> +		unsigned long pfn, unsigned long size, pgprot_t prot, bool set_vma)
>  {
>  	pgd_t *pgd;
>  	unsigned long next;
> @@ -2912,32 +2931,17 @@ static int remap_pfn_range_internal(struct vm_area_struct *vma, unsigned long ad
>  	if (WARN_ON_ONCE(!PAGE_ALIGNED(addr)))
>  		return -EINVAL;
>
> -	/*
> -	 * Physically remapped pages are special. Tell the
> -	 * rest of the world about it:
> -	 *   VM_IO tells people not to look at these pages
> -	 *	(accesses can have side effects).
> -	 *   VM_PFNMAP tells the core MM that the base pages are just
> -	 *	raw PFN mappings, and do not have a "struct page" associated
> -	 *	with them.
> -	 *   VM_DONTEXPAND
> -	 *	Disable vma merging and expanding with mremap().
> -	 *   VM_DONTDUMP
> -	 *	Omit vma from core dump, even when VM_IO turned off.
> -	 *
> -	 * There's a horrible special case to handle copy-on-write
> -	 * behaviour that some programs depend on. We mark the "original"
> -	 * un-COW'ed pages by matching them up with "vma->vm_pgoff".
> -	 * See vm_normal_page() for details.
> -	 */
> -	if (is_cow_mapping(vma->vm_flags)) {
> -		if (addr != vma->vm_start || end != vma->vm_end)
> -			return -EINVAL;
> -		vma->vm_pgoff = pfn;
> +	if (set_vma) {
> +		err = get_remap_pgoff(vma->vm_flags, addr, end,
> +				      vma->vm_start, vma->vm_end,
> +				      pfn, &vma->vm_pgoff);
> +		if (err)
> +			return err;
> +		vm_flags_set(vma, VM_REMAP_FLAGS);
> +	} else {
> +		VM_WARN_ON_ONCE((vma->vm_flags & VM_REMAP_FLAGS) != VM_REMAP_FLAGS);
>  	}

It looks like you can avoid the changes to add set_vma by making
remap_pfn_range_internal() only do the complete portion and giving the
legacy callers a helper to do the prepare inline:

int remap_pfn_range_prepare_vma(struct vm_area_struct *vma, unsigned long addr,
				unsigned long pfn, unsigned long size)
{
	unsigned long end = addr + PAGE_ALIGN(size);
	int err;

	err = get_remap_pgoff(vma->vm_flags, addr, end,
			      vma->vm_start, vma->vm_end,
			      pfn, &vma->vm_pgoff);
	if (err)
		return err;

	vm_flags_set(vma, VM_REMAP_FLAGS);
	return 0;
}

int remap_pfn_range(struct vm_area_struct *vma, unsigned long addr,
		    unsigned long pfn, unsigned long size, pgprot_t prot)
{
	int err;

	err = remap_pfn_range_prepare_vma(vma, addr, pfn, size);
	if (err)
		return err;

	if (IS_ENABLED(__HAVE_PFNMAP_TRACKING))
		return remap_pfn_range_track(vma, addr, pfn, size, prot);

	return remap_pfn_range_notrack(vma, addr, pfn, size, prot);
}

(fix pgtable_types.h to #define __HAVE_PFNMAP_TRACKING to 1 so
IS_ENABLED() works)

But the logic here is all fine.

Reviewed-by: Jason Gunthorpe <j...@nvidia.com>

Jason
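P.S. For anyone reading without the rest of the series handy,
get_remap_pgoff() presumably just lifts the open-coded CoW logic deleted
in the hunk above, i.e. something along these lines (my reconstruction
from the deleted code, not the actual helper):

	static int get_remap_pgoff(vm_flags_t vm_flags, unsigned long addr,
				   unsigned long end, unsigned long vm_start,
				   unsigned long vm_end, unsigned long pfn,
				   unsigned long *pgoff)
	{
		if (is_cow_mapping(vm_flags)) {
			/* A CoW remap must cover the whole VMA... */
			if (addr != vm_start || end != vm_end)
				return -EINVAL;
			/*
			 * ...and marks the "original" un-COW'ed pages by
			 * matching them up with vm_pgoff, see vm_normal_page().
			 */
			*pgoff = pfn;
		}
		return 0;
	}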