Hi Lorenzo! On Mon 08-09-25 12:10:31, Lorenzo Stoakes wrote: > Since commit c84bf6dd2b83 ("mm: introduce new .mmap_prepare() file > callback"), The f_op->mmap hook has been deprecated in favour of > f_op->mmap_prepare. > > This was introduced in order to make it possible for us to eventually > eliminate the f_op->mmap hook which is highly problematic as it allows > drivers and filesystems raw access to a VMA which is not yet correctly > initialised. > > This hook also introduces complexity for the memory mapping operation, as > we must correctly unwind what we do should an error arises. > > Overall this interface being so open has caused significant problems for > us, including security issues, it is important for us to simply eliminate > this as a source of problems. > > Therefore this series continues what was established by extending the > functionality further to permit more drivers and filesystems to use > mmap_prepare. > > After updating some areas that can simply use mmap_prepare as-is, and > performing some housekeeping, we then introduce two new hooks: > > f_op->mmap_complete - this is invoked at the point of the VMA having been > correctly inserted, though with the VMA write lock still held. mmap_prepare > must also be specified. > > This expands the use of mmap_prepare to those callers which need to > prepopulate mappings, as well as any which does genuinely require access to > the VMA. > > It's simple - we will let the caller access the VMA, but only once it's > established. At this point unwinding issues is simple - we just unmap the > VMA. > > The VMA is also then correctly initialised at this stage so there can be no > issues arising from a not-fully initialised VMA at this point. > > The other newly added hook is: > > f_op->mmap_abort - this is only valid in conjunction with mmap_prepare and > mmap_complete. This is called should an error arise between mmap_prepare > and mmap_complete (not as a result of mmap_prepare but rather some other > part of the mapping logic). > > This is required in case mmap_prepare wishes to establish state or locks > which need to be cleaned up on completion. If we did not provide this, then > this could not be permitted as this cleanup would otherwise not occur > should the mapping fail between the two calls.
So seeing these new hooks makes me wonder: Shouldn't rather implement mmap(2) in a way more similar to how other f_op hooks behave like ->read or ->write? I.e., a hook called at rather high level - something like from vm_mmap_pgoff() or similar similar level - which would just call library functions from MM for the stuff it needs to do. Filesystems would just do their checks and call the generic mmap function with the vm_ops they want to use, more complex users could then fill in the VMA before releasing mmap_lock or do cleanup in case of failure... This would seem like a more understandable API than several hooks with rules when what gets called. Honza > > We then add split remap_pfn_range*() functions which allow for PFN remap (a > typical mapping prepopulation operation) split between a prepare/complete > step, as well as io_mremap_pfn_range_prepare, complete for a similar > purpose. > > From there we update various mm-adjacent logic to use this functionality as > a first set of changes, as well as resctl and cramfs filesystems to round > off the non-stacked filesystem instances. > > > REVIEWER NOTE: > ~~~~~~~~~~~~~~ > > I considered putting the complete, abort callbacks in vm_ops, however this > won't work because then we would be unable to adjust helpers like > generic_file_mmap_prepare() (which provides vm_ops) to provide the correct > complete, abort callbacks. > > Conceptually it also makes more sense to have these in f_op as they are > one-off operations performed at mmap time to establish the VMA, rather than > a property of the VMA itself. > > Lorenzo Stoakes (16): > mm/shmem: update shmem to use mmap_prepare > device/dax: update devdax to use mmap_prepare > mm: add vma_desc_size(), vma_desc_pages() helpers > relay: update relay to use mmap_prepare > mm/vma: rename mmap internal functions to avoid confusion > mm: introduce the f_op->mmap_complete, mmap_abort hooks > doc: update porting, vfs documentation for mmap_[complete, abort] > mm: add remap_pfn_range_prepare(), remap_pfn_range_complete() > mm: introduce io_remap_pfn_range_prepare, complete > mm/hugetlb: update hugetlbfs to use mmap_prepare, mmap_complete > mm: update mem char driver to use mmap_prepare, mmap_complete > mm: update resctl to use mmap_prepare, mmap_complete, mmap_abort > mm: update cramfs to use mmap_prepare, mmap_complete > fs/proc: add proc_mmap_[prepare, complete] hooks for procfs > fs/proc: update vmcore to use .proc_mmap_[prepare, complete] > kcov: update kcov to use mmap_prepare, mmap_complete > > Documentation/filesystems/porting.rst | 9 ++ > Documentation/filesystems/vfs.rst | 35 +++++++ > arch/csky/include/asm/pgtable.h | 5 + > arch/mips/alchemy/common/setup.c | 28 +++++- > arch/mips/include/asm/pgtable.h | 10 ++ > arch/s390/kernel/crash_dump.c | 6 +- > arch/sparc/include/asm/pgtable_32.h | 29 +++++- > arch/sparc/include/asm/pgtable_64.h | 29 +++++- > drivers/char/mem.c | 80 ++++++++------- > drivers/dax/device.c | 32 +++--- > fs/cramfs/inode.c | 134 ++++++++++++++++++-------- > fs/hugetlbfs/inode.c | 86 +++++++++-------- > fs/ntfs3/file.c | 2 +- > fs/proc/inode.c | 13 ++- > fs/proc/vmcore.c | 53 +++++++--- > fs/resctrl/pseudo_lock.c | 56 ++++++++--- > include/linux/fs.h | 4 + > include/linux/mm.h | 53 +++++++++- > include/linux/mm_types.h | 5 + > include/linux/proc_fs.h | 5 + > include/linux/shmem_fs.h | 3 +- > include/linux/vmalloc.h | 10 +- > kernel/kcov.c | 40 +++++--- > kernel/relay.c | 32 +++--- > mm/memory.c | 128 +++++++++++++++--------- > mm/secretmem.c | 2 +- > mm/shmem.c | 49 +++++++--- > mm/util.c | 18 +++- > mm/vma.c | 96 +++++++++++++++--- > mm/vmalloc.c | 16 ++- > tools/testing/vma/vma_internal.h | 31 +++++- > 31 files changed, 810 insertions(+), 289 deletions(-) > > -- > 2.51.0 -- Jan Kara <j...@suse.com> SUSE Labs, CR