On Wed, Sep 17, 2025 at 08:11:13PM +0100, Lorenzo Stoakes wrote:
> Since we can now perform actions after the VMA is established via
> mmap_prepare, use desc->action_success_hook to set up the hugetlb lock
> once the VMA is setup.
>
> We also make changes throughout hugetlbfs to make this possible.
>
> Signed-off-by: Lorenzo Stoakes <[email protected]>
> Reviewed-by: Jason Gunthorpe <[email protected]>
> ---
>  fs/hugetlbfs/inode.c           | 36 ++++++++++------
>  include/linux/hugetlb.h        |  9 +++-
>  include/linux/hugetlb_inline.h | 15 ++++---
>  mm/hugetlb.c                   | 77 ++++++++++++++++++++--------------
>  4 files changed, 85 insertions(+), 52 deletions(-)
>
> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> index f42548ee9083..9e0625167517 100644
> --- a/fs/hugetlbfs/inode.c
> +++ b/fs/hugetlbfs/inode.c
> @@ -96,8 +96,15 @@ static const struct fs_parameter_spec hugetlb_fs_parameters[] = {
>  #define PGOFF_LOFFT_MAX \
>  	(((1UL << (PAGE_SHIFT + 1)) - 1) << (BITS_PER_LONG - (PAGE_SHIFT + 1)))
>
> -static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
> +static int hugetlb_file_mmap_prepare_success(const struct vm_area_struct *vma)
>  {
> +	/* Unfortunate we have to reassign vma->vm_private_data. */
> +	return hugetlb_vma_lock_alloc((struct vm_area_struct *)vma);
> +}
Hi Lorenzo,

The following test causes the kernel to enter a blocked state, which
suggests a locking-order problem. I was able to reproduce this behavior
in certain test runs.

Test case:

git clone https://github.com/libhugetlbfs/libhugetlbfs.git
cd libhugetlbfs ; ./configure
make -j32
cd tests
echo 100 > /proc/sys/vm/nr_hugepages
mkdir -p /test-hugepages && mount -t hugetlbfs nodev /test-hugepages
./run_tests.py    <in a loop>

...
shm-fork 10 100 (1024K: 64):             PASS
set shmmax limit to 104857600
shm-getraw 100 /dev/full (1024K: 32):
shm-getraw 100 /dev/full (1024K: 64):    PASS
fallocate_stress.sh (1024K: 64):         <blocked>

Blocked task state below:

task:fallocate_stres state:D stack:0 pid:5106 tgid:5106 ppid:5103 task_flags:0x400000 flags:0x00000001
Call Trace:
 [<00000255adc646f0>] __schedule+0x370/0x7f0
 [<00000255adc64bb0>] schedule+0x40/0xc0
 [<00000255adc64d32>] schedule_preempt_disabled+0x22/0x30
 [<00000255adc68492>] rwsem_down_write_slowpath+0x232/0x610
 [<00000255adc68922>] down_write_killable+0x52/0x80
 [<00000255ad12c980>] vm_mmap_pgoff+0xc0/0x1f0
 [<00000255ad164bbe>] ksys_mmap_pgoff+0x17e/0x220
 [<00000255ad164d3c>] __s390x_sys_old_mmap+0x7c/0xa0
 [<00000255adc60e4e>] __do_syscall+0x12e/0x350
 [<00000255adc6cfee>] system_call+0x6e/0x90

task:fallocate_stres state:D stack:0 pid:5109 tgid:5106 ppid:5103 task_flags:0x400040 flags:0x00000001
Call Trace:
 [<00000255adc646f0>] __schedule+0x370/0x7f0
 [<00000255adc64bb0>] schedule+0x40/0xc0
 [<00000255adc64d32>] schedule_preempt_disabled+0x22/0x30
 [<00000255adc68492>] rwsem_down_write_slowpath+0x232/0x610
 [<00000255adc688be>] down_write+0x4e/0x60
 [<00000255ad1c11ec>] __hugetlb_zap_begin+0x3c/0x70
 [<00000255ad158b9c>] unmap_vmas+0x10c/0x1a0
 [<00000255ad180844>] vms_complete_munmap_vmas+0x134/0x2e0
 [<00000255ad1811be>] do_vmi_align_munmap+0x13e/0x170
 [<00000255ad1812ae>] do_vmi_munmap+0xbe/0x140
 [<00000255ad183f86>] __vm_munmap+0xe6/0x190
 [<00000255ad166832>] __s390x_sys_munmap+0x32/0x40
 [<00000255adc60e4e>] __do_syscall+0x12e/0x350
 [<00000255adc6cfee>] system_call+0x6e/0x90

Reading the two traces together: pid 5109 is in the munmap path, so it
holds the mmap lock, and it is blocked in __hugetlb_zap_begin() waiting
to take the hugetlb VMA lock for write; pid 5106's mmap() is in turn
blocked waiting for the mmap lock. So it looks like whatever is holding
the hugetlb VMA lock never releases it, and everything queued behind the
mmap lock backs up. I have appended a condensed sketch of the pattern
the test exercises, plus my reading of the patch mechanism, below my
signature.

Thanks,
Sumanth
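The real test is libhugetlbfs/tests/fallocate_stress.c; the following is
only a hypothetical, condensed sketch of the pattern I believe it
exercises -- the file path, sizes, and loop structure here are my
assumptions, not the test's actual code:

/*
 * Condensed sketch (untested): one thread repeatedly mmap()s and
 * munmap()s a hugetlbfs file while another punches holes in it and
 * reallocates. The mount point matches the reproduction steps above;
 * 1M huge pages as on the s390x system in the report.
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/falloc.h>
#include <pthread.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define HPAGE_SIZE	(1024UL * 1024)
#define LEN		(10 * HPAGE_SIZE)

static int fd;

static void *map_unmap_loop(void *unused)
{
	for (;;) {
		void *p = mmap(NULL, LEN, PROT_READ | PROT_WRITE,
			       MAP_SHARED, fd, 0);

		if (p != MAP_FAILED)
			munmap(p, LEN);	/* takes the hugetlb VMA lock */
	}
	return NULL;
}

static void *fallocate_loop(void *unused)
{
	for (;;) {
		/* Punch the whole range, then reallocate it. */
		fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
			  0, LEN);
		fallocate(fd, 0, 0, LEN);
	}
	return NULL;
}

int main(void)
{
	pthread_t mapper, faller;

	fd = open("/test-hugepages/stress", O_CREAT | O_RDWR, 0600);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	pthread_create(&mapper, NULL, map_unmap_loop, NULL);
	pthread_create(&faller, NULL, fallocate_loop, NULL);

	/* Runs until interrupted or a task blocks in D state. */
	pthread_join(mapper, NULL);
	pthread_join(faller, NULL);
	return 0;
}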
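And for anyone following along, this is my reading of the mechanism the
quoted commit message describes -- a sketch only, not the actual patch.
Only desc->action_success_hook and hugetlb_file_mmap_prepare_success()
come from the quoted text; the hugetlbfs_file_mmap_prepare() wrapper and
its body are my assumptions:

/*
 * Sketch of the deferred-setup flow as I understand it. At
 * mmap_prepare time the VMA does not exist yet, so the hugetlb VMA
 * lock allocation is deferred to a hook that runs only after the
 * VMA has been successfully established.
 */
static int hugetlbfs_file_mmap_prepare(struct vm_area_desc *desc)
{
	/* ... existing size/offset validation and reservation setup ... */

	desc->action_success_hook = hugetlb_file_mmap_prepare_success;
	return 0;
}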
