On Tue, Feb 17, 2026 at 11:19 AM Liam R. Howlett
<[email protected]> wrote:
>
> * Suren Baghdasaryan <[email protected]> [260217 11:33]:
> > Now that we have vma_start_write_killable() we can replace most of the
> > vma_start_write() calls with it, improving reaction time to the kill
> > signal.
> >
> > There are several places which are left untouched by this patch:
> >
> > 1. free_pgtables() because the function should free page tables even if a
> > fatal signal is pending.
> >
> > 2. process_vma_walk_lock(), which requires changes in its callers and
> > will be handled in the next patch.
> >
> > 3. userfaultfd code, where some paths calling vma_start_write() can
> > handle EINTR and some can't without a deeper code refactoring.
> >
> > 4. vm_flags_{set|mod|clear} require refactoring that involves moving
> > vma_start_write() out of these functions and replacing it with
> > vma_assert_write_locked(); callers of these functions should then
> > lock the vma themselves using vma_start_write_killable() whenever
> > possible.
> >
> > Suggested-by: Matthew Wilcox <[email protected]>
> > Signed-off-by: Suren Baghdasaryan <[email protected]>
> > Reviewed-by: Ritesh Harjani (IBM) <[email protected]> # powerpc
> > ---
> > arch/powerpc/kvm/book3s_hv_uvmem.c | 5 +-
> > include/linux/mempolicy.h | 5 +-
> > mm/khugepaged.c | 5 +-
> > mm/madvise.c | 4 +-
> > mm/memory.c | 2 +
> > mm/mempolicy.c | 23 ++++++--
> > mm/mlock.c | 20 +++++--
> > mm/mprotect.c | 4 +-
> > mm/mremap.c | 4 +-
> > mm/vma.c | 93 +++++++++++++++++++++---------
> > mm/vma_exec.c | 6 +-
> > 11 files changed, 123 insertions(+), 48 deletions(-)
> >
>
> ...
>
> > --- a/mm/mempolicy.c
> > +++ b/mm/mempolicy.c
>
> ...
>
> >
> > static const struct mempolicy_operations mpol_ops[MPOL_MAX] = {
> > @@ -1785,9 +1793,15 @@ SYSCALL_DEFINE4(set_mempolicy_home_node, unsigned long, start, unsigned long, le
> > return -EINVAL;
> > if (end == start)
> > return 0;
> > - mmap_write_lock(mm);
> > + if (mmap_write_lock_killable(mm))
> > + return -EINTR;
> > prev = vma_prev(&vmi);
> > for_each_vma_range(vmi, vma, end) {
> > + if (vma_start_write_killable(vma)) {
> > + err = -EINTR;
> > + break;
> > + }
> > +
> > /*
> > * If any vma in the range got policy other than MPOL_BIND
> > * or MPOL_PREFERRED_MANY we return error. We don't reset
> > @@ -1808,7 +1822,6 @@ SYSCALL_DEFINE4(set_mempolicy_home_node, unsigned long, start, unsigned long, le
> > break;
> > }
> >
> > - vma_start_write(vma);
>
> Moving this vma_start_write() up means we will lock all vmas in the
> range regardless of if they are going to change. Was that your
> intention?
No, I missed that this would result in unnecessary locks.
>
> It might be better to move the locking to later in the loop, prior to
> the mpol_dup(), but after skipping other vmas?
Yes, that's the right place for it. Will move.
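Roughly like this, I think (untested sketch; the surrounding loop is from my
current tree, so the exact context may shift a bit in v3):

		old = vma_policy(vma);
		if (!old) {
			prev = vma;
			continue;
		}
		if (old->mode != MPOL_BIND &&
		    old->mode != MPOL_PREFERRED_MANY) {
			err = -EOPNOTSUPP;
			break;
		}

		/* Lock only the vmas we are actually going to modify */
		if (vma_start_write_killable(vma)) {
			err = -EINTR;
			break;
		}

		new = mpol_dup(old);
		if (IS_ERR(new)) {
			err = PTR_ERR(new);
			break;
		}

		new->home_node = home_node;
		err = mbind_range(&vmi, vma, &prev, start, end, new);

That way the vmas we skip (no policy, or a policy other than
MPOL_BIND/MPOL_PREFERRED_MANY) are never write-locked.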
>
> > new->home_node = home_node;
> > err = mbind_range(&vmi, vma, &prev, start, end, new);
>
> ...
>
> > diff --git a/mm/vma.c b/mm/vma.c
> > index bb4d0326fecb..1d21351282cf 100644
> > --- a/mm/vma.c
> > +++ b/mm/vma.c
>
> ...
>
> > @@ -2532,6 +2556,11 @@ static int __mmap_new_vma(struct mmap_state *map, struct vm_area_struct **vmap)
> > goto free_vma;
> > }
> >
> > + /* Lock the VMA since it is modified after insertion into VMA tree */
> > + error = vma_start_write_killable(vma);
> > + if (error)
> > + goto free_iter_vma;
> > +
> > if (map->file)
> > error = __mmap_new_file_vma(map, vma);
> > else if (map->vm_flags & VM_SHARED)
> > @@ -2552,8 +2581,6 @@ static int __mmap_new_vma(struct mmap_state *map, struct vm_area_struct **vmap)
> > WARN_ON_ONCE(!arch_validate_flags(map->vm_flags));
> > #endif
> >
> > - /* Lock the VMA since it is modified after insertion into VMA tree */
> > - vma_start_write(vma);
> > vma_iter_store_new(vmi, vma);
> > map->mm->map_count++;
> > vma_link_file(vma, map->hold_file_rmap_lock);
>
> This is a bit of a nit on the placement..
>
> Prior to this change, the write lock on the vma was taken next to where
> it was inserted in the tree. Now the lock is taken between vma iterator
> preallocations and part of the vma setup.
>
> Would it make sense to put it closer to the vma allocation itself? I
> think all that's needed to be set is the mm struct for the locking to
> work?
I guess locking the vma before vma_iter_prealloc() would save us an
unnecessary alloc/free cycle when a fatal signal is already pending. I'll
move the lock right after vm_area_alloc() so that the entire vma setup is
done on a locked vma.
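Something along these lines (untested; label and field names are from my
current tree and might still change when I rework the error paths):

	vma = vm_area_alloc(map->mm);
	if (!vma)
		return -ENOMEM;

	/*
	 * Lock the VMA right away: it is modified after insertion into the
	 * VMA tree, and taking the lock before vma_iter_prealloc() avoids a
	 * pointless prealloc/free cycle when a fatal signal is already
	 * pending.
	 */
	error = vma_start_write_killable(vma);
	if (error)
		goto free_vma;

with the later vma_start_write() call dropped, since the rest of the setup
then runs on an already write-locked vma.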
>
>
> ...
>
> > @@ -3089,7 +3120,7 @@ int expand_upwards(struct vm_area_struct *vma, unsigned long address)
>
> Good luck testing this one.
Yeah... Any suggestions for tests I should use?
>
> > struct mm_struct *mm = vma->vm_mm;
> > struct vm_area_struct *next;
> > unsigned long gap_addr;
> > - int error = 0;
> > + int error;
> > VMA_ITERATOR(vmi, mm, vma->vm_start);
> >
> > if (!(vma->vm_flags & VM_GROWSUP))
> > @@ -3126,12 +3157,14 @@ int expand_upwards(struct vm_area_struct *vma, unsigned long address)
> >
> > /* We must make sure the anon_vma is allocated. */
> > if (unlikely(anon_vma_prepare(vma))) {
> > - vma_iter_free(&vmi);
> > - return -ENOMEM;
> > + error = -ENOMEM;
> > + goto free;
> > }
> >
> > /* Lock the VMA before expanding to prevent concurrent page faults */
> > - vma_start_write(vma);
> > + error = vma_start_write_killable(vma);
> > + if (error)
> > + goto free;
> > /* We update the anon VMA tree. */
> > anon_vma_lock_write(vma->anon_vma);
> >
> > @@ -3160,6 +3193,7 @@ int expand_upwards(struct vm_area_struct *vma, unsigned long address)
> > }
> > }
> > anon_vma_unlock_write(vma->anon_vma);
> > +free:
> > vma_iter_free(&vmi);
> > validate_mm(mm);
> > return error;
>
> Looks okay.
Thanks for the review, Liam! I'll wait a couple of days and then post v3
with the fixes.
>
> ...
>