On Thu, Nov 06, 2025 at 10:46:13AM +0000, Lorenzo Stoakes wrote:
> This patch adds the ability to atomically set VMA flags with only the mmap
> read/VMA read lock held.
>
> As this could be hugely problematic for VMA flags in general given that all
> other accesses are non-atomic and serialised by the mmap/VMA locks, we
> implement this with a strict allow-list - that is, only designated flags
> are allowed to do this.
>
> We make VM_MAYBE_GUARD one of these flags, and then set it under the mmap
> read flag upon guard region installation.
>
> The places where this flag is used currently and matter are:
>
> * VMA merge - performed under mmap/VMA write lock, therefore excluding
>   racing writes.
>
> * /proc/$pid/smaps - can race the write, however this isn't meaningful as
>   the flag write is performed at the point of the guard region being
>   established, and thus an smaps reader can't reasonably expect to avoid
>   races. Due to atomicity, a reader will observe either the flag being set
>   or not. Therefore consistency will be maintained.
>
> In all other cases the flag being set is irrelevant and atomicity
> guarantees other flags will be read correctly.
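
For anyone following along, the allow-list idea described above can be
sketched in plain userspace C11. This is only an illustration under stated
assumptions, not the kernel implementation: every demo_* / DEMO_* name and
the bit numbers are hypothetical, and a C11 atomic fetch-or stands in for
whatever atomic bit operation the patch ends up using.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for VMA flag bits; the values are illustrative only. */
#define DEMO_READ_BIT		0
#define DEMO_MAYBE_GUARD_BIT	5
#define DEMO_BIT(n)		(1UL << (n))

/* Allow-list: only these bits may be set without the write lock. */
#define DEMO_ATOMIC_SET_ALLOWED	DEMO_BIT(DEMO_MAYBE_GUARD_BIT)

/* Hypothetical miniature of a VMA: just a flags word. */
struct demo_vma {
	_Atomic unsigned long flags;
};

/*
 * Atomically set one flag bit. Returns true if the bit was set, false if
 * the bit is not on the allow-list (in which case flags are untouched).
 */
static bool demo_flag_set_atomic(struct demo_vma *vma, int bit)
{
	const unsigned long mask = DEMO_BIT(bit);

	/* Only allow-listed flags are permitted. */
	if (!(mask & DEMO_ATOMIC_SET_ALLOWED))
		return false;

	/* A single atomic OR: concurrent readers see the old or new word. */
	atomic_fetch_or_explicit(&vma->flags, mask, memory_order_relaxed);
	return true;
}

/* Small usage example; returns the final flags word. */
static unsigned long demo_usage(void)
{
	struct demo_vma vma;

	atomic_init(&vma.flags, DEMO_BIT(DEMO_READ_BIT));

	/* Allowed: the guard bit is on the allow-list. */
	assert(demo_flag_set_atomic(&vma, DEMO_MAYBE_GUARD_BIT));

	/* Rejected: any other bit must go through the locked path. */
	assert(!demo_flag_set_atomic(&vma, DEMO_READ_BIT));

	return atomic_load(&vma.flags);
}
```

The point of the sketch is that a rejected bit leaves the flags word
untouched, so every flag outside the allow-list keeps relying on the usual
lock-based serialisation.
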

Probably important to write down that the only reason why this doesn't make
KCSAN have a small stroke is that we are only changing one bit. I.e., we can
only have one bit of atomic flags before annotating every reader. (Source:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/kcsan/permissive.h#n51)

> We additionally update madvise_guard_install() to ensure that
> anon_vma_prepare() is set for anonymous VMAs to maintain consistency with
> the assumption that any anonymous VMA with page tables will have an
> anon_vma set, and any with an anon_vma unset will not have page tables
> established.

Isn't that what we already had? Or do you mean "*only* set for anonymous VMAs"?

>
> Signed-off-by: Lorenzo Stoakes <[email protected]>

With the nits below and above addressed:
Reviewed-by: Pedro Falcato <[email protected]>

> ---
>  include/linux/mm.h | 23 +++++++++++++++++++++++
>  mm/madvise.c       | 22 ++++++++++++++--------
>  2 files changed, 37 insertions(+), 8 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 2a5516bff75a..2ea65c646212 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -518,6 +518,9 @@ extern unsigned int kobjsize(const void *objp);
>  /* This mask represents all the VMA flag bits used by mlock */
>  #define VM_LOCKED_MASK	(VM_LOCKED | VM_LOCKONFAULT)
>
> +/* These flags can be updated atomically via VMA/mmap read lock. */
> +#define VM_ATOMIC_SET_ALLOWED VM_MAYBE_GUARD
> +
>  /* Arch-specific flags to clear when updating VM flags on protection change */
>  #ifndef VM_ARCH_CLEAR
>  # define VM_ARCH_CLEAR	VM_NONE
> @@ -860,6 +863,26 @@ static inline void vm_flags_mod(struct vm_area_struct *vma,
>  	__vm_flags_mod(vma, set, clear);
>  }
>
> +/*
> + * Set VMA flag atomically. Requires only VMA/mmap read lock. Only specific
> + * valid flags are allowed to do this.
> + */
> +static inline void vma_flag_set_atomic(struct vm_area_struct *vma,
> +				       int bit)
> +{
> +	const vm_flags_t mask = BIT(bit);
> +
> +	/* mmap read lock/VMA read lock must be held. */
> +	if (!rwsem_is_locked(&vma->vm_mm->mmap_lock))
> +		vma_assert_locked(vma);
> +
> +	/* Only specific flags are permitted */
> +	if (WARN_ON_ONCE(!(mask & VM_ATOMIC_SET_ALLOWED)))
> +		return;

VM_WARN_ON_ONCE?

-- 
Pedro
