On Thu, Oct 30, 2025 at 2:48 PM Vlastimil Babka <[email protected]> wrote:
>
> On 10/30/25 20:47, Lorenzo Stoakes wrote:
> > On Thu, Oct 30, 2025 at 07:47:34PM +0100, Vlastimil Babka wrote:
> >> >
> >> > Could we use MADVISE_VMA_READ_LOCK mode (would be actually an improvement
> >> > over the current MADVISE_MMAP_READ_LOCK), together with the atomic flag
> >> > setting? I think the places that could race with us to cause RMW use vma
> >> > write lock so that would be excluded. Fork AFAICS unfortunately doesn't
> >> > (for the oldmm) and it probably wouldn't make sense to start doing it.
> >> > Maybe we could think of something to deal with this special case...
> >>
> >> During discussion with Pedro off-list I realized fork takes mmap lock for
> >> write on the old mm, so if we kept taking mmap sem for read, then vma lock
> >> for read in addition (which should be cheap enough, also we'd only need it
> >> in case VM_MAYBE_GUARD is not yet set), and set the flag atomically, perhaps
> >> that would cover all non-benign races?
> >>
> >>
> >
> > We take VMA write lock in dup_mmap() on each mpnt (old VMA).
>
> Ah yes I thought it was the new one.
>
> > We take the VMA write lock (vma_start_write()) for each mpnt.
> >
> > We then vm_area_dup() the mpnt to the new VMA before calling:
> >
> > copy_page_range()
> > -> vma_needs_copy()
> >
> > Which is where the check is done.
> >
> > So we are holding the VMA write lock, so a VMA read lock should suffice no?
>
> Yeah, even better!
>
> > For belts + braces we could atomically read the flag in vma_needs_copy(),
> > though note it's intended VM_COPY_ON_FORK could have more than one flag.
> >
> > We could drop that for now and be explicit.
>
> Great!

Overall, I think it should be possible to set this flag atomically under VMA read-lock. However, if you introduce new vm_flags manipulation functions, please make sure they can't be used for other vm_flags. In Android I've seen several "interesting" attempts to update vm_flags under a read-lock (specifically in the page-fault path) and had to explain why that's a bad idea.
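
A minimal userspace sketch of that restriction, using C11 atomics rather than kernel primitives. Everything here is hypothetical and only models the idea: the names vma_model, vma_flag_set_atomic_model() and VM_ATOMIC_SET_ALLOWED, and the bit values, are made up for illustration and are not existing kernel API. The point is that the helper accepts a single bit from an explicit allow-list and refuses anything else, so it cannot be reused to update arbitrary vm_flags under a read lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit values, for illustration only (not the kernel's). */
#define VM_WRITE_MODEL        (1UL << 1)
#define VM_MAYBE_GUARD_MODEL  (1UL << 11)

/* Assumed allow-list: the only flag that may be set without the write lock. */
#define VM_ATOMIC_SET_ALLOWED VM_MAYBE_GUARD_MODEL

/* Ordinary flags must never end up in the allow-list. */
static_assert((VM_ATOMIC_SET_ALLOWED & VM_WRITE_MODEL) == 0,
              "only dedicated flags may be set atomically");

/* Stand-in for the part of a VMA that holds the flags word. */
struct vma_model {
    _Atomic unsigned long flags;
};

/*
 * Atomically set one allow-listed flag. Returns false, and leaves the
 * flags untouched, if the argument is zero, has more than one bit set,
 * or contains a bit outside the allow-list.
 */
static bool vma_flag_set_atomic_model(struct vma_model *vma, unsigned long flag)
{
    if (flag == 0 || (flag & (flag - 1)) != 0)
        return false;               /* not exactly one bit */
    if (flag & ~VM_ATOMIC_SET_ALLOWED)
        return false;               /* not allowed without the write lock */

    /* Atomic OR: a concurrent setter of the same bit cannot lose it. */
    atomic_fetch_or(&vma->flags, flag);
    return true;
}
```

With this shape, a caller trying to set something like VM_WRITE_MODEL through the helper gets a failure instead of a silent racy update. In the kernel the runtime rejection would more likely be a build-time check or a warning, but that choice is an assumption here, not something decided in this thread.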
