On Wed, May 20, 2026 at 3:34 AM David Hildenbrand (Arm) <[email protected]> wrote:
>
> On 5/19/26 14:53, Lorenzo Stoakes wrote:
> > On Mon, May 18, 2026 at 12:56:59PM -0700, Suren Baghdasaryan wrote:
> >
> >>>
> >>> I think we either need to fix `fork()`, or keep the current
> >>> behavior of dropping the VMA lock before performing I/O.
> >>
> >> I see. So, this problem arises from the fact that we are changing the
> >> pagefaults requiring I/O operation to hold VMA lock...
> >> And you want to lock VMA on fork only if vma_is_anonymous(vma) ||
> >> is_cow_mapping(vma->vm_flags). So, we will be blocking page faults for
> >> anonymous and COW VMAs only while holding mmap_write_lock, preventing
> >> any VMA modification. On the surface, that looks ok to me but I might
> >> be missing some corner cases. If nobody sees any obvious issues, I
> >> think it's worth a try.
> >
> > Not sure if you noticed but I did raise concerns ;)
> >
> > I wonder if you've confused the fault path and fork here, as I think Barry has
> > been a little unclear on that.
> >
> > What's being suggested in this thread is to fundamentally change fork behaviour
> > so it's different from the entire history of the kernel (or - presumably - at
> > least recent history :)
>
> I don't want fork() to become different in that regard.
>
> There is already a slight difference with vs. without per-VMA locks, because
> there is a window in-between us taking the write mmap_lock and all the per-VMA
> locks. I raised that previously [1] and assumed that it is probably fine.
>
> I also raised in the past why I think we must not allow concurrent page faults,
> at least as soon as anonymous memory is involved [2].

Thanks for sharing the context; it is quite helpful for understanding the race
conditions. Since Lorenzo also raised concerns about the page fault race, I will
reply to all of those concerns together in this thread.

IIUC, there is already some sort of race with per-VMA locks. Before per-VMA
locks, mmap_lock locked everything, so a page fault happened either before fork
or after fork. With per-VMA locks, a page fault can happen during fork on VMAs
that have not been locked yet. For example, if we have 3 VMAs and have locked
only the first one, page faults can still happen on the other 2 VMAs during
fork if they already have an anon_vma. This is the status quo, and it does not
seem harmful.

The bad race David shared is caused by racing with page copying. So, unless I
am missing something, it should be fine as long as we serialize page copying
against page faults. Since we decide whether to copy pages by checking
vma->anon_vma, it seems fine not to take the VMA lock if vma->anon_vma is NULL.
This should not introduce any new race either, because setting up a new
anon_vma in the page fault or madvise paths requires taking mmap_lock,
according to the earlier discussions.

Thanks,
Yang

> ... and I raised that this is pretty much slower by design right now: "Well, the
> design decision that CONFIG_PER_VMA_LOCK made for now to make page faults fast
> and to make blocking any page faults from happening to be slower ..." [3]
>
> [1] https://lore.kernel.org/all/[email protected]/
> [2] https://lore.kernel.org/all/[email protected]/
> [3] https://lore.kernel.org/all/[email protected]/
>
> --
> Cheers,
>
> David
>
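
P.S. To make the rule in my reply concrete, here is a rough userspace sketch of
the fork-side decision only: take the per-VMA write lock when vma->anon_vma is
set, and skip it when it is NULL. This is not kernel code. The toy_* types and
helpers are hypothetical stand-ins invented for illustration, and the sketch
assumes the caller already holds the mmap write lock, so that no new anon_vma
can be installed while the walk runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for illustration; NOT the kernel's real structures. */
struct toy_anon_vma {
	int unused;
};

struct toy_vma {
	const char *name;
	struct toy_anon_vma *anon_vma;	/* NULL: no anonymous pages to copy */
	bool write_locked;		/* models the per-VMA write lock */
};

/*
 * Hypothetical helper: does fork need to block page faults on this VMA?
 * Assumption from the discussion: pages are copied only when anon_vma is
 * set, so only those VMAs need to be serialized against page faults.
 */
static bool toy_fork_needs_vma_lock(const struct toy_vma *vma)
{
	return vma->anon_vma != NULL;
}

/*
 * Models the fork-side walk over the parent's VMAs.
 * Assumes the caller holds the (modelled) mmap write lock.
 * Returns the number of VMAs that were write-locked.
 */
static size_t toy_fork_lock_vmas(struct toy_vma *vmas, size_t n)
{
	size_t locked = 0;

	assert(vmas != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (toy_fork_needs_vma_lock(&vmas[i])) {
			vmas[i].write_locked = true;
			locked++;
		}
	}
	return locked;
}
```

With three VMAs where only two have an anon_vma, the walk locks just those two
and leaves the third open to page faults. Whether that is actually safe depends
on the mmap_lock assumption above holding in the real fault and madvise paths.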
