On Tue, May 19, 2026 at 8:53 PM Lorenzo Stoakes <[email protected]> wrote:
>
> On Mon, May 18, 2026 at 12:56:59PM -0700, Suren Baghdasaryan wrote:
> > > >
> > > I think we either need to fix `fork()`, or keep the current
> > > behavior of dropping the VMA lock before performing I/O.
> >
> > I see. So, this problem arises from the fact that we are changing the
> > pagefaults requiring I/O operation to hold VMA lock...
> > And you want to lock VMA on fork only if vma_is_anonymous(vma) ||
> > is_cow_mapping(vma->vm_flags). So, we will be blocking page faults for
> > anonymous and COW VMAs only while holding mmap_write_lock, preventing
> > any VMA modification. On the surface, that looks ok to me but I might
> > be missing some corner cases. If nobody sees any obvious issues, I
> > think it's worth a try.
>
> Not sure if you noticed but I did raise concerns ;)
>
> I wonder if you've confused the fault path and fork here, as I think Barry has
> been a little unclear on that.

I think I’ve been absolutely clear :-)

We should either stick to the current behavior - drop the VMA lock
before doing I/O, or change fork() so that it does not wait on
vma_start_write().

Before per-VMA locks, page faults dropped mmap_lock before doing I/O.
After per-VMA locks, page faults dropped the VMA lock before doing I/O.
In both cases, fork() would not wait for I/O in the page-fault path.

Now you guys are suggesting performing I/O while holding the VMA lock,
which means fork() must wait for that I/O to complete. Since an
application can have more than 1000 VMAs, and I/O can be stalled for an
unpredictable amount of time in the bio/request queue or filesystem GC,
fork() could end up blocked on multiple VMAs while taking
vma_start_write() for each of them. As a result, fork() could hold
mmap_lock for a very, very, very long time. fork() itself would become
extremely slow, and any other task needing mmap_lock would also be
blocked behind it.

>
> What's being suggested in this thread is to fundamentally change fork
> behaviour so it's different from the entire history of the kernel (or -
> presumably - at least recent history :) and permit concurrent page
> faults to occur on a forking process.
>
> I absolutely object to this for being pretty crazy. I mean I'm not sure we
> really want to be simultaneously modifying page tables while invoking
> copy_page_range()? No?

If you object to touching fork(), can you at least accept keeping the
existing behavior of dropping the VMA lock before doing I/O?

If you object to both approaches, then I really do not know how we can
continue :-)

Thanks
Barry
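
As a rough illustration of the fork() stall described above, here is a toy
user-space model; it is not kernel code. The struct vma_model type, the
io_done_at field and the fork_walk_end() helper are all hypothetical names
made up for this sketch. It assumes every faulting I/O is already in flight
when fork() starts its walk, that time is in arbitrary units, and that
taking an uncontended write lock costs nothing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one VMA; not the kernel's vm_area_struct. */
struct vma_model {
    long io_done_at; /* time an in-flight fault's I/O completes; 0 if none */
};

/*
 * Model fork() write-locking each VMA in order and return the time at
 * which the walk finishes.
 *
 * hold_lock_during_io != 0: the fault keeps the VMA read lock across I/O,
 * so the writer has to wait until that I/O is done.
 * hold_lock_during_io == 0: the fault dropped the lock before I/O, so the
 * writer never waits on I/O.
 */
static long fork_walk_end(const struct vma_model *vmas, size_t n,
                          long start, int hold_lock_during_io)
{
    long now = start;

    assert(vmas != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (hold_lock_during_io && vmas[i].io_done_at > now)
            now = vmas[i].io_done_at; /* blocked until the reader unlocks */
    }
    return now;
}
```

In this model the difference between the returned time and start is how
long mmap_lock stays write-held because of fault I/O. It is zero when the
lock is dropped before I/O, and it grows with the slowest in-flight I/O
when the lock is held across it.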
