On Wed, May 20, 2026 at 2:07 AM Barry Song <[email protected]> wrote:
>
> On Wed, May 20, 2026 at 3:50 PM Lorenzo Stoakes <[email protected]> wrote:
> >
> > On Wed, May 20, 2026 at 05:18:52AM +0800, Barry Song wrote:
> > > On Tue, May 19, 2026 at 8:53 PM Lorenzo Stoakes <[email protected]> wrote:
> > > >
> > > > On Mon, May 18, 2026 at 12:56:59PM -0700, Suren Baghdasaryan wrote:
> > > > >
> > > > >
> > > > > > I think we either need to fix `fork()`, or keep the current
> > > > > > behavior of dropping the VMA lock before performing I/O.
> > > > >
> > > > > I see. So, this problem arises from the fact that we are changing the
> > > > > pagefaults requiring I/O operation to hold VMA lock...
> > > > > And you want to lock VMA on fork only if vma_is_anonymous(vma) ||
> > > > > is_cow_mapping(vma->vm_flags). So, we will be blocking page faults for
> > > > > anonymous and COW VMAs only while holding mmap_write_lock, preventing
> > > > > any VMA modification. On the surface, that looks ok to me but I might
> > > > > be missing some corner cases. If nobody sees any obvious issues, I
> > > > > think it's worth a try.
> > > >
> > > > Not sure if you noticed but I did raise concerns ;)
> > > >
> > > > I wonder if you've confused the fault path and fork here, as I think Barry has
> > > > been a little unclear on that.
> > >
> > > I think I’ve been absolutely clear :-)
> >
> > On this point sure, I would argue less so around the fork stuff but I responded
> > on that specifically elsewhere so let's keep things moving :>)
> >
> > > We should either stick to the current behavior - drop
> > > the VMA lock before doing I/O, or change fork() so that it
> > > does not wait on vma_start_write().
> >
> > Again, as I said elsewhere, I think there might be a 3rd way possibly. It's a
> > big mistake to assume that there are only specific solutions to problems in the
> > kernel then to present a false dichotomy.
>
> I recalled that when we discussed this part in my slides:
>
> ‘For simplicity, rather than using a whitelist mechanism for
> per-VMA retry, we could use a blacklist instead: default to
> always retry via the VMA lock, and only allow mmap_lock-based
> page-fault retry for specific cases such as
> __vmf_anon_prepare().’
>
> Suren mentioned introducing a FALLBACK flag. With the
> FALLBACK flag, we would retry via mmap_lock; with the RETRY
> flag, we would retry via the VMA lock.
>
> Not sure whether this could really be called a ‘third way,’
> but it seems more like a shift from a whitelist model to a
> blacklist model, without changing the fundamental design, but
> it does change where we would need to touch the source code.

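To make the FALLBACK/RETRY split quoted above concrete, here is a rough
userspace sketch of the retry loop. It is only an illustration under
stated assumptions, not kernel code: every name in it (sketch_result,
sketch_fault, handle_once, fault_loop) is hypothetical, the "locks" are
plain enum values rather than real locks, and needs_mmap_lock merely
stands in for whatever cases end up on the blacklist.

```c
#include <assert.h>

/* All names are hypothetical; this models the idea, not the kernel API. */
enum sketch_result {
    SKETCH_DONE,     /* fault handled */
    SKETCH_RETRY,    /* lock was dropped for I/O; retry under the same lock type */
    SKETCH_FALLBACK  /* cannot proceed under the VMA lock; retry under mmap_lock */
};

enum sketch_lock { LOCK_VMA, LOCK_MMAP };

struct sketch_fault {
    int needs_io;        /* page must be read in before it can be mapped */
    int needs_mmap_lock; /* blacklisted case that must not run under the VMA lock */
};

/* One pass through the (simulated) fault handler under the given lock. */
static enum sketch_result handle_once(struct sketch_fault *f, enum sketch_lock lock)
{
    if (lock == LOCK_VMA && f->needs_mmap_lock)
        return SKETCH_FALLBACK;
    if (f->needs_io) {
        /* Pretend the lock was dropped here and the I/O has completed. */
        f->needs_io = 0;
        return SKETCH_RETRY;
    }
    return SKETCH_DONE;
}

/*
 * Default to the per-VMA lock. RETRY keeps the current lock type;
 * only FALLBACK switches to mmap_lock. Returns the lock type under
 * which the fault finally completed and reports the attempt count.
 */
static enum sketch_lock fault_loop(struct sketch_fault *f, int *attempts)
{
    enum sketch_lock lock = LOCK_VMA;

    *attempts = 0;
    for (;;) {
        enum sketch_result r = handle_once(f, lock);

        (*attempts)++;
        /* At most one fallback and one I/O retry before completion. */
        assert(*attempts <= 3);
        if (r == SKETCH_DONE)
            return lock;
        if (r == SKETCH_FALLBACK)
            lock = LOCK_MMAP;
    }
}
```

In this model a fault that only needs I/O completes on its second
attempt while still using the per-VMA lock, and only a blacklisted
case ever reaches mmap_lock.
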
I thought the conclusion of the LSFMM discussion was that this is the
direction we would take. Maybe there were follow-up discussions which I
missed? This approach still drops the lock before I/O, but after I/O
completion it reacquires the same per-VMA lock instead of falling back
to mmap_lock. IMO it's the simplest fix for the issue you brought up.

>
> >
> > We absolutely hear you on this being a problem and it WILL be addressed one
> > way or another.
>
> Thanks. This is a bit of light in what has felt like a fairly
> dark situation. I really appreciate your thoughtful and
> responsible approach.
>
> >
> > Of the two approaches, as I said elsewhere, I prefer what you've done in
> > this series to anything touching fork.
> >
> > But give me time to look through the series please (I'd also suggest RFC'ing
> > when it's something kinda fundamental that might generate conversation,
> > makes life a bit easier on the review side :)
>
> Thanks! Sure, I’m happy to wait and there’s no urgency.
>
> Last year you made quite a significant contribution to the work
> when I tried to remove mmap_lock in madvise. I really
> appreciated it. Now we’re back to the same lock again, just in
> different places.
>
> Best Regards
> Barry
