On Mon, May 4, 2026 at 9:04 PM Jan Kara <[email protected]> wrote:
>
> On Mon 04-05-26 03:55:43, Barry Song wrote:
> > On Mon, May 4, 2026 at 2:17 AM Jan Kara <[email protected]> wrote:
> > > On Fri 01-05-26 18:57:52, Matthew Wilcox wrote:
> > > > On Sat, May 02, 2026 at 01:44:34AM +0800, Barry Song wrote:
> > > > > On Fri, May 1, 2026 at 10:57 PM Matthew Wilcox <[email protected]> wrote:
> > > > > > On Fri, May 01, 2026 at 06:49:58AM +0800, Barry Song wrote:
> > > > > > > 1. There is no deterministic latency for I/O completion.
> > > > > > > It depends on both the hardware and the software stack
> > > > > > > (bio/request queues and the block scheduler). Sometimes
> > > > > > > the latency is short; at other times it can be quite
> > > > > > > long. In such cases, a high-priority thread performing
> > > > > > > operations such as mprotect, unmap, prctl_set_vma, or
> > > > > > > madvise may be forced to wait for an unpredictable
> > > > > > > amount of time.
> > > > > >
> > > > > > But does that actually happen? I find it hard to believe
> > > > > > that thread A unmaps a VMA while thread B is in the middle
> > > > > > of taking a page fault in that same VMA. mprotect() and
> > > > > > madvise() are more likely to happen, but it still seems
> > > > > > really unlikely to me.
> > > > >
> > > > > It doesn’t have to involve unmapping or applying mprotect to
> > > > > the entire VMA; just a portion of it is sufficient.
> > > >
> > > > Yes, but that still fails to answer "does this actually happen".
> > > > How much performance is all this complexity in the page fault
> > > > handler buying us? If you don't answer this question, I'm just
> > > > going to go in and rip it all out.
> > >
> > > I fully agree with you we should verify whether the retry code
> > > still brings in real-world advantage today with VMA locks. After
> > > all the retry logic has been introduced in 2010. That being said
> > > if there are realistic loads where one thread needs VMA write lock
> > > while another thread is faulting the VMA, then the latencies can
> > > be indeed extreme. For example things like cgroup IO throttling
> > > happen on the IO path and thus can throttle IO of a low-priority
> > > thread for a long time.
> >
> > I’m quite sure that swap-in and VMA writes can occur
> > concurrently, and this is fairly common. For example,
> > Java GC may use mprotect or userfaultfd on a small
> > portion of a large Java heap while other portions are
> > still under do_swap_page().
>
> OK, makes sense.
>
> > If we start exploring different approaches for anon and
> > file, I agree I can revisit this on an Android phone if
> > there is a real, serious case where a file VMA can be
> > written and a page fault occurs at the same time.
> >
> > Please note that, as an Android developer, I am particularly
> > cautious about priority inversion. A recent issue causing
> > severe priority inversion is zram attempting to support
> > preemption[1]. When a task performing compression or
> > decompression is migrated to another CPU and then preempted
> > by other tasks, high-priority tasks waiting on the mutex may
> > be significantly delayed, impacting user experience.
>
> Well, container people are concerned about priority inversion as well.
> But usually this is with coarse lock (such as global filesystem locks)
> but VMA lock is specific to a task (and a VMA) so there the opportunity
> for priority inversion looks more limited. But the example with Java
> where GC thread can presumably have higher priority than ordinary Java
> threads is an interesting one.

A major difference in Android apps is that each thread can affect
user experience differently. And it is not simply a matter of
whether a VMA writer has higher or lower priority than a page-fault
(PF) thread performing I/O.

For example, thread A handles a PF; thread B attempts to modify the
VMA where the PF occurs; thread C tries to modify another VMA
(requiring mmap_lock in write mode) or iterate VMAs (requiring
mmap_lock in read mode).

Regardless of thread B’s priority, it holds mmap_lock in write mode
while waiting for the VMA lock. The usual pattern for a VMA writer
is:

    mmap_write_lock()
    vma_start_write()

As a result, thread C can be blocked even if it has higher priority
but operates on a different VMA.

In essence, when a PF and a VMA write occur concurrently,
high-priority threads may be blocked even if they operate on
different VMAs, not necessarily the same one.

Thanks
Barry
