On 6/25/26 02:35, Sean Christopherson wrote:
> On Wed, Jun 24, 2026, Ackerley Tng wrote:
>> Sean Christopherson <[email protected]> writes:
>>
>>>
>>> Under what circumstances does this happen,
>>
>> It happened 100% of the time in selftests. Perhaps it's because in the
>> selftests the pages are almost always freshly allocated and so the
>> lru_add fbatch isn't full yet? (and that the host isn't super busy so
>> lru_add fbatch doesn't get drained yet).
>
> I chatted with Ackerley about this. What I wanted to understand is why
> guest_memfd pages were getting put onto per-CPU batches for lru_add(),
> given that guest_memfd pages are unevictable. The answer (assuming I read
> the code right), is that lruvec_add_folio() updates stats and other
> per-lru metadata for the unevictable lru, and does so under a per-lru
> lock. I.e. we don't want to skip that stuff entirely.

Hm. Our pages don't participate in any LRU activity (including
isolation+migration). Isolation+migration would only apply once we'd
support page migration.

But yes, secretmem also does it like that: filemap_add_folio() will call
folio_add_lru().

Traditionally we used the unevictable LRU only for mlock purposes. But
yeah, there are "unevictable" stats involved ...

>
> One thought I had, to avoid the IPIs that draining all per-CPU caches
> requires, was to disallow putting guest_memfd pages in folio batches,
> e.g. by hacking something into folio_may_be_lru_cached(). But due to
> taking a per-lru lock, that would penalize the relatively hot path and
> definitely common operation of faulting in guest memory. On the other
> hand, memory conversion is already a relatively slow operation and is
> relatively uncommon compared to page faults, (and likely very uncommon
> for real world setups). I.e. having to drain all caches if conversion
> isn't safe penalizes a relatively slow, relatively uncommon path.

Yeah, the lru_add_drain_all() is rather messy.

We have similar code in collect_longterm_unpinnable_folios(), where we
first try a lru_add_drain(), to then escalate to a lru_add_drain_all().

Maybe we could factor that (suboptimal code) out to not have to reinvent
the same thing multiple times?

-- 
Cheers,

David
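
For illustration only, the "try the local drain first, then escalate to the
all-CPU drain" pattern mentioned above can be modelled in plain userspace C.
This is a sketch under stated assumptions, not kernel code and not the logic
of collect_longterm_unpinnable_folios(): every name below (cpu_batches,
batch_add(), drain_cpu(), drain_all_cpus(), drain_until_settled()) is
hypothetical, and it assumes a per-CPU batch pins a folio with one extra
reference until that batch is drained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS    4    /* hypothetical CPU count for this model */
#define BATCH_SIZE 8    /* hypothetical per-CPU batch capacity */

struct folio {
    int refcount;       /* stand-in for the folio reference count */
};

struct folio_batch {
    size_t nr;
    struct folio *folios[BATCH_SIZE];
};

static struct folio_batch cpu_batches[NR_CPUS];
static unsigned int nr_global_drains;   /* counts expensive all-CPU drains */

/* Park a folio on one CPU's batch; the batch takes an extra reference. */
static bool batch_add(int cpu, struct folio *folio)
{
    struct folio_batch *b;

    assert(cpu >= 0 && cpu < NR_CPUS);
    b = &cpu_batches[cpu];
    if (b->nr == BATCH_SIZE)
        return false;
    folio->refcount++;
    b->folios[b->nr++] = folio;
    return true;
}

/* Cheap step: flush a single CPU's batch and drop its references. */
static void drain_cpu(int cpu)
{
    struct folio_batch *b = &cpu_batches[cpu];

    for (size_t i = 0; i < b->nr; i++)
        b->folios[i]->refcount--;
    b->nr = 0;
}

/* Expensive step: flush every CPU's batch (models the IPI-heavy path). */
static void drain_all_cpus(void)
{
    nr_global_drains++;
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        drain_cpu(cpu);
}

/*
 * Escalating drain: do nothing if the folio already has only the expected
 * references, then try the local drain, and only then drain all CPUs.
 * Returns true once the reference count matches 'expected'.
 */
static bool drain_until_settled(struct folio *folio, int expected, int this_cpu)
{
    if (folio->refcount == expected)
        return true;

    drain_cpu(this_cpu);
    if (folio->refcount == expected)
        return true;

    drain_all_cpus();
    return folio->refcount == expected;
}
```

In this model a folio parked on the calling CPU's batch settles after the
local drain alone, so nr_global_drains stays at zero; only a folio parked on
another CPU's batch forces the all-CPU drain.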
