On 12/2/25 12:50, Nikita Kalyazin wrote:


On 01/12/2025 20:57, Peter Xu wrote:
On Mon, Dec 01, 2025 at 08:12:38PM +0000, Nikita Kalyazin wrote:


On 01/12/2025 18:35, Peter Xu wrote:
On Mon, Dec 01, 2025 at 04:48:22PM +0000, Nikita Kalyazin wrote:
I believe I found the precise point where we convinced ourselves that minor
support was sufficient: [1].  If at this moment we don't find that reasoning
valid anymore, then indeed implementing missing is the only option.

[1] https://lore.kernel.org/kvm/[email protected]

Now after I re-read the discussion, I may have made a wrong statement
there, sorry.  I could have got slightly confused on when the write()
syscall can be involved.

I agree if you want to get an event when cache missed with the current uffd
definitions and when pre-population is forbidden, then MISSING trap is
required.  That is, with/without the need of UFFDIO_COPY being available.

Do I understand it right that UFFDIO_COPY is not allowed in your case, but
only write()?

No, UFFDIO_COPY would work perfectly fine.  We will still use write()
whenever we resolve stage-2 faults as they aren't visible to UFFD.  When a
userfault occurs at an offset that already has a page in the cache, we will
have to keep using UFFDIO_CONTINUE so it looks like both will be required:

   - user mapping major fault -> UFFDIO_COPY (fills the cache and sets up
userspace PT)
   - user mapping minor fault -> UFFDIO_CONTINUE (only sets up userspace PT)
   - stage-2 fault -> write() (only fills the cache)

Is stage-2 fault about KVM_MEMORY_EXIT_FLAG_USERFAULT, per James's series?

Yes, that's the one ([1]).

[1]
https://lore.kernel.org/kvm/[email protected]


It looks fine indeed, but it looks slightly weird then, as you'll have two
ways to populate the page cache.  Logically here atomicity is indeed not
needed when you trap both MISSING + MINOR.

I reran the test based on the UFFDIO_COPY prototype I had using your
series [2], and UFFDIO_COPY is slower than write() to populate 512 MiB:
237 vs 202 ms (+17%).  Even though UFFDIO_COPY alone is functionally
sufficient, I would prefer to have an option to use write() where
possible and only falling back to UFFDIO_COPY for userspace faults to
have better performance.

Just so I understand correctly: we could even do without UFFDIO_COPY for that scenario by using write() + minor faults?

But what you are saying is that there might be a performance benefit in using UFFDIO_COPY for userspace faults, to avoid the write()+minor fault overhead?

--
Cheers

David

Reply via email to