"Vlastimil Babka (SUSE)" <[email protected]> writes: > On 5/23/26 02:17, Ackerley Tng via B4 Relay wrote: >> From: Ackerley Tng <[email protected]> >> >> When converting memory to private in guest_memfd, it is necessary to ensure >> that the pages are not currently being accessed by any other part of the >> kernel or userspace to avoid any current user writing to guest private >> memory. >> >> guest_memfd checks for unexpected refcounts to determine whether a page is >> still in use. The only expected refcounts after unmapping the range >> requested for conversion are those that are held by guest_memfd itself. > > Is it sufficient to only check, and not also freeze the refcount? (i.e. > using folio_ref_freeze()), because without freezing, anything (e.g. > compaction's pfn-based scanner) could do a speculative folio_try_get() and > the checked refcount becomes stale. >
I believe there's no issue here, since the main thing here is to check for
long-term pins on the folio. Perhaps David can help me verify. :)

> Might be ok if we know that no such speculative increment can result in
> actually touching the page contents, and the extra refcount and something
> inspecting the struct folio won't interfere with anything else. Then it
> could be just a comment mentioning why it's safe.
>

In this series guest_memfd doesn't change anything in folio metadata,
guest_memfd only updates the attributes tracked in the guest_memfd inode,
and updates the RMP table for SNP.

With the upcoming huge page support, guest_memfd needs to split/merge the
folio, which means updates to folio metadata. That will need a closer look.

I haven't added the comment, mostly because it's a long weekend here and
I'd like to get Sashiko to run on it over the weekend. We should definitely
continue this discussion on v8!

> IIRC the compaction's scanning can result in a migration here so it's
> probably ok?
>

Migration isn't supported for guest_memfd yet, so I think that's ok.

>> Update the kvm_memory_attributes2 structure to include an error_offset
>> field. This allows KVM to report the exact offset where a conversion
>> failed to userspace. If the safety check fails, return -EAGAIN and copy
>> the error_offset back to userspace so that it can potentially retry the
>> operation or handle the failure gracefully.
>>
>> Suggested-by: David Hildenbrand <[email protected]>
>> Co-developed-by: Vishal Annapurve <[email protected]>
>> Signed-off-by: Vishal Annapurve <[email protected]>
>> Reviewed-by: Fuad Tabba <[email protected]>
>> Signed-off-by: Ackerley Tng <[email protected]>
>>
>> [...snip...]
>>
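
To make the check-vs-freeze question above concrete, here is a minimal
userspace sketch using C11 atomics. It is only a toy model: struct toy_folio
and every toy_*() helper are hypothetical names made up for illustration, not
kernel code. It assumes a speculative get succeeds unless the count is zero,
and that freezing swaps the expected count to zero atomically.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy stand-in for a folio refcount; NOT the kernel's struct folio. */
struct toy_folio {
	atomic_int refcount;
};

/* Models a speculative get: take a reference unless the count is zero. */
static bool toy_try_get(struct toy_folio *f)
{
	int old = atomic_load(&f->refcount);

	while (old != 0) {
		/* On failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&f->refcount, &old, old + 1))
			return true;
	}
	return false;
}

static void toy_put(struct toy_folio *f)
{
	atomic_fetch_sub(&f->refcount, 1);
}

/* Check only: the answer can be stale as soon as this returns. */
static bool toy_check(struct toy_folio *f, int expected)
{
	return atomic_load(&f->refcount) == expected;
}

/* Freeze: atomically replace 'expected' with 0 so later gets fail. */
static bool toy_freeze(struct toy_folio *f, int expected)
{
	int old = expected;

	return atomic_compare_exchange_strong(&f->refcount, &old, 0);
}

/* Unfreeze: restore the count once the critical work is done. */
static void toy_unfreeze(struct toy_folio *f, int count)
{
	atomic_store(&f->refcount, count);
}

/* Walks through both variants on a folio holding one reference. */
static void toy_demo(void)
{
	struct toy_folio f;

	atomic_init(&f.refcount, 1);

	/* Check only: a speculative get can still slip in afterwards. */
	assert(toy_check(&f, 1));
	assert(toy_try_get(&f));
	assert(!toy_check(&f, 1));	/* the earlier check is now stale */
	toy_put(&f);

	/* Freeze: speculative gets fail until the count is restored. */
	assert(toy_freeze(&f, 1));
	assert(!toy_try_get(&f));
	toy_unfreeze(&f, 1);
	assert(toy_try_get(&f));
}
```

In this model the check-only variant is fine as long as a late speculative
reference cannot touch page contents or race with metadata updates, which is
the condition discussed above. The freeze variant closes the window entirely,
at the cost of making concurrent gets fail for its duration.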

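
For the -EAGAIN plus error_offset behaviour described in the quoted commit
message, a caller-side retry loop might look roughly like the sketch below.
Everything here is hypothetical: struct toy_convert_req is not the layout of
kvm_memory_attributes2, and toy_convert() is a simulated stand-in for the
conversion call, not a real ioctl wrapper. It returns a negative errno
directly, whereas a real ioctl() would return -1 and set errno.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical request; NOT the real kvm_memory_attributes2 layout. */
struct toy_convert_req {
	uint64_t offset;
	uint64_t size;
	uint64_t error_offset;	/* filled in when the conversion fails */
};

/* Simulated backend: one offset stays "in use" for a few attempts. */
struct toy_backend {
	uint64_t busy_offset;
	int busy_attempts;	/* how many more attempts will fail */
};

/* Stand-in for the conversion call; returns 0 or -EAGAIN. */
static int toy_convert(struct toy_backend *b, struct toy_convert_req *req)
{
	bool in_range = b->busy_offset >= req->offset &&
			b->busy_offset < req->offset + req->size;

	if (in_range && b->busy_attempts > 0) {
		b->busy_attempts--;
		req->error_offset = b->busy_offset;
		return -EAGAIN;
	}
	return 0;
}

/* Retries the whole range up to max_retries times after -EAGAIN. */
static int toy_convert_with_retry(struct toy_backend *b,
				  struct toy_convert_req *req,
				  int max_retries)
{
	int ret = toy_convert(b, req);

	for (int i = 0; ret == -EAGAIN && i < max_retries; i++)
		ret = toy_convert(b, req);
	return ret;
}

/* One run where the retries succeed and one where they give up. */
static void toy_retry_demo(void)
{
	struct toy_backend ok = { .busy_offset = 0x3000, .busy_attempts = 2 };
	struct toy_backend stuck = { .busy_offset = 0x3000, .busy_attempts = 5 };
	struct toy_convert_req req = { .offset = 0x1000, .size = 0x4000 };

	/* Two failures, then success within three retries. */
	assert(toy_convert_with_retry(&ok, &req, 3) == 0);

	/* Still busy after two retries: the failing offset is reported. */
	assert(toy_convert_with_retry(&stuck, &req, 2) == -EAGAIN);
	assert(req.error_offset == 0x3000);
}
```

Whether userspace should retry the whole range, as this sketch does, or use
error_offset to find and release whatever still holds the page before
retrying is a policy choice that the quoted text leaves to the caller.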