Fuad Tabba <[email protected]> writes: > > [...snip...] > >> +unsigned long kvm_gmem_get_memory_attributes(struct kvm *kvm, gfn_t gfn) >> +{ >> + struct kvm_memory_slot *slot = gfn_to_memslot(kvm, gfn); >> + struct inode *inode; >> + >> + /* >> + * If this gfn has no associated memslot, there's no chance of the >> gfn >> + * being backed by private memory, since guest_memfd must be used for >> + * private memory, and guest_memfd must be associated with some >> memslot. >> + */ >> + if (!slot) >> + return 0; >> + >> + CLASS(gmem_get_file, file)(slot); >> + if (!file) >> + return 0; >> + >> + inode = file_inode(file); >> + >> + /* >> + * Rely on the maple tree's internal RCU lock to ensure a >> + * stable result. This result can become stale as soon as the >> + * lock is dropped, so the caller _must_ still protect >> + * consumption of private vs. shared by checking >> + * mmu_invalidate_retry_gfn() under mmu_lock to serialize >> + * against ongoing attribute updates. >> + */ >> + return kvm_gmem_get_attributes(inode, kvm_gmem_get_index(slot, gfn)); >> +} > > Doesn't this imply that all consumers of kvm_mem_is_private() should > validate the result using mmu_lock and the invalidation sequence?
Let me know how I can improve the comment. I think the "consumption" of private vs shared here actually means something like "don't commit a page being faulted into page tables based on the result of kvm_gmem_get_memory_attributes() without checking kvm->mmu_invalidate_in_progress.", since a racing conversion may complete before you commit. kvm_mem_is_private() is used from these places: 1. Fault handling in KVM, like page_fault_can_be_fast(), kvm_mmu_faultin_pfn(), kvm_mmu_page_fault(): this already handles the entire mmu_lock and invalidation dance. No fault will be committed if a racing conversion happened after kvm_mem_is_private() but before the commit. 2. kvm_mmu_max_mapping_level() from recovering huge pages after disabling dirty logging: Other than that it can't be used with guest_memfd now since dirty logging can't be used with guest_memfd and guest_memfd memslots are not updatable, this holds mmu_lock throughout until the huge page recovery is done. invalidate_begin also involves zapping the pages in the range, so if the order of events is | Thread A | Thread B | |------------------------------|-------------------| | invalidate_begin + zap | | | update attributes maple_tree | recover huge page | | invalidate_end | | Then recovering will never see the zapped pages, nothing to recover, no kvm_mem_is_private() lookup. 3. kvm_arch_vcpu_pre_fault_memory() This eventually calls kvm_tdp_mmu_page_fault(), which checks is_page_fault_stale(), so it does check before committing. Were there any other calls I missed? > sev_handle_rmp_fault() calls kvm_mem_is_private() without holding > mmu_lock and without any retry mechanism. Is that a problem? > Sean already replied on your actual question separately :) > Cheers, > /fuad > > >> >> [...snip...] >>
