Hi Avi,

Good that you're back.
On Sun, Mar 16, 2008 at 04:00:06PM +0200, Avi Kivity wrote:
> Marcelo Tosatti wrote:
> >This patchset resends Anthony's QEMU balloon support plus:
> >
> >- Truncates the target size to ram size
> >- Enables madvise() conditioned on KVM_ZAP_GFN ioctl
> >
>
> Once mmu notifiers are in, KVM_ZAP_GFN isn't needed. So we have three
> possible situations:
>
> - zap needed, but not available: don't madvise()
> - zap needed and available: zap and madvise()
> - zap unneeded: madvise()

Right. That is what the patch does. You just have to fill in
"have_mmu_notifiers" here:

+int kvm_zap_single_gfn(struct kvm *kvm, gfn_t gfn)
+{
+	unsigned long addr;
+	int have_mmu_notifiers = 0;
+
+	down_read(&kvm->slots_lock);
+	addr = gfn_to_hva(kvm, gfn);
+
+	if (kvm_is_error_hva(addr)) {
+		up_read(&kvm->slots_lock);
+		return -EINVAL;
+	}
+
+	if (!have_mmu_notifiers) {
+		spin_lock(&kvm->mmu_lock);
+		rmap_nuke(kvm, gfn);
+		spin_unlock(&kvm->mmu_lock);
+	}
+	up_read(&kvm->slots_lock);
+	return 0;
+}

So as to avoid rmap_nuke, since that will be done through the madvise()
path. (See the sketch at the end of this mail for how the three cases
above map onto the qemu side.)

> Did you find out what's causing the errors in the first place (if zap is
> not used)? It worries me greatly.

Yes, the problem is that the rmap code does not handle the qemu process
mappings vanishing while a present rmap still refers to them. If that
happens, and there is a fault for a gfn whose qemu mapping has been
removed, a different physical (freshly zeroed) page will be allocated:

rmap a -> gfn 0 -> physical host page 0

mapping for gfn 0 gets removed

guest faults in gfn 0 through the same pte "chain"

rmap a -> gfn 0 -> physical host page 1

When instantiating the shadow mapping for the second time, the
"is_rmap_pte" check succeeds, so we release the reference grabbed by
gfn_to_page() at mmu_set_spte(). We now have a shadow mapping pointing
to a physical page without holding an additional reference on that
page.

The following makes the host not crash under such a condition, but the
condition itself is invalid, leading to inconsistent state in the
guest. So IMHO it shouldn't be allowed to happen in the first place.

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index f0cdfba..4c93b79 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1009,6 +1009,21 @@ struct page *gva_to_page(struct kvm_vcpu *vcpu, gva_t gva)
 	return page;
 }
 
+static int was_spte_rmapped(struct kvm *kvm, u64 *spte, struct page *page)
+{
+	int ret = 0;
+	unsigned long host_pfn = (*spte & PT64_BASE_ADDR_MASK) >> PAGE_SHIFT;
+
+	if (is_rmap_pte(*spte)) {
+		if (host_pfn != page_to_pfn(page))
+			rmap_remove(kvm, spte);
+		else
+			ret = 1;
+	}
+
+	return ret;
+}
+
 static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *shadow_pte,
 			 unsigned pt_access, unsigned pte_access,
 			 int user_fault, int write_fault, int dirty,
@@ -1016,7 +1031,7 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *shadow_pte,
 			 struct page *page)
 {
 	u64 spte;
-	int was_rmapped = is_rmap_pte(*shadow_pte);
+	int was_rmapped = was_spte_rmapped(vcpu->kvm, shadow_pte, page);
 	int was_writeble = is_writeble_pte(*shadow_pte);
 	/*
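
For completeness, here is a minimal userspace-side sketch of how the
three cases Avi listed map onto the balloon path. This is not the
actual patch: the capability probes and the kvm_zap_gfn() wrapper are
made-up names, stubbed out only so the sketch compiles standalone.

#include <stddef.h>
#include <sys/mman.h>

#define BALLOON_PAGE_SIZE 4096

/* Hypothetical capability probes -- stand-ins for whatever the real
 * checks end up being (e.g. via KVM_CHECK_EXTENSION); hardcoded here
 * just to keep the sketch self-contained. */
static int kvm_has_mmu_notifiers(void) { return 0; }
static int kvm_has_zap_gfn(void) { return 1; }

/* Hypothetical wrapper around the KVM_ZAP_GFN ioctl. */
static void kvm_zap_gfn(void *addr) { (void)addr; }

/* Inflate one page of the balloon, i.e. hand it back to the host. */
static void balloon_inflate_page(void *addr)
{
	/* zap needed, but not available: don't madvise() */
	if (!kvm_has_mmu_notifiers() && !kvm_has_zap_gfn())
		return;

	/* zap needed and available: drop the shadow mapping first */
	if (!kvm_has_mmu_notifiers())
		kvm_zap_gfn(addr);

	/* zap unneeded (or already done): release the backing page */
	madvise(addr, BALLOON_PAGE_SIZE, MADV_DONTNEED);
}

int main(void)
{
	void *page = mmap(NULL, BALLOON_PAGE_SIZE, PROT_READ | PROT_WRITE,
			  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (page != MAP_FAILED)
		balloon_inflate_page(page);
	return 0;
}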