On Fri, Mar 21, 2008 at 04:56:50PM +0100, Andrea Arcangeli wrote:
> On Fri, Mar 21, 2008 at 10:37:00AM -0300, Marcelo Tosatti wrote:
> > This is not the final put_page().
> >
> > Remote TLB's are flushed here, after rmap_remove:
> >
> > +	if (nuked)
> > +		kvm_flush_remote_tlbs(kvm);
> >
> > This ioctl is called before zap_page_range() is executed through
> > sys_madvise(MADV_DONTNEED) to remove the page in question.
> >
> > We know that the guest will not attempt to fault in the gfn because
> > the virtio balloon driver is synchronous (it will only attempt to
> > release that page back to the guest OS once rmap_nuke+zap_page_range
> > has finished).
> >
> > Can you be more verbose?
>
> Sure.
>
> 1) even if you run madvise(MADV_DONTNEED) after KVM_ZAP_GFN, the anon
>    page can be released by the VM at any time without any kvm-aware
>    lock (there will be a swap reference to it, but no page_count
>    reference any more, leading to memory corruption in the host in
>    the presence of memory pressure). This is purely theoretical of
>    course; I am not sure if timings or probabilities allow for
>    reproducing this in real life.
If there are any active shadow mappings to a page, there is a guarantee
that a valid Linux pte mapping points at it. So page_count == 1 + nr_sptes.

So the theoretical race you're talking about is:

  CPU0                                          CPU1

  spte = rmap_next(kvm, rmapp, NULL);
  while (spte) {
          BUG_ON(!spte);
          BUG_ON(!(*spte & PT_PRESENT_MASK));
          rmap_printk("rmap_nuke: spte %p %llx\n", spte, *spte);
          rmap_remove(kvm, spte);
          set_shadow_pte(spte, shadow_trap_nonpresent_pte);
          nuked = 1;
          spte = rmap_next(kvm, rmapp, spte);
  }
                             <----------        try_to_unmap_one()
                                                page is now free
                                                page allocated for
                                                other purposes
  if (nuked)
          kvm_flush_remote_tlbs(kvm);

And some other vcpu with the TLB entry cached writes to the now freed
(and possibly reallocated) page.

This case is safe because the path that frees a pte, and subsequently
the page, takes care of flushing the TLB of any remote CPUs that might
have it cached, before freeing the page of course:
ptep_clear_flush -> flush_tlb_page.

Am I missing something?

> 2) not sure what you mean with synchronous, do you mean single
>    threaded? I can't see how it can be single threaded (does
>    ballooning stop all other vcpus?).

No, I mean synchronous as in no other vcpu will attempt to fault that
_particular gfn_ in between KVM_ZAP_GFN and madvise.

> Why are you taking the mmu_lock around rmap_nuke if no other vcpu
> can take any page fault and call into get_user_pages in between
> KVM_ZAP_GFN and madvise?

Other vcpus can take page faults and call into get_user_pages, but not
for the gfn KVM_ZAP_GFN is operating on, because it has been allocated
by the balloon driver. So we need mmu_lock to protect against concurrent
shadow page and rmap operations.

> As far as I can tell the only possible safe ordering is madvise;
> KVM_ZAP_GFN, which is emulating the mmu notifier behavior
> incidentally.
> Note that the rmap_remove smp race (note that here smp race means
> smp-host race; it will trigger even if the guest is UP) might be a
> generic issue with the rmap_remove logic. I didn't analyze all the
> possible rmap_remove callers yet (this was on my todo list), I just
> made sure that my code would be smp safe.

As detailed above, we have a guarantee that there is a live Linux pte by
the time rmap_remove() nukes a shadow pte.

> > By the way, I don't see invalidate_begin/invalidate_end hooks in
> > the KVM part of MMU notifiers V9 patch? (meaning that
> > zap_page_range will not zap the spte's for the pages in question).
>
> range_begin isn't needed. range_begin is needed only by secondary mmu
> drivers that aren't reference counting the pages. The _end callback
> is below. It could be improved to skip the whole range in a single
> browse of the memslots instead of browsing it for each page in the
> range. The mmu notifiers aren't merged and this code may still
> require changes in terms of API if EMM is merged instead of #v9
> (hope not), so I tried to keep it simple.

Oh, I missed that. Nice.

_______________________________________________
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel