On Fri, Mar 21, 2008 at 10:37:00AM -0300, Marcelo Tosatti wrote:
> This is not the final put_page().
>
> Remote TLB's are flushed here, after rmap_remove:
>
> +	if (nuked)
> +		kvm_flush_remote_tlbs(kvm);
>
> This ioctl is called before zap_page_range() is executed through
> sys_madvise(MADV_DONTNEED) to remove the page in question.
>
> We know that the guest will not attempt to fault in the gfn because
> the virtio balloon driver is synchronous (it will only attempt to
> release that page back to the guest OS once rmap_nuke+zap_page_range
> has finished).
>
> Can you be more verbose?
Sure.

1) Even if you run madvise(MADV_DONTNEED) after KVM_ZAP_GFN, the anon page
can be released by the VM at any time without any kvm-aware lock (there
will be a swap reference to it, but no page_count references anymore,
leading to memory corruption in the host in the presence of memory
pressure). This is purely theoretical of course; I'm not sure whether the
timings or probabilities allow reproducing this in real life.

2) Not sure what you mean by synchronous, do you mean single threaded? I
can't see how it can be single threaded (does ballooning stop all the
other vcpus?). Why are you taking the mmu_lock around rmap_nuke if no
other vcpu can take a page fault and call into get_user_pages in between
KVM_ZAP_GFN and madvise? As far as I can tell the only possible safe
ordering is madvise; KVM_ZAP_GFN, which incidentally is emulating the mmu
notifier behavior.

Note that the rmap_remove smp race (smp race here means smp-host race: it
will trigger even if the guest is UP) might be a generic issue with the
rmap_remove logic. I haven't analyzed all the possible rmap_remove callers
yet (that was on my todo list); I just made sure that my own code would be
smp safe.

> By the way, I don't see invalidate_begin/invalidate_end hooks in the KVM
> part of MMU notifiers V9 patch? (meaning that zap_page_range will not zap
> the spte's for the pages in question).

range_begin isn't needed: it is only required by secondary mmu drivers
that aren't reference counting the pages. The _end callback is below. It
could be improved to zap the whole range in a single browse of the
memslots instead of browsing them once for each page in the range (see
the sketch after the patch). The mmu notifiers aren't merged yet and this
code may still require API changes if EMM is merged instead of #v9 (I
hope not), so I tried to keep it simple.

+void kvm_mmu_notifier_invalidate_page(struct mmu_notifier *mn,
+				      struct mm_struct *mm,
+				      unsigned long address)
+{
+	struct kvm *kvm = mmu_notifier_to_kvm(mn);
+	BUG_ON(mm != kvm->mm);
+	write_seqlock(&kvm->arch.mmu_notifier_invalidate_lock);
+	kvm_unmap_hva(kvm, address);
+	write_sequnlock(&kvm->arch.mmu_notifier_invalidate_lock);
+}
+
+void kvm_mmu_notifier_invalidate_range_end(struct mmu_notifier *mn,
+					   struct mm_struct *mm,
+					   unsigned long start,
+					   unsigned long end)
+{
+	for (; start < end; start += PAGE_SIZE)
+		kvm_mmu_notifier_invalidate_page(mn, mm, start);
+}
+
+int kvm_mmu_notifier_clear_flush_young(struct mmu_notifier *mn,
+				       struct mm_struct *mm,
+				       unsigned long address)
+{
+	struct kvm *kvm = mmu_notifier_to_kvm(mn);
+	BUG_ON(mm != kvm->mm);
+	return kvm_age_hva(kvm, address);
+}
+
+static const struct mmu_notifier_ops kvm_mmu_notifier_ops = {
+	.invalidate_page	= kvm_mmu_notifier_invalidate_page,
+	.invalidate_range_end	= kvm_mmu_notifier_invalidate_range_end,
+	.clear_flush_young	= kvm_mmu_notifier_clear_flush_young,
+};
+
 struct kvm *kvm_arch_create_vm(void)
 {
 	struct kvm *kvm = kzalloc(sizeof(struct kvm), GFP_KERNEL);
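For illustration only, here is a rough, untested sketch of what that
single-browse variant of the _end callback could look like. Note that
kvm_unmap_hva_range() is a hypothetical helper, not part of the patch
above: it would walk the memslots once and zap every spte whose hva falls
inside the range, instead of re-browsing the memslots once per page.

void kvm_mmu_notifier_invalidate_range_end(struct mmu_notifier *mn,
					   struct mm_struct *mm,
					   unsigned long start,
					   unsigned long end)
{
	struct kvm *kvm = mmu_notifier_to_kvm(mn);

	BUG_ON(mm != kvm->mm);
	write_seqlock(&kvm->arch.mmu_notifier_invalidate_lock);
	/* hypothetical helper: walk the memslots once and drop every
	 * spte whose hva falls in [start, end), rather than calling
	 * kvm_unmap_hva() once per page as the posted patch does */
	kvm_unmap_hva_range(kvm, start, end);
	write_sequnlock(&kvm->arch.mmu_notifier_invalidate_lock);
}

Whether the extra memslot-walking code is worth it before the notifier
API settles is debatable, which is why the per-page loop is what I posted
above.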