On Fri, Mar 21, 2008 at 04:56:50PM +0100, Andrea Arcangeli wrote:
> On Fri, Mar 21, 2008 at 10:37:00AM -0300, Marcelo Tosatti wrote:
> > This is not the final put_page().
> >
> > Remote TLB's are flushed here, after rmap_remove:
> >
> > + if (nuked)
> > + kvm_flush_remote_tlbs(kvm);
> >
> > This ioctl is called before zap_page_range() is executed through
> > sys_madvise(MADV_DONTNEED) to remove the page in question.
> >
> > We know that the guest will not attempt to fault in the gfn because
> > the virtio balloon driver is synchronous (it will only attempt to
> > release that page back to the guest OS once rmap_nuke+zap_page_range has
> > finished).
> >
> > Can you be more verbose?
>
> Sure.
>
> 1) even if you run madvise(MADV_DONTNEED) after KVM_ZAP_GFN, the anon
> page can be released by the VM at any time without any kvm-aware
> lock (there will be a swap reference to it, and no more page_count
> references, leading to memory corruption in the host in the presence
> of memory pressure). This is purely theoretical of course; not sure
> if timings or probabilities allow for reproducing this in real life.
If there are any active shadow mappings to a page, there is a guarantee
that there is a valid linux pte mapping pointing at it. So page_count ==
1 + nr_sptes.
So the theoretical race you're talking about is:
CPU0                                            CPU1

spte = rmap_next(kvm, rmapp, NULL);
while (spte) {
        BUG_ON(!spte);
        BUG_ON(!(*spte & PT_PRESENT_MASK));
        rmap_printk("rmap_nuke: spte %p %llx\n", spte, *spte);
        rmap_remove(kvm, spte);
        set_shadow_pte(spte, shadow_trap_nonpresent_pte);
        nuked = 1;
        spte = rmap_next(kvm, rmapp, spte);
}
                                                try_to_unmap_one()
                                                page is now free
                                                page allocated for
                                                other purposes
if (nuked)
        kvm_flush_remote_tlbs(kvm);
And some other VCPU, with the translation still cached in its TLB, writes
to the now-freed (and possibly reallocated) page.
This case is safe because the path that frees a pte and subsequently
a page will take care of flushing the TLB of any remote CPUs that
may have it cached, before freeing the page, of course:
ptep_clear_flush->flush_tlb_page.
Am I missing something?
> 2) not sure what you mean with synchronous, do you mean single
> threaded? I can't see how it can be single threaded (does
> ballooning stop all other vcpus?).
No, I mean synchronous as in that no other vcpu will attempt to fault
that _particular gfn_ in between KVM_ZAP_GFN and madvise.
> Why are you taking the mmu_lock around rmap_nuke if no other vcpu
> can take any page fault and call into get_user_pages in between
> KVM_ZAP_GFN and madvise?
Other vcpu's can take page faults and call into get_user_pages, but not
for the gfn KVM_ZAP_GFN is operating on, because it has been allocated
by the balloon driver.
So we need mmu_lock to protect against concurrent shadow page and rmap
operations.
> As far as I
> can tell the only possible safe ordering is madvise; KVM_ZAP_GFN,
> which is emulating the mmu notifier behavior incidentally.
>
> Note that the rmap_remove smp race (also note here smp race means
> smp-host race, it will trigger even if guest is UP) might be a generic
> issue with the rmap_remove logic. I didn't analyze all the possible
> rmap_remove callers yet (this was in my todo list), I just made sure
> that my code would be smp safe.
As detailed above, we have a guarantee that there is a live linux pte
by the time rmap_remove() nukes a shadow pte.
> > By the way, I don't see invalidate_begin/invalidate_end hooks in the KVM
> > part of MMU notifiers V9 patch? (meaning that zap_page_range will not zap
> > the spte's for the pages in question).
>
> range_begin isn't needed. range_begin is needed only by secondary mmu
> drivers that aren't reference counting the pages. The _end callback is
> below. It could be improved to skip the whole range in a single browse
> of the memslots instead of browsing it for each page in the range. The
> mmu notifiers aren't merged and this code may still require changes in
> terms of API if EMM is merged instead of #v9 (hope not), so I tried to
> keep it simple.
Oh, I missed that. Nice.
_______________________________________________
kvm-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/kvm-devel