On Fri, Mar 21, 2008 at 06:23:41PM -0300, Marcelo Tosatti wrote:
> If there are any active shadow mappings to a page there is a guarantee
> that there is a valid linux pte mapping pointing at it. So page_count ==
             ^^ was
> 1 + nr_sptes.

Yes.

> So the theoretical race you're talking about is:
>
>         CPU0                                    CPU1
>
> spte = rmap_next(kvm, rmapp, NULL);
> while (spte) {
>       BUG_ON(!spte);
>       BUG_ON(!(*spte & PT_PRESENT_MASK));
>       rmap_printk("rmap_nuke: spte %p %llx\n", spte, *spte);
>       rmap_remove(kvm, spte);
>       set_shadow_pte(spte, shadow_trap_nonpresent_pte);
>       nuked = 1;
>       spte = rmap_next(kvm, rmapp, spte);
> }
>                       <----------
>                                               try_to_unmap_one()
>                                               page is now free
>                                               page allocated for other purposes
> if (nuked)
>       kvm_flush_remote_tlbs(kvm);

I'd more accurately describe the race as this:

        CPU0                                    CPU1

    spte = rmap_next(kvm, rmapp, NULL);
    while (spte) {
          BUG_ON(!spte);
          BUG_ON(!(*spte & PT_PRESENT_MASK));
          rmap_printk("rmap_nuke: spte %p %llx\n", spte, *spte);
          page_count = 2
                        <----------
                                                try_to_unmap_one()
                                                page_count = 1
          rmap_remove(kvm, spte); <- page is freed, race window opened
          set_shadow_pte(spte, shadow_trap_nonpresent_pte);
          nuked = 1;
          spte = rmap_next(kvm, rmapp, spte);
    }
    if (nuked)
          kvm_flush_remote_tlbs(kvm); <- race window closed

> And some other VCPU with the TLB cached writes to the now freed (and
> possibly allocated to another purpose) page.

The other VCPU could also read, not just write. If it reads, the guest
will crash; if it writes, the host will crash (host fs corruption
etc...) too.

> This case is safe because the path that frees a pte and subsequently
> a page will take care of flushing the TLB of any remote CPU's that
> possibly have it cached (before freeing the page, of course).
> ptep_clear_flush->flush_tlb_page.
>
> Am I missing something?

flush_tlb_page only takes care of accesses to the page through the qemu
task when outside vmx/svm mode. If the other physical cpu is running
code of some guest vcpu in vmx/svm mode after the race window is opened
and before it is closed, the race will trigger.

> > 2) not sure what you mean with synchronous, do you mean single
> >    threaded? I can't see how it can be single threaded (does
> >    ballooning stops all other vcpus?).
>
> No, I mean synchronous as in that no other vcpu will attempt to fault
> that _particular gfn_ in between KVM_ZAP_GFN and madvise.

The other vcpu can't fault any gfn; it will fault only _after_ the
kvm_flush_remote_tlbs. Otherwise there would be no problem. The fix is
exactly to make sure the other vcpu faults on that particular gfn
before the page is in the freelist, and to do that the tlb flush must
happen _before_ rmap_remove calls put_page. And madvise should happen
before KVM_ZAP_GFN to prevent a gfn_to_page in a parallel kvm page
fault, triggered by some other vcpu, from re-instantiating the spte
after KVM_ZAP_GFN has run.

> > Why are you taking the mmu_lock around rmap_nuke if no other vcpu
> > can take any page fault and call into get_user_pages in between
> > KVM_ZAP_GFN and madvise?
>
> Other vcpu's can take page faults and call into get_user_pages, but not
> for the gfn KVM_ZAP_GFN is operating on, because it has been allocated
> by the balloon driver.

With regard to the ordering of ioctl vs madvise, we'd better not trust
the guest. If the guest tries to access ram ranges that have been
inflated into the balloon, qemu should behave as if the guest were
accessing ram outside the virtualized e820 map, instead of leaking ram
to the buggy guest. There is no difference in the ordering, other than
doing madvise first being more robust in detecting a buggy guest. I
can't see any benefit in doing the ioctl first. The memory corrupting
race window will happen regardless, because the host VM rmap code is
the one releasing the page_count.
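To make that ordering concrete, here is a rough qemu-side sketch
(untested, purely illustrative: KVM_ZAP_GFN isn't in mainline, so the
ioctl argument struct and the balloon_release_page()/vm_fd/hva names
below are placeholders for whatever the actual patch and the qemu code
end up using):

#include <stdio.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/ioctl.h>
/* plus the header from the patch that defines KVM_ZAP_GFN */

/* hypothetical ioctl argument layout, for illustration only */
struct kvm_zap_gfn_param {
	uint64_t gfn;
};

static void balloon_release_page(int vm_fd, void *hva, uint64_t gfn)
{
	struct kvm_zap_gfn_param param = { .gfn = gfn };
	long page_size = sysconf(_SC_PAGESIZE);

	/*
	 * madvise first: the host VM rmap is what ends up releasing the
	 * page_count, and zapping only afterwards means a parallel
	 * gfn_to_page() from another vcpu fault can't re-instantiate
	 * the spte once KVM_ZAP_GFN has run.
	 */
	if (madvise(hva, page_size, MADV_DONTNEED) < 0)
		perror("madvise(MADV_DONTNEED)");

	/* then zap whatever shadow mappings are left for this gfn */
	if (ioctl(vm_fd, KVM_ZAP_GFN, &param) < 0)
		perror("ioctl(KVM_ZAP_GFN)");
}

The only point of the sketch is the order: drop the host page with
madvise first, zap the sptes second.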
> So we need mmu_lock to protect against concurrent shadow page and rmap
> operations.

Ok.

> As detailed above, we have a guarantee that there is a live linux pte
> by the time rmap_remove() nukes a shadow pte.

I can't see how. I mean right now with current kvm.git, not with some
patch floating around. Which is the whole point of adding KVM_ZAP_GFN,
because the mainline folks didn't merge the mmu notifiers in time for
.25. My mmu notifier code already takes care of doing all of this race
free, but I sure agree with this approach for the short term; the less
we have to wait the better.

Thanks!
Andrea