On Tue, Jan 22, 2008 at 04:53:37PM +0200, Avi Kivity wrote:
> Andrea Arcangeli wrote:
>> On Tue, Jan 22, 2008 at 04:08:16PM +0200, Avi Kivity wrote:
>>   
>>> Andrea Arcangeli wrote:
>>>     
>>>> This is the same as before but it uses the age_page callback to
>>>> prevent the guest OS working set to be swapped out. It works well here
>>>> so far. This depends on the memslot locking with mmu lock patch and on
>>>> the mmu notifiers #v3 patch that I'll post in CC with linux-mm shortly
>>>> that implements the age_page callback and that changes follow_page to
>>>> set the young bit in the pte instead of setting the referenced bit (so
>>>> the age_page will be called again later when the VM clears the young
>>>> bit).
>>>>
>>>>  +static void unmap_spte(struct kvm *kvm, u64 *spte)
>>>> +{
>>>> +  struct page *page = pfn_to_page((*spte & PT64_BASE_ADDR_MASK) >> PAGE_SHIFT);
>>>> +  get_page(page);
>>>> +  rmap_remove(kvm, spte);
>>>> +  set_shadow_pte(spte, shadow_trap_nonpresent_pte);
>>>> +  kvm_flush_remote_tlbs(kvm);
>>>> +  __free_page(page);
>>>> +}
>>>>         
>>> Why is get_page()/__free_page() needed here? Isn't kvm_release_page_*() 
>>> sufficient?
>>>     
>>
>> The other-cpus-tlb have to be flushed _before_ the page is visible in
>> the host kernel freelist, otherwise other host-cpus with tlbs still
>> mapping the page with write-access would be able to modify the page
>> even after it's queued in the freelist. 
>
> Right.  But doesn't this apply to other callers of rmap_remove()?  Perhaps 
> we need to put the flush in set_spte() or rmap_remove() and 
> rmap_write_protect().
>
> Oh, rmap_write_protect() already has the flush.

rmap_write_protect is the only obviously safe one, because it doesn't
decrease the page reference count; it flushes the tlb only to shoot
down any write-enabled tlb entry.

The problem is only with the rmap_remove callers.

invalidate_page, ironically, I think is ok with flushing the tlb after
put_page, because ptep_clear_flush is invoked while its caller still
holds a pin on the page.

invalidate_range is not ok with flushing the tlb _after_ put_page.

All other rmap_remove callers must take into account that when
rmap_remove returns, in the window between put_page and the tlb flush,
another cpu may enter the VM and free the page the moment the pin on
the page is gone. This is especially true with readonly swapcache,
which doesn't require a swapout before the page is put in the
freelist.

So yes, it may be a generic race for the rmap_remove callers.

I'm not exactly sure why I was getting crashes without doing
get_page/tlbflush/__free_page; the only logical explanation at this
point is invalidate_range.

> I'm afraid I don't really understand the difference in semantics between 
> put_page() and __free_page().  Maybe we need to switch kvm_release_page_*() 
> to __free_page()?

put_page/__free_page will both work fine in practice for kvm;
__free_page is faster, so yes, I think kvm_release_page_* should be
changed to use __free_page, but that is a microoptimization only. The
only real issue is the tlb flush in smp: whether it can happen after
put_page/__free_page or not.

_______________________________________________
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel
