Marcelo Tosatti wrote:
> On Wed, Jan 30, 2008 at 03:00:49PM -0800, Jeremy Fitzhardinge wrote:
>   
>> Marcelo Tosatti wrote:
>>     
>>> Forgot to copy you... Ideally all pte updates should be done via the
>>> paravirt interface.
>>>  
>>>       
>> Hm, are you sure?
>>
>>     
>>> +static inline void pte_clear_bit(unsigned int bit, pte_t *ptep)
>>> +{
>>> +   pte_t pte = *ptep;
>>> +   clear_bit(bit, (unsigned long *)&pte.pte);
>>> +   set_pte(ptep, pte);
>>> +}
>>>  
>>>       
>> This is not a safe transformation.  This will lose hardware A/D bit 
>> changes if the pte is part of an active pagetable.  Aside from that, 
>> clear_bit is atomic and so is the wrong thing to use on a local 
>> variable; a plain bitmask would do the job.
>>     
>
> True this is racy on baremetal.
>   

And under Xen (since the pagetable is the real hardware pagetable).

>> clear_bit on a PTE is fine, since everyone has to support some level of 
>> trap'n'emulate on pte-level entries (ptep_get_and_clear is very hard to 
>> implement otherwise).  Shadow pagetable implementations will do this 
>> naturally, and Xen must do it to allow atomic updates on ptes (also 
>> because it makes late-pinning/early-unpinning a performance win).
>>     
>
> Sure clear_bit on a PTE is fine, but we wan't to virtualize for batching
> pte updates via hypercall.
>   

Yes, mprotect needs batching, but that probably needs special handling.

>   
>>> +
>>> static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
>>>                           pte_t *ptep, pte_t pte)
>>> {
>>> Index: linux-2.6-x86-kvm/include/asm-x86/pgtable.h
>>> ===================================================================
>>> --- linux-2.6-x86-kvm.orig/include/asm-x86/pgtable.h
>>> +++ linux-2.6-x86-kvm/include/asm-x86/pgtable.h
>>> @@ -227,6 +227,8 @@ void native_pagetable_setup_done(pgd_t *
>>> #define pte_update(mm, addr, ptep)              do { } while (0)
>>> #define pte_update_defer(mm, addr, ptep)        do { } while (0)
>>>
>>> +#define pte_clear_bit(bit, ptep)   clear_bit(bit, (unsigned long 
>>> *)&ptep->pte)
>>> +
>>> static inline void paravirt_pagetable_setup_start(pgd_t *base)
>>> {
>>>     native_pagetable_setup_start(base);
>>> @@ -302,7 +304,7 @@ static inline void native_set_pte_at(str
>>> ({                                                                  \
>>>     int __changed = !pte_same(*(ptep), entry);                      \
>>>     if (__changed && dirty) {                                       \
>>> -           *ptep = entry;                                          \
>>> +           set_pte(ptep, entry);                                   \
>>>  
>>>       
>> This change is OK, but doesn't really do anything.  All hypervisors 
>> allow direct access to ptes.
>>     
>
> We want the pte update to go through a hypercall instead of trap'n'emulate.
>   

ptep_set_access_flags is only used in fault handling, so there's no 
scope for batching.

>   
>>>             pte_update_defer((vma)->vm_mm, (address), (ptep));      \
>>>             flush_tlb_page(vma, address);                           \
>>>     }                                                               \
>>> @@ -357,7 +359,7 @@ static inline pte_t ptep_get_and_clear_f
>>> #define __HAVE_ARCH_PTEP_SET_WRPROTECT
>>> static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long 
>>> addr, pte_t *ptep)
>>> {
>>> -   clear_bit(_PAGE_BIT_RW, (unsigned long *)&ptep->pte);
>>> +   pte_clear_bit(_PAGE_BIT_RW, ptep);
>>>  
>>>       
>> This is only valid if ptep_set_wrprotect is never used on active ptes, 
>> which I don't think is the case.
>>     
>
> Ok, pte_clear_bit() like above is broken. But the point is that we
> want to batch the write protection of the source pte along with the
> creation of the write protected destination pte on COW using the
> paravirt_enter_lazy/paravirt_leave_lazy.
>   

Which "we" are we talking about here?  In Xen, the destination pte is 
unpinned during a fork(), so a plain memory write will work without any 
hypervisor interaction.  That's what I mean about the "late pin/early 
unpin" optimisation.  Almost all fork/exit pagetable manipulation 
happens on unpinned pagetables which are plain RW pages.  It's only 
updates on pinned pagetables which need hypervisor interaction.  The 
only significant batchable update is mprotect.

I don't know what the issues for KVM/VMI are.

    J

-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
_______________________________________________
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel

Reply via email to