On Wed, Jan 30, 2008 at 03:00:49PM -0800, Jeremy Fitzhardinge wrote:
> Marcelo Tosatti wrote:
> >Forgot to copy you... Ideally all pte updates should be done via the
> >paravirt interface.
> >  
> 
> Hm, are you sure?
> 
> >+static inline void pte_clear_bit(unsigned int bit, pte_t *ptep)
> >+{
> >+    pte_t pte = *ptep;
> >+    clear_bit(bit, (unsigned long *)&pte.pte);
> >+    set_pte(ptep, pte);
> >+}
> >  
> 
> This is not a safe transformation.  This will lose hardware A/D bit 
> changes if the pte is part of an active pagetable.  Aside from that, 
> clear_bit is atomic and so is the wrong thing to use on a local 
> variable; a plain bitmask would do the job.

True this is racy on baremetal.

> clear_bit on a PTE is fine, since everyone has to support some level of 
> trap'n'emulate on pte-level entries (ptep_get_and_clear is very hard to 
> implement otherwise).  Shadow pagetable implementations will do this 
> naturally, and Xen must do it to allow atomic updates on ptes (also 
> because it makes late-pinning/early-unpinning a performance win).

Sure clear_bit on a PTE is fine, but we wan't to virtualize for batching
pte updates via hypercall.

> >+
> > static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
> >                           pte_t *ptep, pte_t pte)
> > {
> >Index: linux-2.6-x86-kvm/include/asm-x86/pgtable.h
> >===================================================================
> >--- linux-2.6-x86-kvm.orig/include/asm-x86/pgtable.h
> >+++ linux-2.6-x86-kvm/include/asm-x86/pgtable.h
> >@@ -227,6 +227,8 @@ void native_pagetable_setup_done(pgd_t *
> > #define pte_update(mm, addr, ptep)              do { } while (0)
> > #define pte_update_defer(mm, addr, ptep)        do { } while (0)
> > 
> >+#define pte_clear_bit(bit, ptep)    clear_bit(bit, (unsigned long 
> >*)&ptep->pte)
> >+
> > static inline void paravirt_pagetable_setup_start(pgd_t *base)
> > {
> >     native_pagetable_setup_start(base);
> >@@ -302,7 +304,7 @@ static inline void native_set_pte_at(str
> > ({                                                                  \
> >     int __changed = !pte_same(*(ptep), entry);                      \
> >     if (__changed && dirty) {                                       \
> >-            *ptep = entry;                                          \
> >+            set_pte(ptep, entry);                                   \
> >  
> 
> This change is OK, but doesn't really do anything.  All hypervisors 
> allow direct access to ptes.

We want the pte update to go through a hypercall instead of trap'n'emulate.

> 
> >             pte_update_defer((vma)->vm_mm, (address), (ptep));      \
> >             flush_tlb_page(vma, address);                           \
> >     }                                                               \
> >@@ -357,7 +359,7 @@ static inline pte_t ptep_get_and_clear_f
> > #define __HAVE_ARCH_PTEP_SET_WRPROTECT
> > static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long 
> > addr, pte_t *ptep)
> > {
> >-    clear_bit(_PAGE_BIT_RW, (unsigned long *)&ptep->pte);
> >+    pte_clear_bit(_PAGE_BIT_RW, ptep);
> >  
> 
> This is only valid if ptep_set_wrprotect is never used on active ptes, 
> which I don't think is the case.

Ok, pte_clear_bit() like above is broken. But the point is that we
want to batch the write protection of the source pte along with the
creation of the write protected destination pte on COW using the
paravirt_enter_lazy/paravirt_leave_lazy.

So a pv_mmu_ops->pte_clear_bit() which does native clear_bit() on
baremetal and a hypercall based variant on KVM (or Xen) seems an
appropriate thing to do. I'll come up with a decent patch.

Thanks


-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
_______________________________________________
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel

Reply via email to