Marcelo Tosatti wrote:
> On Wed, Jan 30, 2008 at 03:00:49PM -0800, Jeremy Fitzhardinge wrote:
>> Marcelo Tosatti wrote:
>>> Forgot to copy you... Ideally all pte updates should be done via the
>>> paravirt interface.
>>>
>> Hm, are you sure?
>>
>>> +static inline void pte_clear_bit(unsigned int bit, pte_t *ptep)
>>> +{
>>> +        pte_t pte = *ptep;
>>> +        clear_bit(bit, (unsigned long *)&pte.pte);
>>> +        set_pte(ptep, pte);
>>> +}
>>>
>> This is not a safe transformation.  This will lose hardware A/D bit
>> changes if the pte is part of an active pagetable.  Aside from that,
>> clear_bit is atomic and so is the wrong thing to use on a local
>> variable; a plain bitmask would do the job.
>
> True, this is racy on baremetal.

And under Xen (since the pagetable is the real hardware pagetable).
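For illustration only, here is a minimal sketch of the "plain bitmask"
variant being suggested, reusing the pte_clear_bit()/set_pte() names and
the pte.pte field from the patch above.  As pointed out, writing back
through a local copy is still a non-atomic read-modify-write, so it can
lose hardware A/D bit updates if the pte belongs to an active pagetable:

        static inline void pte_clear_bit(unsigned int bit, pte_t *ptep)
        {
                pte_t pte = *ptep;

                /* Plain mask on the local copy, sized to the pte value;
                 * no atomic op is needed on a local variable. */
                pte.pte &= ~((typeof(pte.pte))1 << bit);

                /* Write the whole entry back (under a hypervisor this may
                 * be trapped or turned into a hypercall); still racy
                 * against hardware A/D updates on a live pagetable. */
                set_pte(ptep, pte);
        }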
>> clear_bit on a PTE is fine, since everyone has to support some level of
>> trap'n'emulate on pte-level entries (ptep_get_and_clear is very hard to
>> implement otherwise).  Shadow pagetable implementations will do this
>> naturally, and Xen must do it to allow atomic updates on ptes (also
>> because it makes late-pinning/early-unpinning a performance win).
>
> Sure clear_bit on a PTE is fine, but we want to virtualize for batching
> pte updates via hypercall.

Yes, mprotect needs batching, but that probably needs special handling.

>>> +
>>>  static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
>>>                                pte_t *ptep, pte_t pte)
>>>  {
>>> Index: linux-2.6-x86-kvm/include/asm-x86/pgtable.h
>>> ===================================================================
>>> --- linux-2.6-x86-kvm.orig/include/asm-x86/pgtable.h
>>> +++ linux-2.6-x86-kvm/include/asm-x86/pgtable.h
>>> @@ -227,6 +227,8 @@ void native_pagetable_setup_done(pgd_t *
>>>  #define pte_update(mm, addr, ptep)              do { } while (0)
>>>  #define pte_update_defer(mm, addr, ptep)        do { } while (0)
>>>
>>> +#define pte_clear_bit(bit, ptep)        clear_bit(bit, (unsigned long *)&ptep->pte)
>>> +
>>>  static inline void paravirt_pagetable_setup_start(pgd_t *base)
>>>  {
>>>          native_pagetable_setup_start(base);
>>> @@ -302,7 +304,7 @@ static inline void native_set_pte_at(str
>>>  ({                                                            \
>>>          int __changed = !pte_same(*(ptep), entry);            \
>>>          if (__changed && dirty) {                             \
>>> -                *ptep = entry;                                \
>>> +                set_pte(ptep, entry);                         \
>>
>> This change is OK, but doesn't really do anything.  All hypervisors
>> allow direct access to ptes.
>
> We want the pte update to go through a hypercall instead of
> trap'n'emulate.

ptep_set_access_flags is only used in fault handling, so there's no scope
for batching.

>>>                  pte_update_defer((vma)->vm_mm, (address), (ptep)); \
>>>                  flush_tlb_page(vma, address);                 \
>>>          }                                                     \
>>> @@ -357,7 +359,7 @@ static inline pte_t ptep_get_and_clear_f
>>>  #define __HAVE_ARCH_PTEP_SET_WRPROTECT
>>>  static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
>>>  {
>>> -        clear_bit(_PAGE_BIT_RW, (unsigned long *)&ptep->pte);
>>> +        pte_clear_bit(_PAGE_BIT_RW, ptep);
>>
>> This is only valid if ptep_set_wrprotect is never used on active ptes,
>> which I don't think is the case.
>
> Ok, pte_clear_bit() like above is broken. But the point is that we
> want to batch the write protection of the source pte along with the
> creation of the write protected destination pte on COW using the
> paravirt_enter_lazy/paravirt_leave_lazy.

Which "we" are we talking about here?  In Xen, the destination pte is
unpinned during a fork(), so a plain memory write will work without any
hypervisor interaction.  That's what I mean about the "late pin/early
unpin" optimisation.  Almost all fork/exit pagetable manipulation happens
on unpinned pagetables, which are plain RW pages.  It's only updates on
pinned pagetables which need hypervisor interaction.

The only significant batchable update is mprotect.

I don't know what the issues for KVM/VMI are.

    J
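For reference, a sketch of the kind of batched update being discussed,
patterned on the pte loop in mm/mprotect.c (the function name here is made
up, and locking and TLB-flush details are trimmed to the minimum).  Updates
issued between arch_enter_lazy_mmu_mode() and arch_leave_lazy_mmu_mode(),
presumably the lazy mode referred to above, can be queued by a paravirt
backend and flushed as a single hypercall rather than one trap or
hypercall per pte:

        #include <linux/mm.h>
        #include <asm/pgtable.h>

        /* Hypothetical example: reprotect a pte range in one lazy-MMU
         * batch, in the style of mprotect's change_pte_range(). */
        static void change_pte_range_sketch(struct mm_struct *mm, pmd_t *pmd,
                                            unsigned long addr,
                                            unsigned long end,
                                            pgprot_t newprot)
        {
                pte_t *pte;
                spinlock_t *ptl;

                pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
                arch_enter_lazy_mmu_mode();     /* start queueing pte updates */
                do {
                        if (pte_present(*pte)) {
                                pte_t ptent = ptep_get_and_clear(mm, addr, pte);

                                ptent = pte_modify(ptent, newprot);
                                set_pte_at(mm, addr, pte, ptent);
                        }
                } while (pte++, addr += PAGE_SIZE, addr != end);
                arch_leave_lazy_mmu_mode();     /* flush the queued updates */
                pte_unmap_unlock(pte - 1, ptl);
                /* The caller is still responsible for the TLB flush. */
        }

How much this saves depends on the backend; as noted above, under Xen only
updates to pinned pagetables go through the hypervisor in the first place.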