On Wed, 2013-08-07 at 15:33 +0530, Bharat Bhushan wrote:
> When the MM code is invalidating a range of pages, it calls the KVM
> kvm_mmu_notifier_invalidate_range_start() notifier function, which calls
> kvm_unmap_hva_range(), which arranges to flush all the TLBs for guest pages.
> However, the Linux PTEs for the range being flushed are still valid at
> that point.  We are not supposed to establish any new references to pages
> in the range until the ...range_end() notifier gets called.
> The PPC-specific KVM code doesn't get any explicit notification of that;
> instead, we are supposed to use mmu_notifier_retry() to test whether we
> are or have been inside a range flush notifier pair while we have been
> referencing a page.
> 
> This patch calls mmu_notifier_retry() while mapping the guest page to
> ensure that we do not reference a page while a range invalidation is in
> progress.
> 
> This call is made inside a region locked with kvm->mmu_lock, which is
> the same lock that is taken by the KVM MMU notifier functions, thus
> ensuring that no new notification can proceed while we are in the
> locked region.
> 
> Signed-off-by: Bharat Bhushan <bharat.bhus...@freescale.com>
> ---
>  arch/powerpc/kvm/e500_mmu_host.c |   19 +++++++++++++++++--
>  1 files changed, 17 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c
> index ff6dd66..ae4eaf6 100644
> --- a/arch/powerpc/kvm/e500_mmu_host.c
> +++ b/arch/powerpc/kvm/e500_mmu_host.c
> @@ -329,8 +329,14 @@ static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500,
>       int tsize = BOOK3E_PAGESZ_4K;
>       unsigned long tsize_pages = 0;
>       pte_t *ptep;
> -     int wimg = 0;
> +     int wimg = 0, ret = 0;
>       pgd_t *pgdir;
> +     unsigned long mmu_seq;
> +     struct kvm *kvm = vcpu_e500->vcpu.kvm;
> +
> +     /* used to check for invalidations in progress */
> +     mmu_seq = kvm->mmu_notifier_seq;
> +     smp_rmb();
>  
>       /*
>        * Translate guest physical to true physical, acquiring
> @@ -458,6 +464,13 @@ static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500,
>                               (long)gfn, pfn);
>               return -EINVAL;
>       }
> +
> +     spin_lock(&kvm->mmu_lock);
> +     if (mmu_notifier_retry(kvm, mmu_seq)) {
> +             ret = -EAGAIN;
> +             goto out;
> +     }
> +
>       kvmppc_e500_ref_setup(ref, gtlbe, pfn, wimg);
>  
>       kvmppc_e500_setup_stlbe(&vcpu_e500->vcpu, gtlbe, tsize,
> @@ -466,10 +479,12 @@ static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500,
>       /* Clear i-cache for new pages */
>       kvmppc_mmu_flush_icache(pfn);
>  
> +out:
> +     spin_unlock(&kvm->mmu_lock);
>       /* Drop refcount on page, so that mmu notifiers can clear it */
>       kvm_release_pfn_clean(pfn);
>  
> -     return 0;
> +     return ret;
>  }

Acked-by: Scott Wood <scottw...@freescale.com> since it's currently the
standard KVM approach, though I'm not happy about the busy-waiting
aspect of it.  What if we preempted the thread responsible for
decrementing mmu_notifier_count?  What if we did so while running as a
SCHED_FIFO task of higher priority than the decrementing thread?

-Scott
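
The busy-waiting Scott mentions has no explicit loop in the patch itself --
the vcpu simply takes the fault again and re-enters the mapping path -- but
its effect is equivalent to the hypothetical loop below
(map_guest_page_checked() is the placeholder helper from the sketch above):

	/* Purely illustrative; not how the retry is actually coded. */
	static void illustrate_fault_retry(struct kvm *kvm,
					   struct kvm_memory_slot *slot, gfn_t gfn)
	{
		int ret;

		do {
			ret = map_guest_page_checked(kvm, slot, gfn);
			/*
			 * -EAGAIN persists until the invalidating thread
			 * reaches ...invalidate_range_end() and decrements
			 * kvm->mmu_notifier_count.  If this vcpu preempts
			 * that thread -- say, as a higher-priority SCHED_FIFO
			 * task on the same CPU -- the loop can spin for an
			 * unbounded time.
			 */
		} while (ret == -EAGAIN);
	}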


