On Tue, Apr 24, 2012 at 12:47:25PM +0300, Avi Kivity wrote:
> Using RCU for lockless shadow walking can increase the amount of memory
> in use by the system, since RCU grace periods are unpredictable.  We also
> have an unconditional write to a shared variable (reader_counter), which
> isn't good for scaling.
> 
> Replace that with a scheme similar to x86's get_user_pages_fast(): disable
> interrupts during the lockless shadow walk, so that the freer
> (kvm_mmu_commit_zap_page()) must wait for its TLB flush IPI to be
> delivered, which can only happen once the walking processor re-enables
> interrupts.
> 
> We also add a new vcpu->mode, READING_SHADOW_PAGE_TABLES, to prevent
> kvm_flush_remote_tlbs() from avoiding the IPI.
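
For readers without the rest of the series in front of them: the freer
side is not in this hunk.  kvm_flush_remote_tlbs() roughly only IPIs
vcpus whose mode says they may be touching shadow pages, so the handshake
is sketched below; walk_sptes() and free_zapped_pages() are placeholder
names, not functions from the patch.

        /* Reader (vcpu thread): */
        local_irq_disable();                     /* IPI delivery now blocked */
        vcpu->mode = READING_SHADOW_PAGE_TABLES;
        smp_mb();                                /* publish mode before loads */
        walk_sptes(vcpu);                        /* the lockless walk */
        smp_mb();
        vcpu->mode = OUTSIDE_GUEST_MODE;
        local_irq_enable();                      /* pending flush IPI lands here */

        /* Freer (kvm_mmu_commit_zap_page()): */
        zap_sptes(kvm);                          /* unlink pages from the tables */
        kvm_flush_remote_tlbs(kvm);              /* IPIs every vcpu not in
                                                  * OUTSIDE_GUEST_MODE and waits
                                                  * for the acks */
        free_zapped_pages(kvm);                  /* safe: no walker still holds
                                                  * a pointer into these pages */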
> 
> Signed-off-by: Avi Kivity <[email protected]>
> ---
>  arch/x86/include/asm/kvm_host.h |    4 ---
>  arch/x86/kvm/mmu.c              |   72 +++++++++++++++------------------------
>  include/linux/kvm_host.h        |    3 +-
>  3 files changed, 30 insertions(+), 49 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index f624ca7..67e66e6 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -237,8 +237,6 @@ struct kvm_mmu_page {
>  #endif
>  
>       int write_flooding_count;
> -
> -     struct rcu_head rcu;
>  };
>  
>  struct kvm_pio_request {
> @@ -536,8 +534,6 @@ struct kvm_arch {
>       u64 hv_guest_os_id;
>       u64 hv_hypercall;
>  
> -     atomic_t reader_counter;
> -
>       #ifdef CONFIG_KVM_MMU_AUDIT
>       int audit_point;
>       #endif
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 07424cf..ef88034 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -551,19 +551,28 @@ static u64 mmu_spte_get_lockless(u64 *sptep)
>  
>  static void walk_shadow_page_lockless_begin(struct kvm_vcpu *vcpu)
>  {
> -     rcu_read_lock();
> -     atomic_inc(&vcpu->kvm->arch.reader_counter);
> -
> -     /* Increase the counter before walking shadow page table */
> -     smp_mb__after_atomic_inc();
> +     /*
> +      * Prevent page table teardown by making any freer wait during
> +      * kvm_flush_remote_tlbs() IPI to all active vcpus.
> +      */
> +     local_irq_disable();
> +     vcpu->mode = READING_SHADOW_PAGE_TABLES;
> +     /*
> +      * wmb: advertise vcpu->mode change
> +      * rmb: make sure we see updated sptes
> +      */
> +     smp_mb();
>  }
>  
>  static void walk_shadow_page_lockless_end(struct kvm_vcpu *vcpu)
>  {
> -     /* Decrease the counter after walking shadow page table finished */
> -     smp_mb__before_atomic_dec();
> -     atomic_dec(&vcpu->kvm->arch.reader_counter);
> -     rcu_read_unlock();
> +     /*
> +      * Make our reads and writes to shadow page tables globally visible
> +      * before leaving READING_SHADOW_PAGE_TABLES mode.
> +      */
> +      */

This comment is misleading.  Writes to shadow page tables done outside
the mmu_lock must be performed with locked instructions, and it is those,
not this barrier, that make the writes globally visible.

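To make the point concrete, a lockless spte update has to be a locked
cmpxchg, roughly as below; the helper name is illustrative, not from
this patch:

        /*
         * Illustrative only: an spte write outside mmu_lock must be a
         * locked cmpxchg so it cannot race with a concurrent zap; on x86
         * cmpxchg64() also acts as a full barrier, which is what actually
         * publishes the write.
         */
        static bool try_update_spte_lockless(u64 *sptep, u64 old_spte,
                                             u64 new_spte)
        {
                return cmpxchg64(sptep, old_spte, new_spte) == old_spte;
        }
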
> +     smp_mb();
> +     vcpu->mode = OUTSIDE_GUEST_MODE;

Don't you want

	vcpu->mode = OUTSIDE_GUEST_MODE;
	smp_mb();

so that the vcpu->mode update is globally visible before subsequent
loads execute?
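
Spelled out as an interleaving (a hypothetical litmus sketch, not code
from the series):

        /*
         * Reader (walk_..._end)             Freer (commit_zap_page)
         * ---------------------             -----------------------
         * smp_mb();                         zap sptes
         * vcpu->mode = OUTSIDE_GUEST_MODE;  smp_mb();
         * load X;                           if (vcpu->mode ==
         *                                       READING_SHADOW_PAGE_TABLES)
         *                                           send IPI and wait
         *
         * In the posted order, "load X" may be satisfied before the store
         * to vcpu->mode is visible to the freer; with the store before the
         * smp_mb(), the later load cannot pass the store.
         */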
