On Wed, Apr 11, 2012 at 08:22:07PM +0900, Takuya Yoshikawa wrote:
> I am now testing the following patch.
> 
> Note: this technique is used in several subsystems, e.g. jbd.
> 
> Although people tend to say that holding mmu_lock during get_dirty is
> always a problem, my impression is slightly different.

Other than the potential performance improvement, the worst-case scenario
of holding mmu_lock for hundreds of milliseconds at the start of
migration of huge guests must be fixed.

> When we call get_dirty, most of the hot memory pages have already been
> written to at least once and faults are becoming rare.
> 
> Actually, I rarely saw rescheduling due to mmu_lock contention when
> I tested this patch locally -- though the testing was not extensive enough.
> 
> In contrast, if we do O(1), we need to write protect 511 pages soon
> after get_dirty returns, and the chance of mmu_lock contention may increase
> if multiple VCPUs try to write to memory.
> 
> Anyway, this patch is small and seems effective.
> 
>       Takuya
> 
> ===
> From: Takuya Yoshikawa <[email protected]>
> 
> get_dirty_log() needs to hold mmu_lock during write protecting dirty
> pages and this can be long when there are many dirty pages to protect.
> 
> As the guest can take faults during that time, this may result in a
> severe latency problem which would prevent the system from scaling.
> 
> This patch mitigates this by checking for mmu_lock contention after every 2K
> dirty pages we protect: we have selected this value because it took about
> 100us to protect 2K dirty pages.
> 
> TODO: more numbers.
> 
> Signed-off-by: Takuya Yoshikawa <[email protected]>
> ---
>  arch/x86/include/asm/kvm_host.h |    6 +++---
>  arch/x86/kvm/mmu.c              |   12 +++++++++---
>  arch/x86/kvm/x86.c              |   18 +++++++++++++-----
>  3 files changed, 25 insertions(+), 11 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index f624ca7..26b39c1 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -712,9 +712,9 @@ void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
>  
>  int kvm_mmu_reset_context(struct kvm_vcpu *vcpu);
>  void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot);
> -void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
> -                                  struct kvm_memory_slot *slot,
> -                                  gfn_t gfn_offset, unsigned long mask);
> +int kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
> +                                 struct kvm_memory_slot *slot,
> +                                 gfn_t gfn_offset, unsigned long mask);
>  void kvm_mmu_zap_all(struct kvm *kvm);
>  unsigned int kvm_mmu_calculate_mmu_pages(struct kvm *kvm);
>  void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned int kvm_nr_mmu_pages);
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 29ad6f9..b88c5cc 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -1081,20 +1081,26 @@ static int __rmap_write_protect(struct kvm *kvm, unsigned long *rmapp, int level
>   *
>   * Used when we do not need to care about huge page mappings: e.g. during dirty
>   * logging we do not have any such mappings.
> + *
> + * Returns the number of pages protected by this.
>   */
> -void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
> -                                  struct kvm_memory_slot *slot,
> -                                  gfn_t gfn_offset, unsigned long mask)
> +int kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
> +                                 struct kvm_memory_slot *slot,
> +                                 gfn_t gfn_offset, unsigned long mask)
>  {
>       unsigned long *rmapp;
> +     int nr_protected = 0;
>  
>       while (mask) {
>               rmapp = &slot->rmap[gfn_offset + __ffs(mask)];
>               __rmap_write_protect(kvm, rmapp, PT_PAGE_TABLE_LEVEL);
> +             ++nr_protected;
>  
>               /* clear the first set bit */
>               mask &= mask - 1;
>       }
> +
> +     return nr_protected;
>  }
>  
>  static int rmap_write_protect(struct kvm *kvm, u64 gfn)
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 0d9a578..b636669 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3092,7 +3092,7 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
>       unsigned long n, i;
>       unsigned long *dirty_bitmap;
>       unsigned long *dirty_bitmap_buffer;
> -     bool is_dirty = false;
> +     int nr_protected = 0;
>  
>       mutex_lock(&kvm->slots_lock);
>  
> @@ -3121,15 +3121,23 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
>               if (!dirty_bitmap[i])
>                       continue;
>  
> -             is_dirty = true;
> -
>               mask = xchg(&dirty_bitmap[i], 0);
>               dirty_bitmap_buffer[i] = mask;
>  
>               offset = i * BITS_PER_LONG;
> -             kvm_mmu_write_protect_pt_masked(kvm, memslot, offset, mask);
> +             nr_protected += kvm_mmu_write_protect_pt_masked(kvm, memslot,
> +                                                             offset, mask);
> +             if (nr_protected > 2048) {

Can you expand on the reasoning behind this?
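
I assume the body of this check, which is cut off in the quote above, briefly
drops mmu_lock when someone else is waiting for it -- something like the
sketch below.  The need_resched()/spin_needbreak()/cond_resched_lock()
combination and the reset of nr_protected are my guesses, not taken from the
patch:

	if (nr_protected > 2048) {
		/*
		 * Guess: if another vcpu is spinning on mmu_lock or a
		 * reschedule is pending, drop the lock briefly so that
		 * vcpus taking write faults can make progress.
		 */
		if (need_resched() || spin_needbreak(&kvm->mmu_lock))
			cond_resched_lock(&kvm->mmu_lock);
		nr_protected = 0;
	}

cond_resched_lock() drops mmu_lock, reschedules if needed, and re-takes it,
so vcpus spinning on the lock get a chance to grab it in between.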
