Re: [PATCH v3 2/4] live migration support for initial write protect of VM

2014-04-24 Thread Steve Capper
On Wed, Apr 23, 2014 at 12:18:07AM +0100, Mario Smarduch wrote:


 Support for live migration initial write protect.
 - moved write protect to the architecture memory region prepare function. This
   way you can fail and abort migration without keeping track of migration
   status.
 - The above also allows generalizing the read dirty log function with x86.
 - Added stage2_mark_pte_ro()
 - optimized initial write protect, skip upper table lookups
 - added stage2pmd_addr_end() to do generic 4 level table walk
 - changed kvm_flush_remote_tlbs() to weak function

Hello Mario,
I've taken a quick look at this and have a few suggestions below.
(I'm not a KVM expert, but took a look at the memory manipulation).

Future versions of this series could probably benefit from being sent
to lakml too?

Cheers,
--
Steve


 Signed-off-by: Mario Smarduch <m.smard...@samsung.com>
 ---
  arch/arm/include/asm/kvm_host.h |8 ++
  arch/arm/kvm/arm.c  |3 +
  arch/arm/kvm/mmu.c  |  163 +++
  virt/kvm/kvm_main.c |5 +-
  4 files changed, 178 insertions(+), 1 deletion(-)

 diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
 index 1e739f9..9f827c8 100644
 --- a/arch/arm/include/asm/kvm_host.h
 +++ b/arch/arm/include/asm/kvm_host.h
 @@ -67,6 +67,12 @@ struct kvm_arch {

 /* Interrupt controller */
 struct vgic_dist vgic;
 +
 +   /* Marks start of migration, used to handle 2nd stage page faults
 +    * during migration, prevent installing huge pages and split huge pages
 +    * to small pages.
 +    */
 +   int migration_in_progress;
  };

  #define KVM_NR_MEM_OBJS 40
 @@ -230,4 +236,6 @@ int kvm_arm_timer_set_reg(struct kvm_vcpu *, u64 regid, u64 value);

  void kvm_tlb_flush_vmid(struct kvm *kvm);

 +int kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot);
 +
  #endif /* __ARM_KVM_HOST_H__ */
 diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
 index 9a4bc10..b916478 100644
 --- a/arch/arm/kvm/arm.c
 +++ b/arch/arm/kvm/arm.c
 @@ -233,6 +233,9 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
struct kvm_userspace_memory_region *mem,
enum kvm_mr_change change)
  {
 +   /* Request for migration issued by user, write protect memory slot */
 +   if ((change != KVM_MR_DELETE) && (mem->flags & KVM_MEM_LOG_DIRTY_PAGES))
 +   return kvm_mmu_slot_remove_write_access(kvm, mem->slot);
 return 0;
  }

 diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
 index 7ab77f3..4d029a6 100644
 --- a/arch/arm/kvm/mmu.c
 +++ b/arch/arm/kvm/mmu.c
 @@ -31,6 +31,11 @@

  #include "trace.h"

 +#define stage2pud_addr_end(addr, end)  \
 +({ u64 __boundary = ((addr) + PUD_SIZE) & PUD_MASK;\
 +   (__boundary - 1 < (end) - 1) ? __boundary : (end);  \
 +})

A matter of personal preference: can this be a static inline function
instead? That way you could avoid ambiguity with the parameter types.
(not an issue here, but this has bitten me in the past).
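
For illustration only, an untested sketch of the static inline form (I've
assumed phys_addr_t for the arguments here, matching the guest physical
addresses being walked):

static inline phys_addr_t stage2pud_addr_end(phys_addr_t addr, phys_addr_t end)
{
        /* Round addr up to the next PUD boundary, clamped to end. */
        phys_addr_t boundary = (addr + PUD_SIZE) & PUD_MASK;

        return (boundary - 1 < end - 1) ? boundary : end;
}

The callers would then get a type mismatch warning rather than silent
conversion if the wrong types are passed.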

 +
  extern char  __hyp_idmap_text_start[], __hyp_idmap_text_end[];

  static pgd_t *boot_hyp_pgd;
 @@ -569,6 +574,15 @@ static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
 return 0;
  }

 +/* Write protect page */
 +static void stage2_mark_pte_ro(pte_t *pte)
 +{
 +   pte_t new_pte;
 +
 +   new_pte = pfn_pte(pte_pfn(*pte), PAGE_S2);
 +   *pte = new_pte;
 +}

This isn't making the pte read only.
It's nuking all the flags from the pte and replacing them with factory
settings. (In this case the PAGE_S2 pgprot).
If we had other attributes that we later wish to retain this could be
easily overlooked. Perhaps a new name for the function?

 +
  /**
   * kvm_phys_addr_ioremap - map a device range to guest IPA
   *
 @@ -649,6 +663,155 @@ static bool transparent_hugepage_adjust(pfn_t *pfnp, phys_addr_t *ipap)
 return false;
  }

 +/**
 + * split_pmd - splits huge pages to small pages, required to keep a dirty log of
 + *  smaller memory granules, otherwise huge pages would need to be
 + *  migrated. Practically an idle system has problems migrating with
 + *  huge pages.  Called during WP of entire VM address space, done
 + *  initially when the migration thread issues the KVM_MEM_LOG_DIRTY_PAGES
 + *  ioctl.
 + *  The mmu_lock is held during splitting.
 + *
 + * @kvm:   The KVM pointer
 + * @pmd:   Pmd to 2nd stage huge page
 + * @addr: ` Guest Physical Address
Nitpick: typo `

 + */
 +int split_pmd(struct kvm *kvm, pmd_t *pmd, u64 addr)

Maybe worth renaming to something like kvm_split_pmd to avoid future
namespace collisions (either compiler or cscope/ctags)? It should also
probably be static?
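
That is, purely as an untested illustration of the suggested rename:

static int kvm_split_pmd(struct kvm *kvm, pmd_t *pmd, u64 addr)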

 +{
 +   struct page *page;
 +   pfn_t pfn = pmd_pfn(*pmd);
 +   pte_t *pte;
 +   int i;
 +
 +   page = alloc_page(GFP_KERNEL);
 +   

Re: [PATCH v3 2/4] live migration support for initial write protect of VM

2014-04-24 Thread Steve Capper
On Thu, Apr 24, 2014 at 05:39:29PM +0100, Steve Capper wrote:
[ ... ]

 
 -- IMPORTANT NOTICE: The contents of this email and any attachments are 
 confidential and may also be privileged. If you are not the intended 
 recipient, please notify the sender immediately and do not disclose the 
 contents to any other person, use it for any purpose, or store or copy the 
 information in any medium.  Thank you.
 
 ARM Limited, Registered office 110 Fulbourn Road, Cambridge CB1 9NJ, 
 Registered in England & Wales, Company No: 2557590
 ARM Holdings plc, Registered office 110 Fulbourn Road, Cambridge CB1 9NJ, 
 Registered in England & Wales, Company No: 2548782
 

Please ignore this notice, apologies for it appearing.
I will learn how to configure email

Cheers,
-- 
Steve



Re: [PATCH v3 2/4] live migration support for initial write protect of VM

2014-04-24 Thread Mario Smarduch
On 04/24/2014 09:39 AM, Steve Capper wrote:
 On Wed, Apr 23, 2014 at 12:18:07AM +0100, Mario Smarduch wrote:


 Support for live migration initial write protect.
 - moved write protect to the architecture memory region prepare function. This
   way you can fail and abort migration without keeping track of migration
   status.
 - The above also allows generalizing the read dirty log function with x86.
 - Added stage2_mark_pte_ro()
 - optimized initial write protect, skip upper table lookups
 - added stage2pmd_addr_end() to do generic 4 level table walk
 - changed kvm_flush_remote_tlbs() to weak function
 
 Hello Mario,
 I've taken a quick look at this and have a few suggestions below.
 (I'm not a KVM expert, but took a look at the memory manipulation).

Hi Steve,
Your suggestions are very helpful; my responses are inline.

Thanks.
  Mario
 
 Future versions of this series could probably benefit from being sent
 to lakml too?
 
 Cheers,
 --
 Steve
 

 Signed-off-by: Mario Smarduch <m.smard...@samsung.com>
 ---
  arch/arm/include/asm/kvm_host.h |8 ++
  arch/arm/kvm/arm.c  |3 +
  arch/arm/kvm/mmu.c  |  163 +++
  virt/kvm/kvm_main.c |5 +-
  4 files changed, 178 insertions(+), 1 deletion(-)

 diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
 index 1e739f9..9f827c8 100644
 --- a/arch/arm/include/asm/kvm_host.h
 +++ b/arch/arm/include/asm/kvm_host.h
 @@ -67,6 +67,12 @@ struct kvm_arch {

 /* Interrupt controller */
 struct vgic_dist vgic;
 +
 +   /* Marks start of migration, used to handle 2nd stage page faults
 +    * during migration, prevent installing huge pages and split huge pages
 +    * to small pages.
 +    */
 +   int migration_in_progress;
  };

  #define KVM_NR_MEM_OBJS 40
 @@ -230,4 +236,6 @@ int kvm_arm_timer_set_reg(struct kvm_vcpu *, u64 regid, u64 value);

  void kvm_tlb_flush_vmid(struct kvm *kvm);

 +int kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot);
 +
  #endif /* __ARM_KVM_HOST_H__ */
 diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
 index 9a4bc10..b916478 100644
 --- a/arch/arm/kvm/arm.c
 +++ b/arch/arm/kvm/arm.c
 @@ -233,6 +233,9 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
struct kvm_userspace_memory_region *mem,
enum kvm_mr_change change)
  {
 +   /* Request for migration issued by user, write protect memory slot */
 +   if ((change != KVM_MR_DELETE) && (mem->flags & KVM_MEM_LOG_DIRTY_PAGES))
 +   return kvm_mmu_slot_remove_write_access(kvm, mem->slot);
 return 0;
  }

 diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
 index 7ab77f3..4d029a6 100644
 --- a/arch/arm/kvm/mmu.c
 +++ b/arch/arm/kvm/mmu.c
 @@ -31,6 +31,11 @@

  #include "trace.h"

 +#define stage2pud_addr_end(addr, end)  \
 +({ u64 __boundary = ((addr) + PUD_SIZE) & PUD_MASK;\
 +   (__boundary - 1 < (end) - 1) ? __boundary : (end);  \
 +})
 
 A matter of personal preference: can this be a static inline function
 instead? That way you could avoid ambiguity with the parameter types.
 (not an issue here, but this has bitten me in the past).

Yes good point, will change.
 
 +
  extern char  __hyp_idmap_text_start[], __hyp_idmap_text_end[];

  static pgd_t *boot_hyp_pgd;
 @@ -569,6 +574,15 @@ static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
 return 0;
  }

 +/* Write protect page */
 +static void stage2_mark_pte_ro(pte_t *pte)
 +{
 +   pte_t new_pte;
 +
 +   new_pte = pfn_pte(pte_pfn(*pte), PAGE_S2);
 +   *pte = new_pte;
 +}
 
 This isn't making the pte read only.
 It's nuking all the flags from the pte and replacing them with factory
 settings. (In this case the PAGE_S2 pgprot).
 If we had other attributes that we later wish to retain this could be
 easily overlooked. Perhaps a new name for the function?

Yes that's pretty bad, I'll clear the write protect bit only.
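
Something along these lines (untested sketch), assuming the stage 2 S2AP
permission bits L_PTE_S2_RDONLY/L_PTE_S2_RDWR from pgtable-3level.h:

static void stage2_mark_pte_ro(pte_t *pte)
{
        /* Keep all other attributes, drop only the stage 2 write permission */
        pte_val(*pte) &= ~L_PTE_S2_RDWR;
        pte_val(*pte) |= L_PTE_S2_RDONLY;
}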

 
 +
  /**
   * kvm_phys_addr_ioremap - map a device range to guest IPA
   *
 @@ -649,6 +663,155 @@ static bool transparent_hugepage_adjust(pfn_t *pfnp, phys_addr_t *ipap)
 return false;
  }

 +/**
 + * split_pmd - splits huge pages to small pages, required to keep a dirty log of
 + *  smaller memory granules, otherwise huge pages would need to be
 + *  migrated. Practically an idle system has problems migrating with
 + *  huge pages.  Called during WP of entire VM address space, done
 + *  initially when the migration thread issues the KVM_MEM_LOG_DIRTY_PAGES
 + *  ioctl.
 + *  The mmu_lock is held during splitting.
 + *
 + * @kvm:   The KVM pointer
 + * @pmd:   Pmd to 2nd stage huge page
 + * @addr: ` Guest Physical Address
 Nitpick: typo `

Yes overlooked it, will delete.
 
 + */
 +int split_pmd(struct kvm *kvm, pmd_t *pmd, u64 addr)
 
 Maybe worth renaming to something like kvm_split_pmd to avoid future
 namespace collisions (either compiler or cscope/ctags)? It should also
 probably be static?

[PATCH v3 2/4] live migration support for initial write protect of VM

2014-04-22 Thread Mario Smarduch


Support for live migration initial write protect.
- moved write protect to the architecture memory region prepare function. This
  way you can fail and abort migration without keeping track of migration
  status.
- The above also allows generalizing the read dirty log function with x86.
- Added stage2_mark_pte_ro()
- optimized initial write protect, skip upper table lookups
- added stage2pmd_addr_end() to do generic 4 level table walk 
- changed kvm_flush_remote_tlbs() to weak function

Signed-off-by: Mario Smarduch <m.smard...@samsung.com>
---
 arch/arm/include/asm/kvm_host.h |8 ++
 arch/arm/kvm/arm.c  |3 +
 arch/arm/kvm/mmu.c  |  163 +++
 virt/kvm/kvm_main.c |5 +-
 4 files changed, 178 insertions(+), 1 deletion(-)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 1e739f9..9f827c8 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -67,6 +67,12 @@ struct kvm_arch {
 
/* Interrupt controller */
struct vgic_dist vgic;
+
+   /* Marks start of migration, used to handle 2nd stage page faults
+* during migration, prevent installing huge pages and split huge pages
+* to small pages.
+*/
+   int migration_in_progress;
 };
 
 #define KVM_NR_MEM_OBJS 40
@@ -230,4 +236,6 @@ int kvm_arm_timer_set_reg(struct kvm_vcpu *, u64 regid, u64 value);
 
 void kvm_tlb_flush_vmid(struct kvm *kvm);
 
+int kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot);
+
 #endif /* __ARM_KVM_HOST_H__ */
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 9a4bc10..b916478 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -233,6 +233,9 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
   struct kvm_userspace_memory_region *mem,
   enum kvm_mr_change change)
 {
+   /* Request for migration issued by user, write protect memory slot */
+   if ((change != KVM_MR_DELETE) && (mem->flags & KVM_MEM_LOG_DIRTY_PAGES))
+   return kvm_mmu_slot_remove_write_access(kvm, mem->slot);
return 0;
 }
 
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 7ab77f3..4d029a6 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -31,6 +31,11 @@
 
 #include "trace.h"
 
+#define stage2pud_addr_end(addr, end)  \
+({ u64 __boundary = ((addr) + PUD_SIZE) & PUD_MASK;\
+   (__boundary - 1 < (end) - 1) ? __boundary : (end);  \
+})
+
 extern char  __hyp_idmap_text_start[], __hyp_idmap_text_end[];
 
 static pgd_t *boot_hyp_pgd;
@@ -569,6 +574,15 @@ static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
return 0;
 }
 
+/* Write protect page */
+static void stage2_mark_pte_ro(pte_t *pte)
+{
+   pte_t new_pte;
+
+   new_pte = pfn_pte(pte_pfn(*pte), PAGE_S2);
+   *pte = new_pte;
+}
+
 /**
  * kvm_phys_addr_ioremap - map a device range to guest IPA
  *
@@ -649,6 +663,155 @@ static bool transparent_hugepage_adjust(pfn_t *pfnp, phys_addr_t *ipap)
return false;
 }
 
+/**
+ * split_pmd - splits huge pages to small pages, required to keep a dirty log of
+ *  smaller memory granules, otherwise huge pages would need to be
+ *  migrated. Practically an idle system has problems migrating with
+ *  huge pages.  Called during WP of entire VM address space, done
+ *  initially when the migration thread issues the KVM_MEM_LOG_DIRTY_PAGES
+ *  ioctl.
+ *  The mmu_lock is held during splitting.
+ *
+ * @kvm:   The KVM pointer
+ * @pmd:   Pmd to 2nd stage huge page
+ * @addr: ` Guest Physical Address
+ */
+int split_pmd(struct kvm *kvm, pmd_t *pmd, u64 addr)
+{
+   struct page *page;
+   pfn_t pfn = pmd_pfn(*pmd);
+   pte_t *pte;
+   int i;
+
+   page = alloc_page(GFP_KERNEL);
+   if (page == NULL)
+   return -ENOMEM;
+
+   pte = page_address(page);
+   /* cycle through ptes first, use pmd pfn */
+   for (i = 0; i < PTRS_PER_PMD; i++) {
+   pte[i] = pfn_pte(pfn+i, 0);
+   stage2_mark_pte_ro(pte[i]);
+   }
+   kvm_clean_pte(pte);
+   /* After page table setup set pmd */
+   pmd_populate_kernel(NULL, pmd, pte);
+
+   /* get reference on pte page */
+   get_page(virt_to_page(pte));
+   return 0;
+}
+
+/**
+ * kvm_mmu_slot_remove_write_access - write protects entire VM address space.
+ *  Called at start of migration when KVM_MEM_LOG_DIRTY_PAGES ioctl is
+ *  issued. After this function returns all pages (minus the ones faulted
+ *  in when mmu_lock is released) must be write protected to keep track of
+ *  dirty pages to migrate on subsequent dirty log retrieval.
+ *  mmu_lock is held during write protecting, released on contention.
+ *
+ * @kvm:   The KVM pointer
+ * @slot:   The memory slot the dirty log is retrieved for
+ */
+int