Re: [PATCH 1/3] KVM: PPC: Book3S HV: Fix race in reading change bit when removing HPTE

2015-04-27 Thread Aneesh Kumar K.V
Paul Mackerras pau...@samba.org writes:

 The reference (R) and change (C) bits in a HPT entry can be set by
 hardware at any time up until the HPTE is invalidated and the TLB
 invalidation sequence has completed.  This means that when removing
 a HPTE, we need to read the HPTE after the invalidation sequence has
 completed in order to obtain reliable values of R and C.  The code
 in kvmppc_do_h_remove() used to do this.  However, commit 6f22bd3265fb
 (KVM: PPC: Book3S HV: Make HTAB code LE host aware) removed the
 read after invalidation as a side effect of other changes.  This
 restores the read of the HPTE after invalidation.

 The user-visible effect of this bug would be that when migrating a
 guest, there is a small probability that a page modified by the guest
 and then unmapped by the guest might not get re-transmitted and thus
 the destination might end up with a stale copy of the page.
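
 To make the required ordering concrete, here is a minimal stand-alone C
 sketch of the sequence (illustration only; every demo_* name is invented,
 the real code uses compute_tlbie_rb()/do_tlbies() as in the diff below):

	#include <stdint.h>

	struct demo_hpte { volatile uint64_t v, r; };	/* models one HPT entry */

	/* Stand-ins for clearing HPTE_V_VALID and for the tlbie sequence. */
	void demo_clear_valid(struct demo_hpte *h) { h->v &= ~1ULL; }
	void demo_tlbie_and_sync(void) { /* tlbie; eieio; tlbsync; ptesync */ }

	/* Returns the final R/C bits for the page being unmapped. */
	uint64_t demo_remove_hpte(struct demo_hpte *h)
	{
		uint64_t early = h->r;	/* may miss R/C updates made by hardware later */

		demo_clear_valid(h);	/* 1. invalidate the HPTE                      */
		demo_tlbie_and_sync();	/* 2. complete the TLB invalidation            */

		(void)early;		/* using 'early' here is exactly the bug fixed */
		return h->r;		/* 3. only now are R and C guaranteed final    */
	}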

 Fixes: 6f22bd3265fb
 Cc: sta...@vger.kernel.org # v3.17+
 Signed-off-by: Paul Mackerras pau...@samba.org
 ---
  arch/powerpc/kvm/book3s_hv_rm_mmu.c | 8 +++-
  1 file changed, 3 insertions(+), 5 deletions(-)

 diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
 index f6bf0b1..5c1737f 100644
 --- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
 +++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
 @@ -413,14 +413,12 @@ long kvmppc_do_h_remove(struct kvm *kvm, unsigned long flags,
   rev = real_vmalloc_addr(&kvm->arch.revmap[pte_index]);
   v = pte & ~HPTE_V_HVLOCK;
   if (v & HPTE_V_VALID) {
 - u64 pte1;
 -
 - pte1 = be64_to_cpu(hpte[1]);
   hpte[0] &= ~cpu_to_be64(HPTE_V_VALID);
 - rb = compute_tlbie_rb(v, pte1, pte_index);
 + rb = compute_tlbie_rb(v, be64_to_cpu(hpte[1]), pte_index);
   do_tlbies(kvm, rb, 1, global_invalidates(kvm, flags), true);
   /* Read PTE low word after tlbie to get final R/C values */
 - remove_revmap_chain(kvm, pte_index, rev, v, pte1);
 + remove_revmap_chain(kvm, pte_index, rev, v,
 + be64_to_cpu(hpte[1]));
   }

Maybe add the above commit message as a code comment?

   r = rev->guest_rpte & ~HPTE_GR_RESERVED;
   note_hpte_modification(kvm, rev);
 -- 
 2.1.4


[PATCH 1/2] KVM: PPC: Use READ_ONCE when dereferencing pte_t pointer

2015-03-29 Thread Aneesh Kumar K.V
The pte can get updated by other CPUs as part of multiple activities
like THP split, huge page collapse and unmap. We need to make sure we
don't reload the pte value again and again for different checks.
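
For illustration, a minimal stand-alone C sketch of the problem (the
DEMO_* names are invented; in the kernel the snapshot is taken with
READ_ONCE() as in the diff below):

	#include <stdint.h>

	#define DEMO_READ_ONCE(x)	(*(volatile typeof(x) *)&(x))
	#define DEMO_PTE_PRESENT	0x1ULL
	#define DEMO_PTE_WRITE		0x2ULL

	uint64_t demo_pte;	/* stands in for *ptep, updated by other CPUs */

	/* Racy: the compiler is free to load demo_pte separately for each
	 * check, so "present" and "writable" may be evaluated against two
	 * different values of the pte. */
	int demo_racy_check(void)
	{
		return (demo_pte & DEMO_PTE_PRESENT) && (demo_pte & DEMO_PTE_WRITE);
	}

	/* What the patch does: take one snapshot, do every check on it. */
	int demo_snapshot_check(void)
	{
		uint64_t pte = DEMO_READ_ONCE(demo_pte);

		return (pte & DEMO_PTE_PRESENT) && (pte & DEMO_PTE_WRITE);
	}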

Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
---
Note:
This is posted previously as part of
http://article.gmane.org/gmane.linux.ports.ppc.embedded/79278

 arch/powerpc/include/asm/kvm_book3s_64.h |  5 -
 arch/powerpc/kvm/e500_mmu_host.c | 20 
 2 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index cc073a7ac2b7..f06820c67175 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -290,7 +290,10 @@ static inline pte_t kvmppc_read_update_linux_pte(pte_t *ptep, int writing,
pte_t old_pte, new_pte = __pte(0);
 
while (1) {
-   old_pte = *ptep;
+   /*
+* Make sure we don't reload from ptep
+*/
+   old_pte = READ_ONCE(*ptep);
/*
 * wait until _PAGE_BUSY is clear then set it atomically
 */
diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c
index cc536d4a75ef..5840d546aa03 100644
--- a/arch/powerpc/kvm/e500_mmu_host.c
+++ b/arch/powerpc/kvm/e500_mmu_host.c
@@ -469,14 +469,18 @@ static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500,
 
 pgdir = vcpu_e500->vcpu.arch.pgdir;
 ptep = lookup_linux_ptep(pgdir, hva, &tsize_pages);
-   if (pte_present(*ptep))
-   wimg = (*ptep >> PTE_WIMGE_SHIFT) & MAS2_WIMGE_MASK;
-   else {
-   if (printk_ratelimit())
-   pr_err("%s: pte not present: gfn %lx, pfn %lx\n",
-   __func__, (long)gfn, pfn);
-   ret = -EINVAL;
-   goto out;
+   if (ptep) {
+   pte_t pte = READ_ONCE(*ptep);
+
+   if (pte_present(pte))
+   wimg = (pte_val(pte) >> PTE_WIMGE_SHIFT) &
+   MAS2_WIMGE_MASK;
+   else {
+   pr_err_ratelimited("%s: pte not present: gfn %lx,pfn %lx\n",
+  __func__, (long)gfn, pfn);
+   ret = -EINVAL;
+   goto out;
+   }
}
kvmppc_e500_ref_setup(ref, gtlbe, pfn, wimg);
 
-- 
2.1.0


[PATCH 2/2] KVM: PPC: Remove page table walk helpers

2015-03-29 Thread Aneesh Kumar K.V
This patch removes helpers which we had used only once in the code.
Limiting the page table walk variants helps in ensuring that we won't
end up with code walking the page table with wrong assumptions.

Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/pgtable.h  | 21 -
 arch/powerpc/kvm/book3s_hv_rm_mmu.c | 62 -
 arch/powerpc/kvm/e500_mmu_host.c|  2 +-
 3 files changed, 28 insertions(+), 57 deletions(-)

diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h
index 9835ac4173b7..92fe01c355a9 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -249,27 +249,6 @@ extern int gup_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr,
 #endif
 pte_t *find_linux_pte_or_hugepte(pgd_t *pgdir, unsigned long ea,
 unsigned *shift);
-
-static inline pte_t *lookup_linux_ptep(pgd_t *pgdir, unsigned long hva,
-unsigned long *pte_sizep)
-{
-   pte_t *ptep;
-   unsigned long ps = *pte_sizep;
-   unsigned int shift;
-
-   ptep = find_linux_pte_or_hugepte(pgdir, hva, &shift);
-   if (!ptep)
-   return NULL;
-   if (shift)
-   *pte_sizep = 1ul << shift;
-   else
-   *pte_sizep = PAGE_SIZE;
-
-   if (ps > *pte_sizep)
-   return NULL;
-
-   return ptep;
-}
 #endif /* __ASSEMBLY__ */
 
 #endif /* __KERNEL__ */
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 625407e4d3b0..73e083cb9f7e 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -131,25 +131,6 @@ static void remove_revmap_chain(struct kvm *kvm, long pte_index,
unlock_rmap(rmap);
 }
 
-static pte_t lookup_linux_pte_and_update(pgd_t *pgdir, unsigned long hva,
- int writing, unsigned long *pte_sizep)
-{
-   pte_t *ptep;
-   unsigned long ps = *pte_sizep;
-   unsigned int hugepage_shift;
-
-   ptep = find_linux_pte_or_hugepte(pgdir, hva, &hugepage_shift);
-   if (!ptep)
-   return __pte(0);
-   if (hugepage_shift)
-   *pte_sizep = 1ul << hugepage_shift;
-   else
-   *pte_sizep = PAGE_SIZE;
-   if (ps > *pte_sizep)
-   return __pte(0);
-   return kvmppc_read_update_linux_pte(ptep, writing, hugepage_shift);
-}
-
 static inline void unlock_hpte(__be64 *hpte, unsigned long hpte_v)
 {
 asm volatile(PPC_RELEASE_BARRIER "" : : : "memory");
@@ -166,10 +147,10 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
struct revmap_entry *rev;
unsigned long g_ptel;
struct kvm_memory_slot *memslot;
-   unsigned long pte_size;
+   unsigned hpage_shift;
unsigned long is_io;
unsigned long *rmap;
-   pte_t pte;
+   pte_t *ptep;
unsigned int writing;
unsigned long mmu_seq;
unsigned long rcbits;
@@ -208,22 +189,33 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
 
/* Translate to host virtual address */
hva = __gfn_to_hva_memslot(memslot, gfn);
+   ptep = find_linux_pte_or_hugepte(pgdir, hva, &hpage_shift);
+   if (ptep) {
+   pte_t pte;
+   unsigned int host_pte_size;
 
-   /* Look up the Linux PTE for the backing page */
-   pte_size = psize;
-   pte = lookup_linux_pte_and_update(pgdir, hva, writing, &pte_size);
-   if (pte_present(pte) && !pte_protnone(pte)) {
-   if (writing && !pte_write(pte))
-   /* make the actual HPTE be read-only */
-   ptel = hpte_make_readonly(ptel);
-   is_io = hpte_cache_bits(pte_val(pte));
-   pa = pte_pfn(pte) << PAGE_SHIFT;
-   pa |= hva & (pte_size - 1);
-   pa |= gpa & ~PAGE_MASK;
-   }
+   if (hpage_shift)
+   host_pte_size = 1ul << hpage_shift;
+   else
+   host_pte_size = PAGE_SIZE;
+   /*
+* We should always find the guest page size
+* to <= host page size, if host is using hugepage
+*/
+   if (host_pte_size < psize)
+   return H_PARAMETER;
 
-   if (pte_size < psize)
-   return H_PARAMETER;
+   pte = kvmppc_read_update_linux_pte(ptep, writing, hpage_shift);
+   if (pte_present(pte) && !pte_protnone(pte)) {
+   if (writing && !pte_write(pte))
+   /* make the actual HPTE be read-only */
+   ptel = hpte_make_readonly(ptel);
+   is_io = hpte_cache_bits(pte_val(pte));
+   pa = pte_pfn(pte) << PAGE_SHIFT;
+   pa |= hva & (host_pte_size - 1);
+   pa |= gpa


[PATCH] KVM: PPC: BOOK3S: HV: remove rma related variables from code.

2015-02-22 Thread Aneesh Kumar K.V
We don't support real-mode areas now that 970 support is removed.
Remove the remaining details of rma from the code. Also rename
rma_setup_done to hpte_setup_done to better reflect the changes.

Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/kvm_host.h |  3 +--
 arch/powerpc/kvm/book3s_64_mmu_hv.c | 28 ++--
 arch/powerpc/kvm/book3s_hv.c| 10 +-
 3 files changed, 20 insertions(+), 21 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 7efd666a3fa7..833486a5734a 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -227,9 +227,8 @@ struct kvm_arch {
int tlbie_lock;
unsigned long lpcr;
unsigned long rmor;
-   struct kvm_rma_info *rma;
unsigned long vrma_slb_v;
-   int rma_setup_done;
+   int hpte_setup_done;
u32 hpt_order;
atomic_t vcpus_running;
u32 online_vcores;
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 534acb3c6c3d..dbf127168ca4 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -116,12 +116,12 @@ long kvmppc_alloc_reset_hpt(struct kvm *kvm, u32 *htab_orderp)
long order;
 
 mutex_lock(&kvm->lock);
-   if (kvm->arch.rma_setup_done) {
-   kvm->arch.rma_setup_done = 0;
-   /* order rma_setup_done vs. vcpus_running */
+   if (kvm->arch.hpte_setup_done) {
+   kvm->arch.hpte_setup_done = 0;
+   /* order hpte_setup_done vs. vcpus_running */
 smp_mb();
 if (atomic_read(&kvm->arch.vcpus_running)) {
-   kvm->arch.rma_setup_done = 1;
+   kvm->arch.hpte_setup_done = 1;
goto out;
}
}
@@ -1339,20 +1339,20 @@ static ssize_t kvm_htab_write(struct file *file, const char __user *buf,
unsigned long tmp[2];
ssize_t nb;
long int err, ret;
-   int rma_setup;
+   int hpte_setup;
 
if (!access_ok(VERIFY_READ, buf, count))
return -EFAULT;
 
/* lock out vcpus from running while we're doing this */
 mutex_lock(&kvm->lock);
-   rma_setup = kvm->arch.rma_setup_done;
-   if (rma_setup) {
-   kvm->arch.rma_setup_done = 0;   /* temporarily */
-   /* order rma_setup_done vs. vcpus_running */
+   hpte_setup = kvm->arch.hpte_setup_done;
+   if (hpte_setup) {
+   kvm->arch.hpte_setup_done = 0;  /* temporarily */
+   /* order hpte_setup_done vs. vcpus_running */
 smp_mb();
 if (atomic_read(&kvm->arch.vcpus_running)) {
-   kvm->arch.rma_setup_done = 1;
+   kvm->arch.hpte_setup_done = 1;
 mutex_unlock(&kvm->lock);
return -EBUSY;
}
@@ -1405,7 +1405,7 @@ static ssize_t kvm_htab_write(struct file *file, const char __user *buf,
   "r=%lx\n", ret, i, v, r);
goto out;
}
-   if (!rma_setup && is_vrma_hpte(v)) {
+   if (!hpte_setup && is_vrma_hpte(v)) {
unsigned long psize = hpte_base_page_size(v, r);
unsigned long senc = slb_pgsize_encoding(psize);
unsigned long lpcr;
@@ -1414,7 +1414,7 @@ static ssize_t kvm_htab_write(struct file *file, const char __user *buf,
 (VRMA_VSID << SLB_VSID_SHIFT_1T);
 lpcr = senc << (LPCR_VRMASD_SH - 4);
kvmppc_update_lpcr(kvm, lpcr, LPCR_VRMASD);
-   rma_setup = 1;
+   hpte_setup = 1;
}
++i;
hptp += 2;
@@ -1430,9 +1430,9 @@ static ssize_t kvm_htab_write(struct file *file, const char __user *buf,
}
 
  out:
-   /* Order HPTE updates vs. rma_setup_done */
+   /* Order HPTE updates vs. hpte_setup_done */
smp_wmb();
-   kvm->arch.rma_setup_done = rma_setup;
+   kvm->arch.hpte_setup_done = hpte_setup;
 mutex_unlock(&kvm->lock);
 
if (err)
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index de4018a1bc4b..34e79b8e855c 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -2032,11 +2032,11 @@ static int kvmppc_vcpu_run_hv(struct kvm_run *run, struct kvm_vcpu *vcpu)
}
 
 atomic_inc(&vcpu->kvm->arch.vcpus_running);
-   /* Order vcpus_running vs. rma_setup_done, see kvmppc_alloc_reset_hpt */
+   /* Order vcpus_running vs. hpte_setup_done, see kvmppc_alloc_reset_hpt */
smp_mb


[PATCH V2 2/2] KVM: PPC: BOOK3S: HV: Use unlock variant with memory barrier

2015-01-26 Thread Aneesh Kumar K.V
We switch to the unlock variant with memory barriers in the error path
and also in code paths where we had an implicit dependency on previous
functions calling lwsync/ptesync. In most of the cases we don't really
need an explicit barrier, but using the variant makes sure we don't make
mistakes later with code movements. We also document why a
non-barrier variant is OK in the performance-critical path.
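
As an illustration of the difference (user-space C11 sketch, not the
kernel helpers themselves; the demo_* names are invented):

	#include <stdatomic.h>
	#include <stdint.h>

	#define DEMO_HPTE_LOCK_BIT	(1ULL << 0)

	/* Barrier variant: the release store guarantees that every update made
	 * while the entry was held is visible before the lock bit appears clear. */
	void demo_unlock_with_barrier(_Atomic uint64_t *entry, uint64_t newval)
	{
		atomic_store_explicit(entry, newval & ~DEMO_HPTE_LOCK_BIT,
				      memory_order_release);
	}

	/* Non-barrier variant: only safe when ordering is already provided by
	 * something else, e.g. an earlier ptesync/lwsync or a data dependency
	 * on the value being stored. */
	void demo_unlock_no_barrier(_Atomic uint64_t *entry, uint64_t newval)
	{
		atomic_store_explicit(entry, newval & ~DEMO_HPTE_LOCK_BIT,
				      memory_order_relaxed);
	}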

Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
---
Changes from V1:
* Rebase to latest upstream

 arch/powerpc/kvm/book3s_64_mmu_hv.c | 10 +-
 arch/powerpc/kvm/book3s_hv_rm_mmu.c | 15 ++-
 2 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 551dabb9551b..0fd91f54d1a7 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -639,7 +639,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
return ret;
 
  out_unlock:
-   __unlock_hpte(hptep, be64_to_cpu(hptep[0]));
+   unlock_hpte(hptep, be64_to_cpu(hptep[0]));
preempt_enable();
goto out_put;
 }
@@ -767,8 +767,8 @@ static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp,
note_hpte_modification(kvm, rev[i]);
}
}
+   unlock_hpte(hptep, be64_to_cpu(hptep[0]));
unlock_rmap(rmapp);
-   __unlock_hpte(hptep, be64_to_cpu(hptep[0]));
}
return 0;
 }
@@ -854,7 +854,7 @@ static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
}
ret = 1;
}
-   __unlock_hpte(hptep, be64_to_cpu(hptep[0]));
+   unlock_hpte(hptep, be64_to_cpu(hptep[0]));
} while ((i = j) != head);
 
unlock_rmap(rmapp);
@@ -971,7 +971,7 @@ static int kvm_test_clear_dirty_npages(struct kvm *kvm, unsigned long *rmapp)
 
 /* Now check and modify the HPTE */
 if (!(hptep[0] & cpu_to_be64(HPTE_V_VALID))) {
-   __unlock_hpte(hptep, be64_to_cpu(hptep[0]));
+   unlock_hpte(hptep, be64_to_cpu(hptep[0]));
continue;
}
 
@@ -994,7 +994,7 @@ static int kvm_test_clear_dirty_npages(struct kvm *kvm, unsigned long *rmapp)
 }
 v &= ~HPTE_V_ABSENT;
v |= HPTE_V_VALID;
-   __unlock_hpte(hptep, v);
+   unlock_hpte(hptep, v);
} while ((i = j) != head);
 
unlock_rmap(rmapp);
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 9123132b3053..2e45bd57d4e8 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -268,6 +268,9 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
pte = be64_to_cpu(hpte[0]);
 if (!(pte & (HPTE_V_VALID | HPTE_V_ABSENT)))
break;
+   /*
+* Data dependency will avoid re-ordering
+*/
__unlock_hpte(hpte, pte);
hpte += 2;
}
@@ -286,7 +289,7 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
cpu_relax();
pte = be64_to_cpu(hpte[0]);
 if (pte & (HPTE_V_VALID | HPTE_V_ABSENT)) {
-   __unlock_hpte(hpte, pte);
+   unlock_hpte(hpte, pte);
return H_PTEG_FULL;
}
}
@@ -406,7 +409,7 @@ long kvmppc_do_h_remove(struct kvm *kvm, unsigned long flags,
 if ((pte & (HPTE_V_ABSENT | HPTE_V_VALID)) == 0 ||
 ((flags & H_AVPN) && (pte & ~0x7fUL) != avpn) ||
 ((flags & H_ANDCOND) && (pte & avpn) != 0)) {
-   __unlock_hpte(hpte, pte);
+   unlock_hpte(hpte, pte);
return H_NOT_FOUND;
}
 
@@ -542,7 +545,7 @@ long kvmppc_h_bulk_remove(struct kvm_vcpu *vcpu)
be64_to_cpu(hp[0]), be64_to_cpu(hp[1]));
 rcbits = rev->guest_rpte & (HPTE_R_R|HPTE_R_C);
 args[j] |= rcbits << (56 - 5);
-   __unlock_hpte(hp, 0);
+   unlock_hpte(hp, 0);
}
}
 
@@ -568,7 +571,7 @@ long kvmppc_h_protect(struct kvm_vcpu *vcpu, unsigned long flags,
 pte = be64_to_cpu(hpte[0]);
 if ((pte & (HPTE_V_ABSENT | HPTE_V_VALID)) == 0 ||
 ((flags & H_AVPN) && (pte & ~0x7fUL) != avpn)) {
-   __unlock_hpte(hpte, pte);
+   unlock_hpte(hpte, pte);
return H_NOT_FOUND;
}
 
@@ -748,7 +751,9


[PATCH V2 1/2] KVM: PPC: BOOK3S: HV: Add helpers for lock/unlock hpte

2015-01-26 Thread Aneesh Kumar K.V
This patch adds helper routines for lock and unlock of hpte and uses
them for the rest of the code. We don't change any locking rules in this
patch. In the next patch we switch some of the unlock usage to use
the API with barrier and also document the usage without barriers.

Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
---
Changes from V1:
* Rebase to latest upstream

 arch/powerpc/include/asm/kvm_book3s_64.h | 14 ++
 arch/powerpc/kvm/book3s_64_mmu_hv.c  | 25 ++---
 arch/powerpc/kvm/book3s_hv_rm_mmu.c  | 25 +
 3 files changed, 33 insertions(+), 31 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index 2d81e202bdcc..0789a0f50969 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -85,6 +85,20 @@ static inline long try_lock_hpte(__be64 *hpte, unsigned long bits)
return old == 0;
 }
 
+static inline void unlock_hpte(__be64 *hpte, unsigned long hpte_v)
+{
+   hpte_v &= ~HPTE_V_HVLOCK;
+   asm volatile(PPC_RELEASE_BARRIER "" : : : "memory");
+   hpte[0] = cpu_to_be64(hpte_v);
+}
+
+/* Without barrier */
+static inline void __unlock_hpte(__be64 *hpte, unsigned long hpte_v)
+{
+   hpte_v &= ~HPTE_V_HVLOCK;
+   hpte[0] = cpu_to_be64(hpte_v);
+}
+
 static inline int __hpte_actual_psize(unsigned int lp, int psize)
 {
int i, shift;
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 534acb3c6c3d..551dabb9551b 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -338,9 +338,7 @@ static int kvmppc_mmu_book3s_64_hv_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
 v = be64_to_cpu(hptep[0]) & ~HPTE_V_HVLOCK;
 gr = kvm->arch.revmap[index].guest_rpte;
 
-   /* Unlock the HPTE */
-   asm volatile("lwsync" : : : "memory");
-   hptep[0] = cpu_to_be64(v);
+   unlock_hpte(hptep, v);
preempt_enable();
 
 gpte->eaddr = eaddr;
@@ -469,8 +467,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 hpte[0] = be64_to_cpu(hptep[0]) & ~HPTE_V_HVLOCK;
 hpte[1] = be64_to_cpu(hptep[1]);
 hpte[2] = r = rev->guest_rpte;
-   asm volatile("lwsync" : : : "memory");
-   hptep[0] = cpu_to_be64(hpte[0]);
+   unlock_hpte(hptep, hpte[0]);
preempt_enable();
 
 if (hpte[0] != vcpu->arch.pgfault_hpte[0] ||
@@ -621,7 +618,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 
hptep[1] = cpu_to_be64(r);
eieio();
-   hptep[0] = cpu_to_be64(hpte[0]);
+   __unlock_hpte(hptep, hpte[0]);
 asm volatile("ptesync" : : : "memory");
 preempt_enable();
 if (page && hpte_is_writable(r))
@@ -642,7 +639,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
return ret;
 
  out_unlock:
-   hptep[0] &= ~cpu_to_be64(HPTE_V_HVLOCK);
+   __unlock_hpte(hptep, be64_to_cpu(hptep[0]));
preempt_enable();
goto out_put;
 }
@@ -771,7 +768,7 @@ static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp,
}
}
unlock_rmap(rmapp);
-   hptep[0] &= ~cpu_to_be64(HPTE_V_HVLOCK);
+   __unlock_hpte(hptep, be64_to_cpu(hptep[0]));
}
return 0;
 }
@@ -857,7 +854,7 @@ static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
}
ret = 1;
}
-   hptep[0] &= ~cpu_to_be64(HPTE_V_HVLOCK);
+   __unlock_hpte(hptep, be64_to_cpu(hptep[0]));
} while ((i = j) != head);
 
unlock_rmap(rmapp);
@@ -974,8 +971,7 @@ static int kvm_test_clear_dirty_npages(struct kvm *kvm, unsigned long *rmapp)
 
 /* Now check and modify the HPTE */
 if (!(hptep[0] & cpu_to_be64(HPTE_V_VALID))) {
-   /* unlock and continue */
-   hptep[0] &= ~cpu_to_be64(HPTE_V_HVLOCK);
+   __unlock_hpte(hptep, be64_to_cpu(hptep[0]));
continue;
}
 
@@ -996,9 +992,9 @@ static int kvm_test_clear_dirty_npages(struct kvm *kvm, unsigned long *rmapp)
 npages_dirty = n;
 eieio();
 }
-   v &= ~(HPTE_V_ABSENT | HPTE_V_HVLOCK);
+   v &= ~HPTE_V_ABSENT;
v |= HPTE_V_VALID;
-   hptep[0] = cpu_to_be64(v);
+   __unlock_hpte(hptep, v);
} while ((i = j) != head);
 
unlock_rmap(rmapp);
@@ -1218,8 +1214,7 @@ static long record_hpte(unsigned long flags, __be64 *hptp,
 r &= ~HPTE_GR_MODIFIED;
 revp->guest_rpte = r;
 }
-   asm volatile(PPC_RELEASE_BARRIER "" : : : "memory");
-   hptp[0


Re: [PATCH 1/3] KVM: PPC: BOOK3S: HV: Add helpers for lock/unlock hpte

2015-01-12 Thread Aneesh Kumar K.V

Hi,

Any update on this patch? We could drop patch 3. Any feedback on 1 and 2?

-aneesh

Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com writes:

 This patch adds helper routines for lock and unlock of hpte and uses
 them for the rest of the code. We don't change any locking rules in this
 patch. In the next patch we switch some of the unlock usage to use
 the API with barrier and also document the usage without barriers.

 Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
 ---
  arch/powerpc/include/asm/kvm_book3s_64.h | 14 ++
  arch/powerpc/kvm/book3s_64_mmu_hv.c  | 25 ++---
  arch/powerpc/kvm/book3s_hv_rm_mmu.c  | 27 ++-
  3 files changed, 34 insertions(+), 32 deletions(-)

 diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
 index 0aa817933e6a..ec9fb6085843 100644
 --- a/arch/powerpc/include/asm/kvm_book3s_64.h
 +++ b/arch/powerpc/include/asm/kvm_book3s_64.h
 @@ -86,6 +86,20 @@ static inline long try_lock_hpte(__be64 *hpte, unsigned long bits)
   return old == 0;
  }
  
 +static inline void unlock_hpte(__be64 *hpte, unsigned long hpte_v)
 +{
 + hpte_v &= ~HPTE_V_HVLOCK;
 + asm volatile(PPC_RELEASE_BARRIER "" : : : "memory");
 + hpte[0] = cpu_to_be64(hpte_v);
 +}
 +
 +/* Without barrier */
 +static inline void __unlock_hpte(__be64 *hpte, unsigned long hpte_v)
 +{
 + hpte_v &= ~HPTE_V_HVLOCK;
 + hpte[0] = cpu_to_be64(hpte_v);
 +}
 +
  static inline int __hpte_actual_psize(unsigned int lp, int psize)
  {
   int i, shift;
 diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
 index cebb86bc4a37..5ea4b2b6a157 100644
 --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
 +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
 @@ -475,9 +475,7 @@ static int kvmppc_mmu_book3s_64_hv_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
   v = be64_to_cpu(hptep[0]) & ~HPTE_V_HVLOCK;
   gr = kvm->arch.revmap[index].guest_rpte;
  
 - /* Unlock the HPTE */
 - asm volatile("lwsync" : : : "memory");
 - hptep[0] = cpu_to_be64(v);
 + unlock_hpte(hptep, v);
   preempt_enable();
  
   gpte->eaddr = eaddr;
 @@ -606,8 +604,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
   hpte[0] = be64_to_cpu(hptep[0]) & ~HPTE_V_HVLOCK;
   hpte[1] = be64_to_cpu(hptep[1]);
   hpte[2] = r = rev->guest_rpte;
 - asm volatile("lwsync" : : : "memory");
 - hptep[0] = cpu_to_be64(hpte[0]);
 + unlock_hpte(hptep, hpte[0]);
   preempt_enable();
  
   if (hpte[0] != vcpu->arch.pgfault_hpte[0] ||
 @@ -758,7 +755,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
  
   hptep[1] = cpu_to_be64(r);
   eieio();
 - hptep[0] = cpu_to_be64(hpte[0]);
 + __unlock_hpte(hptep, hpte[0]);
   asm volatile("ptesync" : : : "memory");
   preempt_enable();
   if (page && hpte_is_writable(r))
 @@ -777,7 +774,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
   return ret;
  
   out_unlock:
 - hptep[0] &= ~cpu_to_be64(HPTE_V_HVLOCK);
 + __unlock_hpte(hptep, be64_to_cpu(hptep[0]));
   preempt_enable();
   goto out_put;
  }
 @@ -907,7 +904,7 @@ static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp,
   }
   }
   unlock_rmap(rmapp);
 - hptep[0] &= ~cpu_to_be64(HPTE_V_HVLOCK);
 + __unlock_hpte(hptep, be64_to_cpu(hptep[0]));
   }
   return 0;
  }
 @@ -995,7 +992,7 @@ static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
   }
   ret = 1;
   }
 - hptep[0] &= ~cpu_to_be64(HPTE_V_HVLOCK);
 + __unlock_hpte(hptep, be64_to_cpu(hptep[0]));
   } while ((i = j) != head);
  
   unlock_rmap(rmapp);
 @@ -1118,8 +1115,7 @@ static int kvm_test_clear_dirty_npages(struct kvm *kvm, unsigned long *rmapp)
  
   /* Now check and modify the HPTE */
   if (!(hptep[0] & cpu_to_be64(HPTE_V_VALID))) {
 - /* unlock and continue */
 - hptep[0] &= ~cpu_to_be64(HPTE_V_HVLOCK);
 + __unlock_hpte(hptep, be64_to_cpu(hptep[0]));
   continue;
   }
   /* need to make it temporarily absent so C is stable */
 @@ -1139,9 +1135,9 @@ static int kvm_test_clear_dirty_npages(struct kvm *kvm, unsigned long *rmapp)
   npages_dirty = n;
   eieio();
   }
 - v &= ~(HPTE_V_ABSENT | HPTE_V_HVLOCK);
 + v &= ~HPTE_V_ABSENT;
   v |= HPTE_V_VALID;
 - hptep[0] = cpu_to_be64(v);
 + __unlock_hpte(hptep, v);
   } while ((i = j) != head);
  
   unlock_rmap(rmapp);
 @@ -1379,8 +1375,7 @@ static long record_hpte(unsigned long flags, __be64 *hptp


Re: [PATCH] KVM: PPC: Book3S HV: Tracepoints for KVM HV guest interactions

2014-11-20 Thread Aneesh Kumar K.V
Suresh E. Warrier warr...@linux.vnet.ibm.com writes:

 This patch adds trace points in the guest entry and exit code and also
 for exceptions handled by the host in kernel mode - hypercalls and page
 faults. The new events are added to /sys/kernel/debug/tracing/events
 under a new subsystem called kvm_hv.



   /* Set this explicitly in case thread 0 doesn't have a vcpu */
 @@ -1687,6 +1691,9 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc)
  
   vc->vcore_state = VCORE_RUNNING;
   preempt_disable();
 +
 + trace_kvmppc_run_core(vc, 0);
 +
   spin_unlock(&vc->lock);

Do we really want to call a tracepoint with a spin lock held? Is that a good
thing to do?

-aneesh



Re: [PATCH 1/5] KVM: PPC: Book3S HV: Fix computation of tlbie operand

2014-11-02 Thread Aneesh Kumar K.V
Paul Mackerras pau...@samba.org writes:

 The B (segment size) field in the RB operand for the tlbie
 instruction is two bits, which we get from the top two bits of
 the first doubleword of the HPT entry to be invalidated.  These
 bits go in bits 8 and 9 of the RB operand (bits 54 and 55 in IBM
 bit numbering).

 The compute_tlbie_rb() function gets these bits as v >> (62 - 8),
 which is not correct as it will bring in the top 10 bits, not
 just the top two.  These extra bits could corrupt the AP, AVAL
 and L fields in the RB value.  To fix this we shift right 62 bits
 and then shift left 8 bits, so we only get the two bits of the
 B field.

Good catch.
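
 To see the corruption on a concrete value, here is a stand-alone sketch
 (62 is HPTE_V_SSIZE_SHIFT; the extra bit below stands for guest-controlled
 junk in bits 54-61 of the first doubleword):

	#include <stdio.h>
	#include <stdint.h>

	int main(void)
	{
		/* B = 0b01 (1TB segment) plus one guest-controlled junk bit (bit 57) */
		uint64_t v = (1ULL << 62) | (1ULL << 57);

		uint64_t bad  = v >> (62 - 8);	/* 0x108: junk leaks into low RB bits */
		uint64_t good = (v >> 62) << 8;	/* 0x100: only the two B bits remain  */

		printf("bad=0x%llx good=0x%llx\n",
		       (unsigned long long)bad, (unsigned long long)good);
		return 0;
	}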


 The first doubleword of the HPT entry is under the control of the
 guest kernel.  In fact, Linux guests will always put zeroes in bits
 54 -- 61 (IBM bits 2 -- 9), but we should not rely on guests doing
 this.

 Cc: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
 Signed-off-by: Paul Mackerras pau...@samba.org


Reviewed-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com

 ---
  arch/powerpc/include/asm/kvm_book3s_64.h | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

 diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
 index 0aa8179..a37f1a4 100644
 --- a/arch/powerpc/include/asm/kvm_book3s_64.h
 +++ b/arch/powerpc/include/asm/kvm_book3s_64.h
 @@ -148,7 +148,7 @@ static inline unsigned long compute_tlbie_rb(unsigned long v, unsigned long r,
   /* This covers 14..54 bits of va*/
   rb = (v & ~0x7fUL) << 16;   /* AVA field */
  
 - rb |= v >> (62 - 8);   /*  B field */
 + rb |= (v >> HPTE_V_SSIZE_SHIFT) << 8;   /*  B field */
   /*
* AVA in v had cleared lower 23 bits. We need to derive
* that from pteg index
 -- 
 2.1.1


Re: [PATCH 1/5] KVM: PPC: Book3S HV: Fix computation of tlbie operand

2014-11-02 Thread Aneesh Kumar K.V
Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com writes:

 Paul Mackerras pau...@samba.org writes:

 The B (segment size) field in the RB operand for the tlbie
 instruction is two bits, which we get from the top two bits of
 the first doubleword of the HPT entry to be invalidated.  These
 bits go in bits 8 and 9 of the RB operand (bits 54 and 55 in IBM
 bit numbering).

 The compute_tlbie_rb() function gets these bits as v >> (62 - 8),
 which is not correct as it will bring in the top 10 bits, not
 just the top two.  These extra bits could corrupt the AP, AVAL
 and L fields in the RB value.  To fix this we shift right 62 bits
 and then shift left 8 bits, so we only get the two bits of the
 B field.

 Good catch.


 The first doubleword of the HPT entry is under the control of the
 guest kernel.  In fact, Linux guests will always put zeroes in bits
 54 -- 61 (IBM bits 2 -- 9), but we should not rely on guests doing
 this.

 Cc: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
 Signed-off-by: Paul Mackerras pau...@samba.org


 Reviewed-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com

 ---
  arch/powerpc/include/asm/kvm_book3s_64.h | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

 diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
 index 0aa8179..a37f1a4 100644
 --- a/arch/powerpc/include/asm/kvm_book3s_64.h
 +++ b/arch/powerpc/include/asm/kvm_book3s_64.h
 @@ -148,7 +148,7 @@ static inline unsigned long compute_tlbie_rb(unsigned long v, unsigned long r,
  /* This covers 14..54 bits of va*/
  rb = (v & ~0x7fUL) << 16;   /* AVA field */
  
 -rb |= v >> (62 - 8);   /*  B field */
 +rb |= (v >> HPTE_V_SSIZE_SHIFT) << 8;   /*  B field */

Or should we do the below? I guess it is closer to what we have in the
rest of the code:

   rb |= ((v >> (HPTE_V_SSIZE_SHIFT - 8)) & ~0xffUL);


  /*
   * AVA in v had cleared lower 23 bits. We need to derive
   * that from pteg index
 -- 
 2.1.1

 --
 To unsubscribe from this list: send the line unsubscribe kvm-ppc in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/5] KVM: PPC: Book3S HV: Fix computation of tlbie operand

2014-11-02 Thread Aneesh Kumar K.V
Paul Mackerras pau...@samba.org writes:

 The B (segment size) field in the RB operand for the tlbie
 instruction is two bits, which we get from the top two bits of
 the first doubleword of the HPT entry to be invalidated.  These
 bits go in bits 8 and 9 of the RB operand (bits 54 and 55 in IBM
 bit numbering).

 The compute_tlbie_rb() function gets these bits as v  (62 - 8),
 which is not correct as it will bring in the top 10 bits, not
 just the top two.  These extra bits could corrupt the AP, AVAL
 and L fields in the RB value.  To fix this we shift right 62 bits
 and then shift left 8 bits, so we only get the two bits of the
 B field.

Good catch.


 The first doubleword of the HPT entry is under the control of the
 guest kernel.  In fact, Linux guests will always put zeroes in bits
 54 -- 61 (IBM bits 2 -- 9), but we should not rely on guests doing
 this.

 Cc: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
 Signed-off-by: Paul Mackerras pau...@samba.org


Reviewed-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com

 ---
  arch/powerpc/include/asm/kvm_book3s_64.h | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

 diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h 
 b/arch/powerpc/include/asm/kvm_book3s_64.h
 index 0aa8179..a37f1a4 100644
 --- a/arch/powerpc/include/asm/kvm_book3s_64.h
 +++ b/arch/powerpc/include/asm/kvm_book3s_64.h
 @@ -148,7 +148,7 @@ static inline unsigned long compute_tlbie_rb(unsigned 
 long v, unsigned long r,
   /* This covers 14..54 bits of va*/
   rb = (v & ~0x7fUL) << 16;   /* AVA field */
   
  - rb |= v >> (62 - 8);/*  B field */
  + rb |= (v >> HPTE_V_SSIZE_SHIFT) << 8;   /*  B field */
   /*
* AVA in v had cleared lower 23 bits. We need to derive
* that from pteg index
 -- 
 2.1.1

 --
 To unsubscribe from this list: send the line unsubscribe kvm-ppc in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] KVM: PPC: Book3S HV: Add missing HPTE unlock

2014-10-20 Thread Aneesh Kumar K.V
In kvm_test_clear_dirty_npages(), if we find an invalid HPTE we move on to the
next HPTE without unlocking the invalid one.  In fact we should never
find an invalid and unlocked HPTE in the rmap chain, but for robustness
we should unlock it.  This adds the missing unlock.

Reported-by: Benjamin Herrenschmidt b...@kernel.crashing.org
Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
---
 arch/powerpc/kvm/book3s_64_mmu_hv.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index d40770248b6a..cebb86bc4a37 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -1117,9 +1117,11 @@ static int kvm_test_clear_dirty_npages(struct kvm *kvm, 
unsigned long *rmapp)
}
 
/* Now check and modify the HPTE */
-   if (!(hptep[0] & cpu_to_be64(HPTE_V_VALID)))
+   if (!(hptep[0] & cpu_to_be64(HPTE_V_VALID))) {
+   /* unlock and continue */
+   hptep[0] &= ~cpu_to_be64(HPTE_V_HVLOCK);
continue;
-
+   }
/* need to make it temporarily absent so C is stable */
hptep[0] |= cpu_to_be64(HPTE_V_ABSENT);
kvmppc_invalidate_hpte(kvm, hptep, i);
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/3] KVM: PPC: BOOK3S: HV: Use unlock variant with memory barrier

2014-10-20 Thread Aneesh Kumar K.V
We switch to the unlock variant with a memory barrier in the error paths
and also in code paths where we had an implicit dependency on previous
functions calling lwsync/ptesync. In most of these cases we don't really
need an explicit barrier, but using that variant makes sure we don't make
mistakes later with code movements. We also document why the non-barrier
variant is OK in the performance-critical paths.

Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
---
 arch/powerpc/kvm/book3s_64_mmu_hv.c | 10 +-
 arch/powerpc/kvm/book3s_hv_rm_mmu.c | 15 ++-
 2 files changed, 15 insertions(+), 10 deletions(-)
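
For reference, the two unlock variants this patch chooses between look
roughly like this (copied from the 1/3 patch in this series, with my own
comments; PPC_RELEASE_BARRIER is effectively lwsync here):

	/* Barrier variant: orders all prior stores before the unlock. */
	static inline void unlock_hpte(__be64 *hpte, unsigned long hpte_v)
	{
		hpte_v &= ~HPTE_V_HVLOCK;
		asm volatile(PPC_RELEASE_BARRIER "" : : : "memory");
		hpte[0] = cpu_to_be64(hpte_v);
	}

	/* No barrier: only safe when a prior ptesync/eieio or a data
	 * dependency already provides the required ordering. */
	static inline void __unlock_hpte(__be64 *hpte, unsigned long hpte_v)
	{
		hpte_v &= ~HPTE_V_HVLOCK;
		hpte[0] = cpu_to_be64(hpte_v);
	}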

diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 5ea4b2b6a157..c97690ffb5f6 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -774,7 +774,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct 
kvm_vcpu *vcpu,
return ret;
 
  out_unlock:
-   __unlock_hpte(hptep, be64_to_cpu(hptep[0]));
+   unlock_hpte(hptep, be64_to_cpu(hptep[0]));
preempt_enable();
goto out_put;
 }
@@ -903,8 +903,8 @@ static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long 
*rmapp,
note_hpte_modification(kvm, &rev[i]);
}
}
+   unlock_hpte(hptep, be64_to_cpu(hptep[0]));
unlock_rmap(rmapp);
-   __unlock_hpte(hptep, be64_to_cpu(hptep[0]));
}
return 0;
 }
@@ -992,7 +992,7 @@ static int kvm_age_rmapp(struct kvm *kvm, unsigned long 
*rmapp,
}
ret = 1;
}
-   __unlock_hpte(hptep, be64_to_cpu(hptep[0]));
+   unlock_hpte(hptep, be64_to_cpu(hptep[0]));
} while ((i = j) != head);
 
unlock_rmap(rmapp);
@@ -1115,7 +1115,7 @@ static int kvm_test_clear_dirty_npages(struct kvm *kvm, 
unsigned long *rmapp)
 
/* Now check and modify the HPTE */
if (!(hptep[0] & cpu_to_be64(HPTE_V_VALID))) {
-   __unlock_hpte(hptep, be64_to_cpu(hptep[0]));
+   unlock_hpte(hptep, be64_to_cpu(hptep[0]));
continue;
}
/* need to make it temporarily absent so C is stable */
@@ -1137,7 +1137,7 @@ static int kvm_test_clear_dirty_npages(struct kvm *kvm, 
unsigned long *rmapp)
}
v &= ~HPTE_V_ABSENT;
v |= HPTE_V_VALID;
-   __unlock_hpte(hptep, v);
+   unlock_hpte(hptep, v);
} while ((i = j) != head);
 
unlock_rmap(rmapp);
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c 
b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 769a5d4c0430..78e689b066f1 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -292,6 +292,9 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
pte = be64_to_cpu(hpte[0]);
if (!(pte & (HPTE_V_VALID | HPTE_V_ABSENT)))
break;
+   /*
+* Data dependency will avoid re-ordering
+*/
__unlock_hpte(hpte, pte);
hpte += 2;
}
@@ -310,7 +313,7 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
cpu_relax();
pte = be64_to_cpu(hpte[0]);
if (pte & (HPTE_V_VALID | HPTE_V_ABSENT)) {
-   __unlock_hpte(hpte, pte);
+   unlock_hpte(hpte, pte);
return H_PTEG_FULL;
}
}
@@ -481,7 +484,7 @@ long kvmppc_do_h_remove(struct kvm *kvm, unsigned long 
flags,
if ((pte & (HPTE_V_ABSENT | HPTE_V_VALID)) == 0 ||
((flags & H_AVPN) && (pte & ~0x7fUL) != avpn) ||
((flags & H_ANDCOND) && (pte & avpn) != 0)) {
-   __unlock_hpte(hpte, pte);
+   unlock_hpte(hpte, pte);
return H_NOT_FOUND;
}
 
@@ -617,7 +620,7 @@ long kvmppc_h_bulk_remove(struct kvm_vcpu *vcpu)
be64_to_cpu(hp[0]), be64_to_cpu(hp[1]));
rcbits = rev->guest_rpte & (HPTE_R_R|HPTE_R_C);
args[j] |= rcbits << (56 - 5);
-   __unlock_hpte(hp, 0);
+   unlock_hpte(hp, 0);
}
}
 
@@ -643,7 +646,7 @@ long kvmppc_h_protect(struct kvm_vcpu *vcpu, unsigned long 
flags,
pte = be64_to_cpu(hpte[0]);
if ((pte & (HPTE_V_ABSENT | HPTE_V_VALID)) == 0 ||
((flags & H_AVPN) && (pte & ~0x7fUL) != avpn)) {
-   __unlock_hpte(hpte, pte);
+   unlock_hpte(hpte, pte);
return H_NOT_FOUND

[PATCH 3/3] KVM: PPC: BOOK3S: HV: Rename variable for better readability

2014-10-20 Thread Aneesh Kumar K.V
Minor cleanup

Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
---
 arch/powerpc/kvm/book3s_hv_rm_mmu.c | 25 +
 1 file changed, 13 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c 
b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 78e689b066f1..2922f8d127ff 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -523,7 +523,7 @@ long kvmppc_h_bulk_remove(struct kvm_vcpu *vcpu)
unsigned long *args = &vcpu->arch.gpr[4];
__be64 *hp, *hptes[4];
unsigned long tlbrb[4];
-   long int i, j, k, n, found, indexes[4];
+   long int i, j, k, collected_hpte, found, indexes[4];
unsigned long flags, req, pte_index, rcbits;
int global;
long int ret = H_SUCCESS;
@@ -532,7 +532,7 @@ long kvmppc_h_bulk_remove(struct kvm_vcpu *vcpu)
 
global = global_invalidates(kvm, 0);
for (i = 0; i < 4 && ret == H_SUCCESS; ) {
-   n = 0;
+   collected_hpte = 0;
for (; i < 4; ++i) {
j = i * 2;
pte_index = args[j];
@@ -554,7 +554,7 @@ long kvmppc_h_bulk_remove(struct kvm_vcpu *vcpu)
hp = (__be64 *) (kvm->arch.hpt_virt + (pte_index << 4));
/* to avoid deadlock, don't spin except for first */
if (!try_lock_hpte(hp, HPTE_V_HVLOCK)) {
-   if (n)
+   if (collected_hpte)
break;
while (!try_lock_hpte(hp, HPTE_V_HVLOCK))
cpu_relax();
@@ -596,22 +596,23 @@ long kvmppc_h_bulk_remove(struct kvm_vcpu *vcpu)
 
/* leave it locked */
hp[0] &= ~cpu_to_be64(HPTE_V_VALID);
-   tlbrb[n] = compute_tlbie_rb(be64_to_cpu(hp[0]),
-   be64_to_cpu(hp[1]), pte_index);
-   indexes[n] = j;
-   hptes[n] = hp;
-   revs[n] = rev;
-   ++n;
+   tlbrb[collected_hpte] = compute_tlbie_rb(be64_to_cpu(hp[0]),
+    be64_to_cpu(hp[1]),
+    pte_index);
+   indexes[collected_hpte] = j;
+   hptes[collected_hpte] = hp;
+   revs[collected_hpte] = rev;
+   ++collected_hpte;
}
 
-   if (!n)
+   if (!collected_hpte)
break;
 
/* Now that we've collected a batch, do the tlbies */
-   do_tlbies(kvm, tlbrb, n, global, true);
+   do_tlbies(kvm, tlbrb, collected_hpte, global, true);
 
/* Read PTE low words after tlbie to get final R/C values */
-   for (k = 0; k  n; ++k) {
+   for (k = 0; k  collected_hpte; ++k) {
j = indexes[k];
pte_index = args[j] & ((1ul << 56) - 1);
hp = hptes[k];
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/3] KVM: PPC: BOOK3S: HV: Add helpers for lock/unlock hpte

2014-10-20 Thread Aneesh Kumar K.V
This patch adds helper routines for locking and unlocking HPTEs and uses
them in the rest of the code. We don't change any locking rules in this
patch. In the next patch we switch some of the unlock usages to the
variant with a barrier and also document the usages without barriers.

Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/kvm_book3s_64.h | 14 ++
 arch/powerpc/kvm/book3s_64_mmu_hv.c  | 25 ++---
 arch/powerpc/kvm/book3s_hv_rm_mmu.c  | 27 ++-
 3 files changed, 34 insertions(+), 32 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h 
b/arch/powerpc/include/asm/kvm_book3s_64.h
index 0aa817933e6a..ec9fb6085843 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -86,6 +86,20 @@ static inline long try_lock_hpte(__be64 *hpte, unsigned long 
bits)
return old == 0;
 }
 
+static inline void unlock_hpte(__be64 *hpte, unsigned long hpte_v)
+{
+   hpte_v &= ~HPTE_V_HVLOCK;
+   asm volatile(PPC_RELEASE_BARRIER "" : : : "memory");
+   hpte[0] = cpu_to_be64(hpte_v);
+}
+
+/* Without barrier */
+static inline void __unlock_hpte(__be64 *hpte, unsigned long hpte_v)
+{
+   hpte_v &= ~HPTE_V_HVLOCK;
+   hpte[0] = cpu_to_be64(hpte_v);
+}
+
 static inline int __hpte_actual_psize(unsigned int lp, int psize)
 {
int i, shift;
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index cebb86bc4a37..5ea4b2b6a157 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -475,9 +475,7 @@ static int kvmppc_mmu_book3s_64_hv_xlate(struct kvm_vcpu 
*vcpu, gva_t eaddr,
v = be64_to_cpu(hptep[0]) & ~HPTE_V_HVLOCK;
gr = kvm->arch.revmap[index].guest_rpte;
 
-   /* Unlock the HPTE */
-   asm volatile("lwsync" : : : "memory");
-   hptep[0] = cpu_to_be64(v);
+   unlock_hpte(hptep, v);
preempt_enable();
 
gpte->eaddr = eaddr;
@@ -606,8 +604,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct 
kvm_vcpu *vcpu,
hpte[0] = be64_to_cpu(hptep[0]) & ~HPTE_V_HVLOCK;
hpte[1] = be64_to_cpu(hptep[1]);
hpte[2] = r = rev->guest_rpte;
-   asm volatile("lwsync" : : : "memory");
-   hptep[0] = cpu_to_be64(hpte[0]);
+   unlock_hpte(hptep, hpte[0]);
preempt_enable();
 
if (hpte[0] != vcpu->arch.pgfault_hpte[0] ||
@@ -758,7 +755,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct 
kvm_vcpu *vcpu,
 
hptep[1] = cpu_to_be64(r);
eieio();
-   hptep[0] = cpu_to_be64(hpte[0]);
+   __unlock_hpte(hptep, hpte[0]);
asm volatile(ptesync : : : memory);
preempt_enable();
if (page && hpte_is_writable(r))
@@ -777,7 +774,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct 
kvm_vcpu *vcpu,
return ret;
 
  out_unlock:
-   hptep[0] &= ~cpu_to_be64(HPTE_V_HVLOCK);
+   __unlock_hpte(hptep, be64_to_cpu(hptep[0]));
preempt_enable();
goto out_put;
 }
@@ -907,7 +904,7 @@ static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long 
*rmapp,
}
}
unlock_rmap(rmapp);
-   hptep[0] &= ~cpu_to_be64(HPTE_V_HVLOCK);
+   __unlock_hpte(hptep, be64_to_cpu(hptep[0]));
}
return 0;
 }
@@ -995,7 +992,7 @@ static int kvm_age_rmapp(struct kvm *kvm, unsigned long 
*rmapp,
}
ret = 1;
}
-   hptep[0] &= ~cpu_to_be64(HPTE_V_HVLOCK);
+   __unlock_hpte(hptep, be64_to_cpu(hptep[0]));
} while ((i = j) != head);
 
unlock_rmap(rmapp);
@@ -1118,8 +1115,7 @@ static int kvm_test_clear_dirty_npages(struct kvm *kvm, 
unsigned long *rmapp)
 
/* Now check and modify the HPTE */
if (!(hptep[0] & cpu_to_be64(HPTE_V_VALID))) {
-   /* unlock and continue */
-   hptep[0] &= ~cpu_to_be64(HPTE_V_HVLOCK);
+   __unlock_hpte(hptep, be64_to_cpu(hptep[0]));
continue;
}
/* need to make it temporarily absent so C is stable */
@@ -1139,9 +1135,9 @@ static int kvm_test_clear_dirty_npages(struct kvm *kvm, 
unsigned long *rmapp)
npages_dirty = n;
eieio();
}
-   v &= ~(HPTE_V_ABSENT | HPTE_V_HVLOCK);
+   v &= ~HPTE_V_ABSENT;
v |= HPTE_V_VALID;
-   hptep[0] = cpu_to_be64(v);
+   __unlock_hpte(hptep, v);
} while ((i = j) != head);
 
unlock_rmap(rmapp);
@@ -1379,8 +1375,7 @@ static long record_hpte(unsigned long flags, __be64 *hptp,
r &= ~HPTE_GR_MODIFIED;
revp->guest_rpte = r;
}
-   asm volatile(PPC_RELEASE_BARRIER

[PATCH] KVM: PPC: BOOK3S: HV: CMA: Reserve cma region only in hypervisor mode

2014-09-29 Thread Aneesh Kumar K.V
We use the CMA reserved area for creating the guest hash page table.
Don't do the reservation in non-hypervisor mode. This avoids an
unnecessary CMA reservation when booting with limited-memory configs
like fadump and kdump.

Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
---
 arch/powerpc/kvm/book3s_hv_builtin.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c 
b/arch/powerpc/kvm/book3s_hv_builtin.c
index b9615ba5b083..4fdc27c80f4c 100644
--- a/arch/powerpc/kvm/book3s_hv_builtin.c
+++ b/arch/powerpc/kvm/book3s_hv_builtin.c
@@ -163,6 +163,12 @@ void __init kvm_cma_reserve(void)
unsigned long align_size;
struct memblock_region *reg;
phys_addr_t selected_size = 0;
+
+   /*
+* We need CMA reservation only when we are in HV mode
+*/
+   if (!cpu_has_feature(CPU_FTR_HVMODE))
+   return;
/*
 * We cannot use memblock_phys_mem_size() here, because
 * memblock_analyze() has not been called yet.
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] PC, KVM, CMA: Fix regression caused by wrong get_order() use

2014-08-13 Thread Aneesh Kumar K.V
Alexey Kardashevskiy a...@ozlabs.ru writes:

 fc95ca7284bc54953165cba76c3228bd2cdb9591 claims that there is no
 functional change but this is not true as it calls get_order() (which
 takes bytes) where it should have called ilog2() and the kernel stops
 on VM_BUG_ON().

 This replaces get_order() with ilog2().

Should we round it up? i.e., ilog2(kvm_rma_pages - 1) + 1?
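
To make the difference concrete, a quick user-space comparison (my own
sketch; the helpers are re-implemented here only for illustration, and
96 is an arbitrary non-power-of-two page count):

	#include <stdio.h>

	/* like the kernel's ilog2() for a runtime value: truncating log2 */
	static int my_ilog2(unsigned long n)
	{
		return 63 - __builtin_clzl(n);
	}

	/* like order_base_2(): log2 rounded up */
	static int my_order_base_2(unsigned long n)
	{
		return n <= 1 ? 0 : my_ilog2(n - 1) + 1;
	}

	int main(void)
	{
		unsigned long pages = 96;

		/* prints ilog2=6 order_base_2=7; get_order() would be wrong
		 * here anyway, since it expects a size in bytes, not pages. */
		printf("ilog2=%d order_base_2=%d\n",
		       my_ilog2(pages), my_order_base_2(pages));
		return 0;
	}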


 Suggested-by: Paul Mackerras pau...@samba.org
 Cc: Alexander Graf ag...@suse.de
 Cc: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
 Cc: Joonsoo Kim iamjoonsoo@lge.com
 Cc: Benjamin Herrenschmidt b...@kernel.crashing.org
 Cc: sta...@vger.kernel.org

Why stable? We merged it in this merge window.

 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
 ---
  arch/powerpc/kvm/book3s_hv_builtin.c | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)

 diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c 
 b/arch/powerpc/kvm/book3s_hv_builtin.c
 index 329d7fd..bfe9f01 100644
 --- a/arch/powerpc/kvm/book3s_hv_builtin.c
 +++ b/arch/powerpc/kvm/book3s_hv_builtin.c
 @@ -101,7 +101,7 @@ struct kvm_rma_info *kvm_alloc_rma()
   ri = kmalloc(sizeof(struct kvm_rma_info), GFP_KERNEL);
   if (!ri)
   return NULL;
 - page = cma_alloc(kvm_cma, kvm_rma_pages, get_order(kvm_rma_pages));
 + page = cma_alloc(kvm_cma, kvm_rma_pages, ilog2(kvm_rma_pages));
   if (!page)
   goto err_out;
   atomic_set(ri-use_count, 1);
 @@ -135,12 +135,12 @@ struct page *kvm_alloc_hpt(unsigned long nr_pages)
  {
   unsigned long align_pages = HPT_ALIGN_PAGES;

 - VM_BUG_ON(get_order(nr_pages) > KVM_CMA_CHUNK_ORDER - PAGE_SHIFT);
 + VM_BUG_ON(ilog2(nr_pages) > KVM_CMA_CHUNK_ORDER - PAGE_SHIFT);

   /* Old CPUs require HPT aligned on a multiple of its size */
   if (!cpu_has_feature(CPU_FTR_ARCH_206))
   align_pages = nr_pages;
 - return cma_alloc(kvm_cma, nr_pages, get_order(align_pages));
 + return cma_alloc(kvm_cma, nr_pages, ilog2(align_pages));
  }
  EXPORT_SYMBOL_GPL(kvm_alloc_hpt);

 -- 
 2.0.0

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] PC, KVM, CMA: Fix regression caused by wrong get_order() use

2014-08-13 Thread Aneesh Kumar K.V
Alexey Kardashevskiy a...@ozlabs.ru writes:

 fc95ca7284bc54953165cba76c3228bd2cdb9591 claims that there is no
 functional change but this is not true as it calls get_order() (which
 takes bytes) where it should have called ilog2() and the kernel stops
 on VM_BUG_ON().

 This replaces get_order() with order_base_2() (round-up version of ilog2).

 Suggested-by: Paul Mackerras pau...@samba.org
 Cc: Alexander Graf ag...@suse.de
 Cc: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
 Cc: Joonsoo Kim iamjoonsoo@lge.com
 Cc: Benjamin Herrenschmidt b...@kernel.crashing.org
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru

Reviewed-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com

 ---

 Changes:
 v2:
 * s/ilog2/order_base_2/
 * removed cc: sta...@vger.kernel.org as I got wrong impression that v3.16 is
 broken

 ---
  arch/powerpc/kvm/book3s_hv_builtin.c | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)

 diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c 
 b/arch/powerpc/kvm/book3s_hv_builtin.c
 index 329d7fd..b9615ba 100644
 --- a/arch/powerpc/kvm/book3s_hv_builtin.c
 +++ b/arch/powerpc/kvm/book3s_hv_builtin.c
 @@ -101,7 +101,7 @@ struct kvm_rma_info *kvm_alloc_rma()
   ri = kmalloc(sizeof(struct kvm_rma_info), GFP_KERNEL);
   if (!ri)
   return NULL;
 - page = cma_alloc(kvm_cma, kvm_rma_pages, get_order(kvm_rma_pages));
 + page = cma_alloc(kvm_cma, kvm_rma_pages, order_base_2(kvm_rma_pages));
   if (!page)
   goto err_out;
   atomic_set(ri-use_count, 1);
 @@ -135,12 +135,12 @@ struct page *kvm_alloc_hpt(unsigned long nr_pages)
  {
   unsigned long align_pages = HPT_ALIGN_PAGES;

 - VM_BUG_ON(get_order(nr_pages) > KVM_CMA_CHUNK_ORDER - PAGE_SHIFT);
 + VM_BUG_ON(order_base_2(nr_pages) > KVM_CMA_CHUNK_ORDER - PAGE_SHIFT);

   /* Old CPUs require HPT aligned on a multiple of its size */
   if (!cpu_has_feature(CPU_FTR_ARCH_206))
   align_pages = nr_pages;
 - return cma_alloc(kvm_cma, nr_pages, get_order(align_pages));
 + return cma_alloc(kvm_cma, nr_pages, order_base_2(align_pages));
  }
  EXPORT_SYMBOL_GPL(kvm_alloc_hpt);

 -- 
 2.0.0

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/6] KVM: PPC: BOOK3S: HV: Use new functions for mapping/unmapping hpte in host

2014-07-02 Thread Aneesh Kumar K.V
Paul Mackerras pau...@samba.org writes:

 On Sun, Jun 29, 2014 at 04:47:33PM +0530, Aneesh Kumar K.V wrote:
 We want to use virtual page class key protection mechanism for
 indicating a MMIO mapped hpte entry or a guest hpte entry that is swapped out
 in the host. Those hptes will be marked valid, but have virtual page
 class key set to 30 or 31. These virtual page class numbers are
 configured in AMR to deny read/write. To accomodate such a change, add
 new functions that map, unmap and check whether a hpte is mapped in the
 host. This patch still use HPTE_V_VALID and HPTE_V_ABSENT and don't use
 virtual page class keys. But we want to differentiate in the code
 where we explicitly check for HPTE_V_VALID with places where we want to
 check whether the hpte is host mapped. This patch enables a closer
 review for such a change.

 [...]

  /* Check for pending invalidations under the rmap chain lock */
 if (kvm->arch.using_mmu_notifiers &&
  mmu_notifier_retry(kvm, mmu_seq)) {
 -/* inval in progress, write a non-present HPTE */
 -pteh |= HPTE_V_ABSENT;
 -pteh &= ~HPTE_V_VALID;
 +/*
 + * inval in progress in host, write host unmapped pte.
 + */
 +host_unmapped_hpte = 1;

 This isn't right.  We already have HPTE_V_VALID set here, and you now
 don't clear it here, and it doesn't get cleared by the
 __kvmppc_unmap_host_hpte() call below either.


OK, I missed that. Will fix it in the next update. In the earlier version
I had kvmppc_unmap_host_hpte always clearing V_VALID.

-aneesh

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/6] KVM: PPC: BOOK3S: Use hpte_update_in_progress to track invalid hpte during an hpte update

2014-07-02 Thread Aneesh Kumar K.V
Paul Mackerras pau...@samba.org writes:

 On Sun, Jun 29, 2014 at 04:47:34PM +0530, Aneesh Kumar K.V wrote:
 As per ISA, we first need to mark hpte invalid (V=0) before we update
 the hpte lower half bits. With virtual page class key protection mechanism 
 we want
 to send any fault other than key fault to guest directly without
 searching the hash page table. But then we can get NO_HPTE fault while
 we are updating the hpte. To track that add a vm specific atomic
 variable that we check in the fault path to always send the fault
 to host.

 [...]

 @@ -750,13 +751,15 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, 
 struct kvm_vcpu *vcpu,
  r = rcbits | ~(HPTE_R_R | HPTE_R_C);
  
 if (be64_to_cpu(hptep[0]) & HPTE_V_VALID) {
 -/* HPTE was previously valid, so we need to invalidate it */
 +/*
 + * If we had mapped this hpte before, we now need to
 + * invalidate that.
 + */
  unlock_rmap(rmap);
 -/* Always mark HPTE_V_ABSENT before invalidating */
 -kvmppc_unmap_host_hpte(kvm, hptep);
  kvmppc_invalidate_hpte(kvm, hptep, index);
  /* don't lose previous R and C bits */
 r |= be64_to_cpu(hptep[1]) & (HPTE_R_R | HPTE_R_C);
 +hpte_invalidated = true;

 So now we're not setting the ABSENT bit before invalidating the HPTE.
 That means that another guest vcpu could do an H_ENTER which could
 think that this HPTE is free and use it for another unrelated guest
 HPTE, which would be bad...

But H_ENTER looks at HPTE_V_HVLOCK, and we keep that set throughout. I
will double-check the code again to make sure it is safe in the above
scenario.
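
(For reference, the slot-selection loop in kvmppc_do_h_enter only treats
an entry as free after try_lock_hpte() succeeds, and that cannot happen
while HPTE_V_HVLOCK is set; roughly, as I read it:)

	/* simplified from kvmppc_do_h_enter(): a slot that is locked or
	 * already valid/absent is never reused for a new guest HPTE */
	if (try_lock_hpte(hpte, HPTE_V_HVLOCK | HPTE_V_VALID | HPTE_V_ABSENT))
		break;		/* got a free slot */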


 @@ -1144,8 +1149,8 @@ static int kvm_test_clear_dirty_npages(struct kvm 
 *kvm, unsigned long *rmapp)
  npages_dirty = n;
  eieio();
  }
 -kvmppc_map_host_hpte(kvm, v, r);
 -hptep[0] = cpu_to_be64(v & ~HPTE_V_HVLOCK);
 +hptep[0] = cpu_to_be64(v & ~HPTE_V_LOCK);
 +atomic_dec(&kvm->arch.hpte_update_in_progress);

 Why are we using LOCK rather than HVLOCK now?  (And why didn't you
 mention this change and its rationale in the patch description?)

Sorry, that is a typo. I intended to use HPTE_V_HVLOCK.

-aneesh

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/6] KVM: PPC: BOOK3S: HV: Deny virtual page class key update via h_protect

2014-07-02 Thread Aneesh Kumar K.V
Paul Mackerras pau...@samba.org writes:

 On Sun, Jun 29, 2014 at 04:47:31PM +0530, Aneesh Kumar K.V wrote:
 This makes it consistent with h_enter where we clear the key
 bits. We also want to use virtual page class key protection mechanism
 for indicating host page fault. For that we will be using key class
 index 30 and 31. So prevent the guest from updating key bits until
 we add proper support for virtual page class protection mechanism for
 the guest. This will not have any impact for PAPR linux guest because
 Linux guest currently don't use virtual page class key protection model

 As things stand, without this patch series, we do actually have
 everything we need in the kernel for guests to use virtual page class
 keys.  Arguably we should have a capability to tell userspace how many
 storage keys the guest can use, but that's the only missing piece as
 far as I can see.

yes.


 If we add such a capability, I can't see any reason why we should need
 to disable guest use of storage keys in this patchset.

With this patchset, we would need additional changes to find out whether
the key fault happened because of the guest's own usage of the key. I was
planning to do that as an add-on series to keep the changes in this one
minimal. Also, since Linux didn't use keys, I was not sure whether guest
support for keys is an important item.

-aneesh

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/6] KVM: PPC: BOOK3S: HV: Deny virtual page class key update via h_protect

2014-06-29 Thread Aneesh Kumar K.V
This makes it consistent with H_ENTER, where we clear the key bits. We
also want to use the virtual page class key protection mechanism for
indicating a host page fault. For that we will be using key class
indexes 30 and 31. So prevent the guest from updating the key bits until
we add proper support for the virtual page class protection mechanism
for the guest. This will not have any impact on PAPR Linux guests
because Linux guests currently don't use the virtual page class key
protection model.

Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
---
 arch/powerpc/kvm/book3s_hv_rm_mmu.c | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c 
b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 157a5f35edfa..f908845f7379 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -658,13 +658,17 @@ long kvmppc_h_protect(struct kvm_vcpu *vcpu, unsigned 
long flags,
}
 
v = pte;
+   /*
+* We ignore key bits here. We use class 31 and 30 for
+* hypervisor purpose. We still don't track the page
+* class seperately. Until then don't allow h_protect
+* to change key bits.
+*/
bits = (flags << 55) & HPTE_R_PP0;
-   bits |= (flags << 48) & HPTE_R_KEY_HI;
-   bits |= flags & (HPTE_R_PP | HPTE_R_N | HPTE_R_KEY_LO);
+   bits |= flags & (HPTE_R_PP | HPTE_R_N);
 
/* Update guest view of 2nd HPTE dword */
-   mask = HPTE_R_PP0 | HPTE_R_PP | HPTE_R_N |
-   HPTE_R_KEY_HI | HPTE_R_KEY_LO;
+   mask = HPTE_R_PP0 | HPTE_R_PP | HPTE_R_N;
rev = real_vmalloc_addr(&kvm->arch.revmap[pte_index]);
if (rev) {
r = (rev->guest_rpte & ~mask) | bits;
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/6] KVM: PPC: BOOK3S: HV: Clear hash pte bits from do_h_enter callers

2014-06-29 Thread Aneesh Kumar K.V
We will use this to set the HPTE_V_VRMA bit in a later patch. This also
makes sure we clear the hpte bits only when called via hcall.

Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
---
 arch/powerpc/kvm/book3s_64_mmu_hv.c | 15 +--
 arch/powerpc/kvm/book3s_hv_rm_mmu.c |  8 ++--
 2 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 09a47aeb5b63..1c137f45dd55 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -371,8 +371,6 @@ long kvmppc_virtmode_do_h_enter(struct kvm *kvm, unsigned 
long flags,
if (!psize)
return H_PARAMETER;
 
-   pteh &= ~(HPTE_V_HVLOCK | HPTE_V_ABSENT | HPTE_V_VALID);
-
/* Find the memslot (if any) for this address */
gpa = (ptel & HPTE_R_RPN) & ~(psize - 1);
gfn = gpa >> PAGE_SHIFT;
@@ -408,6 +406,12 @@ long kvmppc_virtmode_h_enter(struct kvm_vcpu *vcpu, 
unsigned long flags,
 long pte_index, unsigned long pteh,
 unsigned long ptel)
 {
+   /*
+* Clear few bits, when called via hcall
+*/
+   pteh &= ~(HPTE_V_HVLOCK | HPTE_V_ABSENT | HPTE_V_VALID);
+   ptel &= ~(HPTE_R_KEY_HI | HPTE_R_KEY_LO | HPTE_GR_RESERVED);
+
return kvmppc_virtmode_do_h_enter(vcpu->kvm, flags, pte_index,
  pteh, ptel, &vcpu->arch.gpr[4]);
 }
@@ -1560,6 +1564,13 @@ static ssize_t kvm_htab_write(struct file *file, const 
char __user *buf,
if (be64_to_cpu(hptp[0]) & (HPTE_V_VALID | 
HPTE_V_ABSENT))
kvmppc_do_h_remove(kvm, 0, i, 0, tmp);
err = -EIO;
+   /*
+* Clear few bits we got via read_htab which we
+* don't need to carry forward.
+*/
+   v &= ~(HPTE_V_HVLOCK | HPTE_V_ABSENT | HPTE_V_VALID);
+   r &= ~(HPTE_R_KEY_HI | HPTE_R_KEY_LO | 
HPTE_GR_RESERVED);
+
ret = kvmppc_virtmode_do_h_enter(kvm, H_EXACT, i, v, r,
 tmp);
if (ret != H_SUCCESS) {
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c 
b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 084ad54c73cd..157a5f35edfa 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -182,8 +182,6 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
if (!psize)
return H_PARAMETER;
writing = hpte_is_writable(ptel);
-   pteh &= ~(HPTE_V_HVLOCK | HPTE_V_ABSENT | HPTE_V_VALID);
-   ptel &= ~HPTE_GR_RESERVED;
g_ptel = ptel;
 
/* used later to detect if we might have been invalidated */
@@ -367,6 +365,12 @@ EXPORT_SYMBOL_GPL(kvmppc_do_h_enter);
 long kvmppc_h_enter(struct kvm_vcpu *vcpu, unsigned long flags,
long pte_index, unsigned long pteh, unsigned long ptel)
 {
+   /*
+* Clear few bits. when called via hcall.
+*/
+   pteh &= ~(HPTE_V_HVLOCK | HPTE_V_ABSENT | HPTE_V_VALID);
+   ptel &= ~(HPTE_R_KEY_HI | HPTE_R_KEY_LO | HPTE_GR_RESERVED);
+
return kvmppc_do_h_enter(vcpu->kvm, flags, pte_index, pteh, ptel,
vcpu->arch.pgdir, true, &vcpu->arch.gpr[4]);
 }
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/6] KVM: PPC: BOOK3S: HV: Use new functions for mapping/unmapping hpte in host

2014-06-29 Thread Aneesh Kumar K.V
We want to use the virtual page class key protection mechanism for
indicating an MMIO mapped hpte entry or a guest hpte entry that is
swapped out in the host. Those hptes will be marked valid, but have the
virtual page class key set to 30 or 31. These virtual page class numbers
are configured in the AMR to deny read/write. To accommodate such a
change, add new functions that map, unmap and check whether a hpte is
mapped in the host. This patch still uses HPTE_V_VALID and HPTE_V_ABSENT
and doesn't use virtual page class keys. But we want to differentiate in
the code the places that explicitly check for HPTE_V_VALID from the
places that want to check whether the hpte is host mapped. This patch
enables a closer review of such a change.

Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/kvm_book3s_64.h | 36 
 arch/powerpc/kvm/book3s_64_mmu_hv.c  | 24 +++--
 arch/powerpc/kvm/book3s_hv_rm_mmu.c  | 30 ++
 3 files changed, 66 insertions(+), 24 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h 
b/arch/powerpc/include/asm/kvm_book3s_64.h
index 0aa817933e6a..da00b1f05ea1 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -400,6 +400,42 @@ static inline int is_vrma_hpte(unsigned long hpte_v)
(HPTE_V_1TB_SEG | (VRMA_VSID  (40 - 16)));
 }
 
+static inline void __kvmppc_unmap_host_hpte(struct kvm *kvm,
+   unsigned long *hpte_v,
+   unsigned long *hpte_r,
+   bool mmio)
+{
+   *hpte_v |= HPTE_V_ABSENT;
+   if (mmio)
+   *hpte_r |= HPTE_R_KEY_HI | HPTE_R_KEY_LO;
+}
+
+static inline void kvmppc_unmap_host_hpte(struct kvm *kvm, __be64 *hptep)
+{
+   /*
+* We will never call this for MMIO
+*/
+   hptep[0] |= cpu_to_be64(HPTE_V_ABSENT);
+}
+
+static inline void kvmppc_map_host_hpte(struct kvm *kvm, unsigned long *hpte_v,
+   unsigned long *hpte_r)
+{
+   *hpte_v |= HPTE_V_VALID;
+   *hpte_v &= ~HPTE_V_ABSENT;
+}
+
+static inline bool kvmppc_is_host_mapped_hpte(struct kvm *kvm, __be64 *hpte)
+{
+   unsigned long v;
+
+   v = be64_to_cpu(hpte[0]);
if (v & HPTE_V_VALID)
+   return true;
+   return false;
+}
+
+
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
 /*
  * Note modification of an HPTE; set the HPTE modified bit
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 590e07b1a43f..8ce5e95613f8 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -752,7 +752,8 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct 
kvm_vcpu *vcpu,
if (be64_to_cpu(hptep[0])  HPTE_V_VALID) {
/* HPTE was previously valid, so we need to invalidate it */
unlock_rmap(rmap);
-   hptep[0] |= cpu_to_be64(HPTE_V_ABSENT);
+   /* Always mark HPTE_V_ABSENT before invalidating */
+   kvmppc_unmap_host_hpte(kvm, hptep);
kvmppc_invalidate_hpte(kvm, hptep, index);
/* don't lose previous R and C bits */
r |= be64_to_cpu(hptep[1])  (HPTE_R_R | HPTE_R_C);
@@ -897,11 +898,12 @@ static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long 
*rmapp,
/* Now check and modify the HPTE */
ptel = rev[i].guest_rpte;
psize = hpte_page_size(be64_to_cpu(hptep[0]), ptel);
-   if ((be64_to_cpu(hptep[0])  HPTE_V_VALID) 
+   if (kvmppc_is_host_mapped_hpte(kvm, hptep) 
hpte_rpn(ptel, psize) == gfn) {
if (kvm-arch.using_mmu_notifiers)
-   hptep[0] |= cpu_to_be64(HPTE_V_ABSENT);
+   kvmppc_unmap_host_hpte(kvm, hptep);
kvmppc_invalidate_hpte(kvm, hptep, i);
+
/* Harvest R and C */
rcbits = be64_to_cpu(hptep[1])  (HPTE_R_R | HPTE_R_C);
*rmapp |= rcbits  KVMPPC_RMAP_RC_SHIFT;
@@ -990,7 +992,7 @@ static int kvm_age_rmapp(struct kvm *kvm, unsigned long 
*rmapp,
}
 
/* Now check and modify the HPTE */
-   if ((be64_to_cpu(hptep[0])  HPTE_V_VALID) 
+   if (kvmppc_is_host_mapped_hpte(kvm, hptep) 
(be64_to_cpu(hptep[1])  HPTE_R_R)) {
kvmppc_clear_ref_hpte(kvm, hptep, i);
if (!(rev[i].guest_rpte  HPTE_R_R)) {
@@ -1121,11 +1123,12 @@ static int kvm_test_clear_dirty_npages(struct kvm *kvm, 
unsigned long *rmapp)
}
 
/* Now check and modify the HPTE */
-   if (!(hptep[0]  cpu_to_be64(HPTE_V_VALID)))
+   if (!kvmppc_is_host_mapped_hpte

[PATCH 0/6] Use virtual page class key protection mechanism for speeding up guest page fault

2014-06-29 Thread Aneesh Kumar K.V
Hi,

With the current code we do an expensive hash page table lookup on every
page fault resulting from a missing hash page table entry. A NO_HPTE
page fault can happen due to the below reasons:

1) Missing hash pte as per guest. This should be forwarded to the guest
2) MMIO hash pte. The address against which the load/store is performed
   should be emulated as a MMIO operation.
3) Missing hash pte because host swapped out the guest page.

We want to differentiate (1) from (2) and (3) so that we can speed up
page fault due to (1). Optimizing (1) will help in improving
the overall performance because that covers a large percentage of
the page faults.

To achieve the above we use the virtual page class protection mechanism
for covering (2) and (3). For both of the above cases we mark the hpte
valid, but associate the page with virtual page class index 30 or 31.
The authority mask register is configured such that class indexes 30 and 31
have read/write denied. The above change results in a key fault
for (2) and (3). This allows us to forward a NO_HPTE fault directly to the
guest without doing the expensive hash page table lookup.
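
As a rough illustration (my sketch, not code from the series; it assumes
the conventional AMR layout with two bits per key class and class 31 in the
least significant bit pair), the deny mask for the two host-reserved classes
works out as follows:

/* sketch only: read+write deny bits for one virtual page class */
#define AMR_KEY_DENY_RW(class)	(0x3UL << (2 * (31 - (class))))

static inline unsigned long host_reserved_amr_bits(void)
{
	/* classes 30 and 31 are reserved for host unmap / MMIO marking */
	return AMR_KEY_DENY_RW(30) | AMR_KEY_DENY_RW(31);	/* 0xfUL */
}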

For the test below:

#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

#define PAGES (40*1024)

int main()
{
	unsigned long size = getpagesize();
	unsigned long length = size * PAGES;
	unsigned long i, j, k = 0;

	for (j = 0; j < 10; j++) {
		char *c = mmap(NULL, length, PROT_READ|PROT_WRITE,
			       MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
		if (c == MAP_FAILED) {
			perror("mmap");
			exit(1);
		}
		for (i = 0; i < length; i += size)
			c[i] = 0;

		/* flush hptes */
		mprotect(c, length, PROT_WRITE);

		for (i = 0; i < length; i += size)
			c[i] = 10;

		mprotect(c, length, PROT_READ);

		for (i = 0; i < length; i += size)
			k += c[i];

		munmap(c, length);
	}
}

Without Fix:
--
[root@qemu-pr-host ~]# time ./pfault

real0m8.438s
user0m0.855s
sys 0m7.540s
[root@qemu-pr-host ~]#


With Fix:

[root@qemu-pr-host ~]# time ./pfault

real0m7.833s
user0m0.782s
sys 0m7.038s
[root@qemu-pr-host ~]#



Aneesh Kumar K.V (6):
  KVM: PPC: BOOK3S: HV: Clear hash pte bits from do_h_enter callers
  KVM: PPC: BOOK3S: HV: Deny virtual page class key update via h_protect
  KVM: PPC: BOOK3S: HV: Remove dead code
  KVM: PPC: BOOK3S: HV: Use new functions for mapping/unmapping hpte in
host
  KVM: PPC: BOOK3S: Use hpte_update_in_progress to track invalid hpte
during an hpte update
  KVM: PPC: BOOK3S: HV: Use virtual page class protection mechanism for
host fault and mmio

 arch/powerpc/include/asm/kvm_book3s_64.h |  97 +-
 arch/powerpc/include/asm/kvm_host.h  |   1 +
 arch/powerpc/include/asm/reg.h   |   1 +
 arch/powerpc/kernel/asm-offsets.c|   1 +
 arch/powerpc/kvm/book3s_64_mmu_hv.c  |  99 --
 arch/powerpc/kvm/book3s_hv.c |   1 +
 arch/powerpc/kvm/book3s_hv_rm_mmu.c  | 166 +--
 arch/powerpc/kvm/book3s_hv_rmhandlers.S  | 100 +--
 8 files changed, 371 insertions(+), 95 deletions(-)

-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 5/6] KVM: PPC: BOOK3S: Use hpte_update_in_progress to track invalid hpte during an hpte update

2014-06-29 Thread Aneesh Kumar K.V
As per the ISA, we first need to mark the hpte invalid (V=0) before we
update the hpte lower half bits. With the virtual page class key protection
mechanism we want to send any fault other than a key fault directly to the
guest without searching the hash page table. But then we can get a NO_HPTE
fault while we are updating the hpte. To track that, add a VM-specific
atomic variable that we check in the fault path so that such faults are
always sent to the host.
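
A minimal sketch of the intended protocol (hypothetical helper names; the
series open-codes this in the HPTE update paths and the real-mode handlers):

#include <linux/kvm_host.h>

/* raise the counter before making the HPTE invalid for update */
static inline void hpte_update_begin(struct kvm *kvm)
{
	atomic_inc(&kvm->arch.hpte_update_in_progress);
}

/* drop it once the new HPTE contents are visible */
static inline void hpte_update_done(struct kvm *kvm)
{
	atomic_dec(&kvm->arch.hpte_update_in_progress);
}

/* fault path: while an update is in flight, let the host resolve it */
static inline bool hpte_fault_goes_to_host(struct kvm *kvm)
{
	return atomic_read(&kvm->arch.hpte_update_in_progress) != 0;
}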

Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/kvm_book3s_64.h |  1 +
 arch/powerpc/include/asm/kvm_host.h  |  1 +
 arch/powerpc/kernel/asm-offsets.c|  1 +
 arch/powerpc/kvm/book3s_64_mmu_hv.c  | 19 ++
 arch/powerpc/kvm/book3s_hv.c |  1 +
 arch/powerpc/kvm/book3s_hv_rm_mmu.c  | 40 +++--
 arch/powerpc/kvm/book3s_hv_rmhandlers.S  | 60 +---
 7 files changed, 109 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h 
b/arch/powerpc/include/asm/kvm_book3s_64.h
index da00b1f05ea1..a6bf41865a66 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -416,6 +416,7 @@ static inline void kvmppc_unmap_host_hpte(struct kvm *kvm, 
__be64 *hptep)
 * We will never call this for MMIO
 */
hptep[0] |= cpu_to_be64(HPTE_V_ABSENT);
+   atomic_dec(kvm-arch.hpte_update_in_progress);
 }
 
 static inline void kvmppc_map_host_hpte(struct kvm *kvm, unsigned long *hpte_v,
diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index f9ae69682ce1..0a9ff60fae4c 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -254,6 +254,7 @@ struct kvm_arch {
atomic_t hpte_mod_interest;
spinlock_t slot_phys_lock;
cpumask_t need_tlb_flush;
+   atomic_t hpte_update_in_progress;
struct kvmppc_vcore *vcores[KVM_MAX_VCORES];
int hpt_cma_alloc;
 #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index f5995a912213..54a36110f8f2 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -496,6 +496,7 @@ int main(void)
DEFINE(KVM_LPCR, offsetof(struct kvm, arch.lpcr));
DEFINE(KVM_RMOR, offsetof(struct kvm, arch.rmor));
DEFINE(KVM_VRMA_SLB_V, offsetof(struct kvm, arch.vrma_slb_v));
+   DEFINE(KVM_HPTE_UPDATE, offsetof(struct kvm, 
arch.hpte_update_in_progress));
DEFINE(VCPU_DSISR, offsetof(struct kvm_vcpu, arch.shregs.dsisr));
DEFINE(VCPU_DAR, offsetof(struct kvm_vcpu, arch.shregs.dar));
DEFINE(VCPU_VPA, offsetof(struct kvm_vcpu, arch.vpa.pinned_addr));
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 8ce5e95613f8..cb7a616aacb1 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -592,6 +592,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct 
kvm_vcpu *vcpu,
unsigned int writing, write_ok;
struct vm_area_struct *vma;
unsigned long rcbits;
+   bool hpte_invalidated = false;
 
/*
 * Real-mode code has already searched the HPT and found the
@@ -750,13 +751,15 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, 
struct kvm_vcpu *vcpu,
r = rcbits | ~(HPTE_R_R | HPTE_R_C);
 
if (be64_to_cpu(hptep[0])  HPTE_V_VALID) {
-   /* HPTE was previously valid, so we need to invalidate it */
+   /*
+* If we had mapped this hpte before, we now need to
+* invalidate that.
+*/
unlock_rmap(rmap);
-   /* Always mark HPTE_V_ABSENT before invalidating */
-   kvmppc_unmap_host_hpte(kvm, hptep);
kvmppc_invalidate_hpte(kvm, hptep, index);
/* don't lose previous R and C bits */
r |= be64_to_cpu(hptep[1])  (HPTE_R_R | HPTE_R_C);
+   hpte_invalidated = true;
} else {
kvmppc_add_revmap_chain(kvm, rev, rmap, index, 0);
}
@@ -765,6 +768,9 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct 
kvm_vcpu *vcpu,
eieio();
hptep[0] = cpu_to_be64(hpte[0]);
asm volatile(ptesync : : : memory);
+   if (hpte_invalidated)
+   atomic_dec(kvm-arch.hpte_update_in_progress);
+
preempt_enable();
if (page  hpte_is_writable(r))
SetPageDirty(page);
@@ -1128,10 +1134,9 @@ static int kvm_test_clear_dirty_npages(struct kvm *kvm, 
unsigned long *rmapp)
/*
 * need to make it temporarily absent so C is stable
 */
-   kvmppc_unmap_host_hpte(kvm, hptep);
-   kvmppc_invalidate_hpte(kvm, hptep, i);
v = be64_to_cpu(hptep[0]);
r = be64_to_cpu(hptep

[PATCH 6/6] KVM: PPC: BOOK3S: HV: Use virtual page class protection mechanism for host fault and mmio

2014-06-29 Thread Aneesh Kumar K.V
With this patch we use AMR classes 30 and 31 for indicating a page
fault that should be handled by the host. This includes MMIO accesses and
page faults resulting from guest RAM being swapped out in the host. This
enables us to forward the fault to the guest without doing the expensive
hash page table search for finding the hpte entry. With this patch, we
always mark the hash pte valid and use class indexes 30 and 31 for key-based
faults. These virtual class indexes are configured in the AMR to deny
read/write. Since the access class protection mechanism doesn't work with
the VRMA region, we need to handle those HPTEs separately. We mark them
invalid and use a software-defined bit, HPTE_V_VRMA, to differentiate
them.

NOTE: We still need to handle protection faults in the host so that a
write to a KSM-shared page is handled in the host.
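
A sketch of how a key fault could then be classified (not part of the
patch; it only uses the encodings introduced below):

static inline bool hpte_key_fault_for_host(unsigned long hpte_r)
{
	unsigned long key = hpte_r & (HPTE_R_KEY_HI | HPTE_R_KEY_LO);

	/* class 31: host swapped the page out; class 30: MMIO mapping */
	return key == HPTE_R_HOST_UNMAP_KEY || key == HPTE_R_MMIO_UNMAP_KEY;
}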

Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/kvm_book3s_64.h | 80 +++-
 arch/powerpc/include/asm/reg.h   |  1 +
 arch/powerpc/kvm/book3s_64_mmu_hv.c  | 48 ++-
 arch/powerpc/kvm/book3s_hv_rm_mmu.c  | 69 ++-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S  | 52 -
 5 files changed, 194 insertions(+), 56 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h 
b/arch/powerpc/include/asm/kvm_book3s_64.h
index a6bf41865a66..4aa9c3601fe8 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -48,7 +48,18 @@ extern unsigned long kvm_rma_pages;
  * HPTEs.
  */
 #define HPTE_V_HVLOCK  0x40UL
-#define HPTE_V_ABSENT  0x20UL
+/*
+ * VRMA mapping
+ */
+#define HPTE_V_VRMA0x20UL
+
+#define HPTE_R_HOST_UNMAP_KEY  0x3e00UL
+/*
+ * We use this to differentiate between an MMIO key fault and
+ * and a key fault resulting from host swapping out the page.
+ */
+#define HPTE_R_MMIO_UNMAP_KEY  0x3c00UL
+
 
 /*
  * We use this bit in the guest_rpte field of the revmap entry
@@ -405,35 +416,82 @@ static inline void __kvmppc_unmap_host_hpte(struct kvm 
*kvm,
unsigned long *hpte_r,
bool mmio)
 {
-   *hpte_v |= HPTE_V_ABSENT;
-   if (mmio)
-   *hpte_r |= HPTE_R_KEY_HI | HPTE_R_KEY_LO;
+   /*
+* We unmap on host by adding the page to AMR class 31
+* which have both read/write access denied.
+*
+* For VRMA area we mark them invalid.
+*
+* If we are not using mmu_notifiers we don't use Access
+* class protection.
+*
+* Since we are not changing the hpt directly we don't
+* Worry about update ordering.
+*/
+   if ((*hpte_v  HPTE_V_VRMA) || !kvm-arch.using_mmu_notifiers)
+   *hpte_v = ~HPTE_V_VALID;
+   else if (!mmio) {
+   *hpte_r |= HPTE_R_HOST_UNMAP_KEY;
+   *hpte_v |= HPTE_V_VALID;
+   } else {
+   *hpte_r |= HPTE_R_MMIO_UNMAP_KEY;
+   *hpte_v |= HPTE_V_VALID;
+   }
 }
 
 static inline void kvmppc_unmap_host_hpte(struct kvm *kvm, __be64 *hptep)
 {
+   unsigned long pte_v, pte_r;
+
+   pte_v = be64_to_cpu(hptep[0]);
+   pte_r = be64_to_cpu(hptep[1]);
/*
 * We will never call this for MMIO
 */
-   hptep[0] |= cpu_to_be64(HPTE_V_ABSENT);
+   __kvmppc_unmap_host_hpte(kvm, pte_v, pte_r, 0);
+   hptep[1] = cpu_to_be64(pte_r);
+   eieio();
+   hptep[0] = cpu_to_be64(pte_v);
+   asm volatile(ptesync : : : memory);
+   /*
+* we have now successfully marked the hpte using key bits
+*/
atomic_dec(kvm-arch.hpte_update_in_progress);
 }
 
 static inline void kvmppc_map_host_hpte(struct kvm *kvm, unsigned long *hpte_v,
unsigned long *hpte_r)
 {
-   *hpte_v |= HPTE_V_VALID;
-   *hpte_v = ~HPTE_V_ABSENT;
+   /*
+* We will never try to map an MMIO region
+*/
+   if ((*hpte_v  HPTE_V_VRMA) || !kvm-arch.using_mmu_notifiers)
+   *hpte_v |= HPTE_V_VALID;
+   else {
+   /*
+* When we allow guest keys we should set this with key
+* for this page.
+*/
+   *hpte_r = ~(HPTE_R_KEY_HI | HPTE_R_KEY_LO);
+   }
 }
 
 static inline bool kvmppc_is_host_mapped_hpte(struct kvm *kvm, __be64 *hpte)
 {
-   unsigned long v;
+   unsigned long v, r;
 
v = be64_to_cpu(hpte[0]);
-   if (v  HPTE_V_VALID)
-   return true;
-   return false;
+   if ((v  HPTE_V_VRMA) || !kvm-arch.using_mmu_notifiers)
+   return v  HPTE_V_VALID;
+
+   r = be64_to_cpu(hpte[1]);
+   if (!(v  HPTE_V_VALID))
+   return false;
+   if ((r  (HPTE_R_KEY_HI | HPTE_R_KEY_LO)) == HPTE_R_HOST_UNMAP_KEY)
+   return false;
+   if ((r  (HPTE_R_KEY_HI | HPTE_R_KEY_LO

[PATCH] KVM: PPC: BOOK3S: HV: Update compute_tlbie_rb to handle 16MB base page

2014-06-29 Thread Aneesh Kumar K.V
When calculating the lower bits of AVA field, use the shift
count based on the base page size. Also add the missing segment
size and remove stale comment.

Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/kvm_book3s_64.h | 6 --
 arch/powerpc/kvm/book3s_hv.c | 6 --
 2 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h 
b/arch/powerpc/include/asm/kvm_book3s_64.h
index 66a0a44b62a8..ca7c1688a7b6 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -158,6 +158,8 @@ static inline unsigned long compute_tlbie_rb(unsigned long 
v, unsigned long r,
 */
/* This covers 14..54 bits of va*/
rb = (v  ~0x7fUL)  16;   /* AVA field */
+
+   rb |= v  (62 - 8);/*  B field */
/*
 * AVA in v had cleared lower 23 bits. We need to derive
 * that from pteg index
@@ -188,10 +190,10 @@ static inline unsigned long compute_tlbie_rb(unsigned 
long v, unsigned long r,
{
int aval_shift;
/*
-* remaining 7bits of AVA/LP fields
+* remaining bits of AVA/LP fields
 * Also contain the rr bits of LP
 */
-   rb |= (va_low  0x7f)  16;
+   rb |= (va_low  mmu_psize_defs[b_psize].shift)  0x7ff000;
/*
 * Now clear not needed LP bits based on actual psize
 */
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index cbf46eb3f59c..328416f28a55 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1917,12 +1917,6 @@ static void kvmppc_add_seg_page_size(struct 
kvm_ppc_one_seg_page_size **sps,
(*sps)-page_shift = def-shift;
(*sps)-slb_enc = def-sllp;
(*sps)-enc[0].page_shift = def-shift;
-   /*
-* Only return base page encoding. We don't want to return
-* all the supporting pte_enc, because our H_ENTER doesn't
-* support MPSS yet. Once they do, we can start passing all
-* support pte_enc here
-*/
(*sps)-enc[0].pte_enc = def-penc[linux_psize];
/*
 * Add 16MB MPSS support if host supports it
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/6] KVM: PPC: BOOK3S: HV: Remove dead code

2014-06-29 Thread Aneesh Kumar K.V
Since we don't support the virtual page class key protection mechanism in
the guest, we should not see a key fault that needs to be forwarded to
the guest. So remove the dead code.

Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
---
 arch/powerpc/kvm/book3s_64_mmu_hv.c | 9 -
 arch/powerpc/kvm/book3s_hv_rm_mmu.c | 9 -
 2 files changed, 18 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 1c137f45dd55..590e07b1a43f 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -499,15 +499,6 @@ static int kvmppc_mmu_book3s_64_hv_xlate(struct kvm_vcpu 
*vcpu, gva_t eaddr,
gpte-may_write = hpte_write_permission(pp, key);
gpte-may_execute = gpte-may_read  !(gr  (HPTE_R_N | HPTE_R_G));
 
-   /* Storage key permission check for POWER7 */
-   if (data  virtmode  cpu_has_feature(CPU_FTR_ARCH_206)) {
-   int amrfield = hpte_get_skey_perm(gr, vcpu-arch.amr);
-   if (amrfield  1)
-   gpte-may_read = 0;
-   if (amrfield  2)
-   gpte-may_write = 0;
-   }
-
/* Get the guest physical address */
gpte-raddr = kvmppc_mmu_get_real_addr(v, gr, eaddr);
return 0;
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c 
b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index f908845f7379..1884bff3122c 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -925,15 +925,6 @@ long kvmppc_hpte_hv_fault(struct kvm_vcpu *vcpu, unsigned 
long addr,
return status | DSISR_PROTFAULT;
}
 
-   /* Check storage key, if applicable */
-   if (data  (vcpu-arch.shregs.msr  MSR_DR)) {
-   unsigned int perm = hpte_get_skey_perm(gr, vcpu-arch.amr);
-   if (status  DSISR_ISSTORE)
-   perm = 1;
-   if (perm  1)
-   return status | DSISR_KEYFAULT;
-   }
-
/* Save HPTE info for virtual-mode handler */
vcpu-arch.pgfault_addr = addr;
vcpu-arch.pgfault_index = index;
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/6] Use virtual page class key protection mechanism for speeding up guest page fault

2014-06-29 Thread Aneesh Kumar K.V
Benjamin Herrenschmidt b...@kernel.crashing.org writes:

 On Sun, 2014-06-29 at 16:47 +0530, Aneesh Kumar K.V wrote:

 To achieve the above we use virtual page calss protection mechanism for
 covering (2) and (3). For both the above case we mark the hpte
 valid, but associate the page with virtual page class index 30 and 31.
 The authority mask register is configured such that class index 30 and 31
 will have read/write denied. The above change results in a key fault
 for (2) and (3). This allows us to forward a NO_HPTE fault directly to guest
 without doing the expensive hash pagetable lookup.

 So we have a measurable performance benefit (about half a second out of
 8).

I was able to measure a performance benefit of 2 seconds earlier. But
once I got the patch below applied, that got reduced. I am trying
to find out how that patch is helping the performance. Maybe it is
avoiding some unnecessary invalidations?

http://mid.gmane.org/1403876103-32459-1-git-send-email-aneesh.ku...@linux.vnet.ibm.com

I also believe the benefit depends on how much impact a hash table
lookup has. I did try to access the addresses linearly so that I could
make sure we do take a cache miss for the hash page table access.

but you didn't explain the drawback here which is to essentially make
 it impossible for guests to exploit virtual page class keys, or did you
 find a way to still make that possible ?

I am now making PROTFAULT go to the host. That means KSM sharing is
represented as a read-only page, and an attempt to write to it will get to
the host via PROTFAULT. With that in place we can implement keys for the
guest if we want to. So irrespective of what restrictions the guest has put
in place, if the host swaps out the page, we will deny read/write. And if
the key fault needs to go to the guest, we will find that out by looking at
the key index.


 As it-is, it's not a huge issue for Linux but we might have to care with
 other OSes that do care...

 Do we have a way in PAPR to signify to the guest that the keys are not
 available ?

Right now QEMU doesn't provide the device tree node
ibm,processor-storage-keys. That means the guest cannot use keys, so we are
good there. If we want to support guest keys, we need to fill that in with
the values that indicate how many keys can be used for data and for
instructions.
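
For illustration only (a hypothetical guest-side check, not something in
these patches), the guest could gate its key usage on that property:

#include <linux/of.h>

static bool guest_has_storage_keys(struct device_node *cpu)
{
	/* property carries the number of keys usable for data/instruction */
	return of_find_property(cpu, "ibm,processor-storage-keys", NULL) != NULL;
}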

-aneesh

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/6] KVM: PPC: BOOK3S: HV: Clear hash pte bits from do_h_enter callers

2014-06-29 Thread Aneesh Kumar K.V
We will use this to set the HPTE_V_VRMA bit in a later patch. This also
makes sure we clear the hpte bits only when called via an hcall.

Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
---
 arch/powerpc/kvm/book3s_64_mmu_hv.c | 15 +--
 arch/powerpc/kvm/book3s_hv_rm_mmu.c |  8 ++--
 2 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 09a47aeb5b63..1c137f45dd55 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -371,8 +371,6 @@ long kvmppc_virtmode_do_h_enter(struct kvm *kvm, unsigned 
long flags,
if (!psize)
return H_PARAMETER;
 
-   pteh = ~(HPTE_V_HVLOCK | HPTE_V_ABSENT | HPTE_V_VALID);
-
/* Find the memslot (if any) for this address */
gpa = (ptel  HPTE_R_RPN)  ~(psize - 1);
gfn = gpa  PAGE_SHIFT;
@@ -408,6 +406,12 @@ long kvmppc_virtmode_h_enter(struct kvm_vcpu *vcpu, 
unsigned long flags,
 long pte_index, unsigned long pteh,
 unsigned long ptel)
 {
+   /*
+* Clear few bits, when called via hcall
+*/
+   pteh = ~(HPTE_V_HVLOCK | HPTE_V_ABSENT | HPTE_V_VALID);
+   ptel = ~(HPTE_R_KEY_HI | HPTE_R_KEY_LO | HPTE_GR_RESERVED);
+
return kvmppc_virtmode_do_h_enter(vcpu-kvm, flags, pte_index,
  pteh, ptel, vcpu-arch.gpr[4]);
 }
@@ -1560,6 +1564,13 @@ static ssize_t kvm_htab_write(struct file *file, const 
char __user *buf,
if (be64_to_cpu(hptp[0])  (HPTE_V_VALID | 
HPTE_V_ABSENT))
kvmppc_do_h_remove(kvm, 0, i, 0, tmp);
err = -EIO;
+   /*
+* Clear few bits we got via read_htab which we
+* don't need to carry forward.
+*/
+   v = ~(HPTE_V_HVLOCK | HPTE_V_ABSENT | HPTE_V_VALID);
+   r = ~(HPTE_R_KEY_HI | HPTE_R_KEY_LO | 
HPTE_GR_RESERVED);
+
ret = kvmppc_virtmode_do_h_enter(kvm, H_EXACT, i, v, r,
 tmp);
if (ret != H_SUCCESS) {
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c 
b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 084ad54c73cd..157a5f35edfa 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -182,8 +182,6 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
if (!psize)
return H_PARAMETER;
writing = hpte_is_writable(ptel);
-   pteh = ~(HPTE_V_HVLOCK | HPTE_V_ABSENT | HPTE_V_VALID);
-   ptel = ~HPTE_GR_RESERVED;
g_ptel = ptel;
 
/* used later to detect if we might have been invalidated */
@@ -367,6 +365,12 @@ EXPORT_SYMBOL_GPL(kvmppc_do_h_enter);
 long kvmppc_h_enter(struct kvm_vcpu *vcpu, unsigned long flags,
long pte_index, unsigned long pteh, unsigned long ptel)
 {
+   /*
+* Clear few bits. when called via hcall.
+*/
+   pteh = ~(HPTE_V_HVLOCK | HPTE_V_ABSENT | HPTE_V_VALID);
+   ptel = ~(HPTE_R_KEY_HI | HPTE_R_KEY_LO | HPTE_GR_RESERVED);
+
return kvmppc_do_h_enter(vcpu-kvm, flags, pte_index, pteh, ptel,
 vcpu-arch.pgdir, true, vcpu-arch.gpr[4]);
 }
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/6] Use virtual page class key protection mechanism for speeding up guest page fault

2014-06-29 Thread Aneesh Kumar K.V
Hi,

With the current code we do an expensive hash page table lookup on every
page fault resulting from a missing hash page table entry. A NO_HPTE
page fault can happen due to the below reasons:

1) Missing hash pte as per guest. This should be forwarded to the guest
2) MMIO hash pte. The address against which the load/store is performed
   should be emulated as a MMIO operation.
3) Missing hash pte because host swapped out the guest page.

We want to differentiate (1) from (2) and (3) so that we can speed up
page fault due to (1). Optimizing (1) will help in improving
the overall performance because that covers a large percentage of
the page faults.

To achieve the above we use the virtual page class protection mechanism
for covering (2) and (3). For both of the above cases we mark the hpte
valid, but associate the page with virtual page class index 30 or 31.
The authority mask register is configured such that class indexes 30 and 31
have read/write denied. The above change results in a key fault
for (2) and (3). This allows us to forward a NO_HPTE fault directly to the
guest without doing the expensive hash page table lookup.

For the test below:

#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

#define PAGES (40*1024)

int main()
{
	unsigned long size = getpagesize();
	unsigned long length = size * PAGES;
	unsigned long i, j, k = 0;

	for (j = 0; j < 10; j++) {
		char *c = mmap(NULL, length, PROT_READ|PROT_WRITE,
			       MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
		if (c == MAP_FAILED) {
			perror("mmap");
			exit(1);
		}
		for (i = 0; i < length; i += size)
			c[i] = 0;

		/* flush hptes */
		mprotect(c, length, PROT_WRITE);

		for (i = 0; i < length; i += size)
			c[i] = 10;

		mprotect(c, length, PROT_READ);

		for (i = 0; i < length; i += size)
			k += c[i];

		munmap(c, length);
	}
}

Without Fix:
--
[root@qemu-pr-host ~]# time ./pfault

real0m8.438s
user0m0.855s
sys 0m7.540s
[root@qemu-pr-host ~]#


With Fix:

[root@qemu-pr-host ~]# time ./pfault

real0m7.833s
user0m0.782s
sys 0m7.038s
[root@qemu-pr-host ~]#



Aneesh Kumar K.V (6):
  KVM: PPC: BOOK3S: HV: Clear hash pte bits from do_h_enter callers
  KVM: PPC: BOOK3S: HV: Deny virtual page class key update via h_protect
  KVM: PPC: BOOK3S: HV: Remove dead code
  KVM: PPC: BOOK3S: HV: Use new functions for mapping/unmapping hpte in
host
  KVM: PPC: BOOK3S: Use hpte_update_in_progress to track invalid hpte
during an hpte update
  KVM: PPC: BOOK3S: HV: Use virtual page class protection mechanism for
host fault and mmio

 arch/powerpc/include/asm/kvm_book3s_64.h |  97 +-
 arch/powerpc/include/asm/kvm_host.h  |   1 +
 arch/powerpc/include/asm/reg.h   |   1 +
 arch/powerpc/kernel/asm-offsets.c|   1 +
 arch/powerpc/kvm/book3s_64_mmu_hv.c  |  99 --
 arch/powerpc/kvm/book3s_hv.c |   1 +
 arch/powerpc/kvm/book3s_hv_rm_mmu.c  | 166 +--
 arch/powerpc/kvm/book3s_hv_rmhandlers.S  | 100 +--
 8 files changed, 371 insertions(+), 95 deletions(-)

-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/6] KVM: PPC: BOOK3S: HV: Use new functions for mapping/unmapping hpte in host

2014-06-29 Thread Aneesh Kumar K.V
We want to use the virtual page class key protection mechanism for
indicating an MMIO-mapped hpte entry or a guest hpte entry that is swapped
out in the host. Those hptes will be marked valid, but have their virtual
page class key set to 30 or 31. These virtual page class numbers are
configured in the AMR to deny read/write. To accommodate such a change, add
new functions that map, unmap and check whether an hpte is mapped in the
host. This patch still uses HPTE_V_VALID and HPTE_V_ABSENT and doesn't use
virtual page class keys. But we want to differentiate the places in the code
that explicitly check for HPTE_V_VALID from the places that check whether
the hpte is host mapped. This patch enables a closer review of such a
change.

Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/kvm_book3s_64.h | 36 
 arch/powerpc/kvm/book3s_64_mmu_hv.c  | 24 +++--
 arch/powerpc/kvm/book3s_hv_rm_mmu.c  | 30 ++
 3 files changed, 66 insertions(+), 24 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h 
b/arch/powerpc/include/asm/kvm_book3s_64.h
index 0aa817933e6a..da00b1f05ea1 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -400,6 +400,42 @@ static inline int is_vrma_hpte(unsigned long hpte_v)
(HPTE_V_1TB_SEG | (VRMA_VSID  (40 - 16)));
 }
 
+static inline void __kvmppc_unmap_host_hpte(struct kvm *kvm,
+   unsigned long *hpte_v,
+   unsigned long *hpte_r,
+   bool mmio)
+{
+   *hpte_v |= HPTE_V_ABSENT;
+   if (mmio)
+   *hpte_r |= HPTE_R_KEY_HI | HPTE_R_KEY_LO;
+}
+
+static inline void kvmppc_unmap_host_hpte(struct kvm *kvm, __be64 *hptep)
+{
+   /*
+* We will never call this for MMIO
+*/
+   hptep[0] |= cpu_to_be64(HPTE_V_ABSENT);
+}
+
+static inline void kvmppc_map_host_hpte(struct kvm *kvm, unsigned long *hpte_v,
+   unsigned long *hpte_r)
+{
+   *hpte_v |= HPTE_V_VALID;
+   *hpte_v = ~HPTE_V_ABSENT;
+}
+
+static inline bool kvmppc_is_host_mapped_hpte(struct kvm *kvm, __be64 *hpte)
+{
+   unsigned long v;
+
+   v = be64_to_cpu(hpte[0]);
+   if (v  HPTE_V_VALID)
+   return true;
+   return false;
+}
+
+
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
 /*
  * Note modification of an HPTE; set the HPTE modified bit
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 590e07b1a43f..8ce5e95613f8 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -752,7 +752,8 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct 
kvm_vcpu *vcpu,
if (be64_to_cpu(hptep[0])  HPTE_V_VALID) {
/* HPTE was previously valid, so we need to invalidate it */
unlock_rmap(rmap);
-   hptep[0] |= cpu_to_be64(HPTE_V_ABSENT);
+   /* Always mark HPTE_V_ABSENT before invalidating */
+   kvmppc_unmap_host_hpte(kvm, hptep);
kvmppc_invalidate_hpte(kvm, hptep, index);
/* don't lose previous R and C bits */
r |= be64_to_cpu(hptep[1])  (HPTE_R_R | HPTE_R_C);
@@ -897,11 +898,12 @@ static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long 
*rmapp,
/* Now check and modify the HPTE */
ptel = rev[i].guest_rpte;
psize = hpte_page_size(be64_to_cpu(hptep[0]), ptel);
-   if ((be64_to_cpu(hptep[0])  HPTE_V_VALID) 
+   if (kvmppc_is_host_mapped_hpte(kvm, hptep) 
hpte_rpn(ptel, psize) == gfn) {
if (kvm-arch.using_mmu_notifiers)
-   hptep[0] |= cpu_to_be64(HPTE_V_ABSENT);
+   kvmppc_unmap_host_hpte(kvm, hptep);
kvmppc_invalidate_hpte(kvm, hptep, i);
+
/* Harvest R and C */
rcbits = be64_to_cpu(hptep[1])  (HPTE_R_R | HPTE_R_C);
*rmapp |= rcbits  KVMPPC_RMAP_RC_SHIFT;
@@ -990,7 +992,7 @@ static int kvm_age_rmapp(struct kvm *kvm, unsigned long 
*rmapp,
}
 
/* Now check and modify the HPTE */
-   if ((be64_to_cpu(hptep[0])  HPTE_V_VALID) 
+   if (kvmppc_is_host_mapped_hpte(kvm, hptep) 
(be64_to_cpu(hptep[1])  HPTE_R_R)) {
kvmppc_clear_ref_hpte(kvm, hptep, i);
if (!(rev[i].guest_rpte  HPTE_R_R)) {
@@ -1121,11 +1123,12 @@ static int kvm_test_clear_dirty_npages(struct kvm *kvm, 
unsigned long *rmapp)
}
 
/* Now check and modify the HPTE */
-   if (!(hptep[0]  cpu_to_be64(HPTE_V_VALID)))
+   if (!kvmppc_is_host_mapped_hpte

[PATCH 6/6] KVM: PPC: BOOK3S: HV: Use virtual page class protection mechanism for host fault and mmio

2014-06-29 Thread Aneesh Kumar K.V
With this patch we use AMR classes 30 and 31 for indicating a page
fault that should be handled by the host. This includes MMIO accesses and
page faults resulting from guest RAM being swapped out in the host. This
enables us to forward the fault to the guest without doing the expensive
hash page table search for finding the hpte entry. With this patch, we
always mark the hash pte valid and use class indexes 30 and 31 for key-based
faults. These virtual class indexes are configured in the AMR to deny
read/write. Since the access class protection mechanism doesn't work with
the VRMA region, we need to handle those HPTEs separately. We mark them
invalid and use a software-defined bit, HPTE_V_VRMA, to differentiate
them.

NOTE: We still need to handle protection faults in the host so that a
write to a KSM-shared page is handled in the host.

Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/kvm_book3s_64.h | 80 +++-
 arch/powerpc/include/asm/reg.h   |  1 +
 arch/powerpc/kvm/book3s_64_mmu_hv.c  | 48 ++-
 arch/powerpc/kvm/book3s_hv_rm_mmu.c  | 69 ++-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S  | 52 -
 5 files changed, 194 insertions(+), 56 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h 
b/arch/powerpc/include/asm/kvm_book3s_64.h
index a6bf41865a66..4aa9c3601fe8 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -48,7 +48,18 @@ extern unsigned long kvm_rma_pages;
  * HPTEs.
  */
 #define HPTE_V_HVLOCK  0x40UL
-#define HPTE_V_ABSENT  0x20UL
+/*
+ * VRMA mapping
+ */
+#define HPTE_V_VRMA0x20UL
+
+#define HPTE_R_HOST_UNMAP_KEY  0x3e00UL
+/*
+ * We use this to differentiate between an MMIO key fault and
+ * and a key fault resulting from host swapping out the page.
+ */
+#define HPTE_R_MMIO_UNMAP_KEY  0x3c00UL
+
 
 /*
  * We use this bit in the guest_rpte field of the revmap entry
@@ -405,35 +416,82 @@ static inline void __kvmppc_unmap_host_hpte(struct kvm 
*kvm,
unsigned long *hpte_r,
bool mmio)
 {
-   *hpte_v |= HPTE_V_ABSENT;
-   if (mmio)
-   *hpte_r |= HPTE_R_KEY_HI | HPTE_R_KEY_LO;
+   /*
+* We unmap on host by adding the page to AMR class 31
+* which have both read/write access denied.
+*
+* For VRMA area we mark them invalid.
+*
+* If we are not using mmu_notifiers we don't use Access
+* class protection.
+*
+* Since we are not changing the hpt directly we don't
+* Worry about update ordering.
+*/
+   if ((*hpte_v  HPTE_V_VRMA) || !kvm-arch.using_mmu_notifiers)
+   *hpte_v = ~HPTE_V_VALID;
+   else if (!mmio) {
+   *hpte_r |= HPTE_R_HOST_UNMAP_KEY;
+   *hpte_v |= HPTE_V_VALID;
+   } else {
+   *hpte_r |= HPTE_R_MMIO_UNMAP_KEY;
+   *hpte_v |= HPTE_V_VALID;
+   }
 }
 
 static inline void kvmppc_unmap_host_hpte(struct kvm *kvm, __be64 *hptep)
 {
+   unsigned long pte_v, pte_r;
+
+   pte_v = be64_to_cpu(hptep[0]);
+   pte_r = be64_to_cpu(hptep[1]);
/*
 * We will never call this for MMIO
 */
-   hptep[0] |= cpu_to_be64(HPTE_V_ABSENT);
+   __kvmppc_unmap_host_hpte(kvm, pte_v, pte_r, 0);
+   hptep[1] = cpu_to_be64(pte_r);
+   eieio();
+   hptep[0] = cpu_to_be64(pte_v);
+   asm volatile(ptesync : : : memory);
+   /*
+* we have now successfully marked the hpte using key bits
+*/
atomic_dec(kvm-arch.hpte_update_in_progress);
 }
 
 static inline void kvmppc_map_host_hpte(struct kvm *kvm, unsigned long *hpte_v,
unsigned long *hpte_r)
 {
-   *hpte_v |= HPTE_V_VALID;
-   *hpte_v = ~HPTE_V_ABSENT;
+   /*
+* We will never try to map an MMIO region
+*/
+   if ((*hpte_v  HPTE_V_VRMA) || !kvm-arch.using_mmu_notifiers)
+   *hpte_v |= HPTE_V_VALID;
+   else {
+   /*
+* When we allow guest keys we should set this with key
+* for this page.
+*/
+   *hpte_r = ~(HPTE_R_KEY_HI | HPTE_R_KEY_LO);
+   }
 }
 
 static inline bool kvmppc_is_host_mapped_hpte(struct kvm *kvm, __be64 *hpte)
 {
-   unsigned long v;
+   unsigned long v, r;
 
v = be64_to_cpu(hpte[0]);
-   if (v  HPTE_V_VALID)
-   return true;
-   return false;
+   if ((v  HPTE_V_VRMA) || !kvm-arch.using_mmu_notifiers)
+   return v  HPTE_V_VALID;
+
+   r = be64_to_cpu(hpte[1]);
+   if (!(v  HPTE_V_VALID))
+   return false;
+   if ((r  (HPTE_R_KEY_HI | HPTE_R_KEY_LO)) == HPTE_R_HOST_UNMAP_KEY)
+   return false;
+   if ((r  (HPTE_R_KEY_HI | HPTE_R_KEY_LO

[PATCH 3/6] KVM: PPC: BOOK3S: HV: Remove dead code

2014-06-29 Thread Aneesh Kumar K.V
Since we don't support the virtual page class key protection mechanism in
the guest, we should not see a key fault that needs to be forwarded to
the guest. So remove the dead code.

Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
---
 arch/powerpc/kvm/book3s_64_mmu_hv.c | 9 -
 arch/powerpc/kvm/book3s_hv_rm_mmu.c | 9 -
 2 files changed, 18 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 1c137f45dd55..590e07b1a43f 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -499,15 +499,6 @@ static int kvmppc_mmu_book3s_64_hv_xlate(struct kvm_vcpu 
*vcpu, gva_t eaddr,
gpte-may_write = hpte_write_permission(pp, key);
gpte-may_execute = gpte-may_read  !(gr  (HPTE_R_N | HPTE_R_G));
 
-   /* Storage key permission check for POWER7 */
-   if (data  virtmode  cpu_has_feature(CPU_FTR_ARCH_206)) {
-   int amrfield = hpte_get_skey_perm(gr, vcpu-arch.amr);
-   if (amrfield  1)
-   gpte-may_read = 0;
-   if (amrfield  2)
-   gpte-may_write = 0;
-   }
-
/* Get the guest physical address */
gpte-raddr = kvmppc_mmu_get_real_addr(v, gr, eaddr);
return 0;
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c 
b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index f908845f7379..1884bff3122c 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -925,15 +925,6 @@ long kvmppc_hpte_hv_fault(struct kvm_vcpu *vcpu, unsigned 
long addr,
return status | DSISR_PROTFAULT;
}
 
-   /* Check storage key, if applicable */
-   if (data  (vcpu-arch.shregs.msr  MSR_DR)) {
-   unsigned int perm = hpte_get_skey_perm(gr, vcpu-arch.amr);
-   if (status  DSISR_ISSTORE)
-   perm = 1;
-   if (perm  1)
-   return status | DSISR_KEYFAULT;
-   }
-
/* Save HPTE info for virtual-mode handler */
vcpu-arch.pgfault_addr = addr;
vcpu-arch.pgfault_index = index;
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/6] KVM: PPC: BOOK3S: HV: Deny virtual page class key update via h_protect

2014-06-29 Thread Aneesh Kumar K.V
This makes it consistent with h_enter, where we clear the key
bits. We also want to use the virtual page class key protection mechanism
for indicating a host page fault. For that we will be using key class
indexes 30 and 31. So prevent the guest from updating the key bits until
we add proper support for the virtual page class protection mechanism for
the guest. This will not have any impact on PAPR Linux guests, because the
Linux guest currently doesn't use the virtual page class key protection
model.
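
A rough sketch of the resulting mask/bits computation (my reconstruction
with the shift/and operators spelled out; compare with the hunk below):

static inline void h_protect_guest_bits(unsigned long flags,
					unsigned long *mask,
					unsigned long *bits)
{
	/* key bits are deliberately excluded from the updatable mask */
	*mask = HPTE_R_PP0 | HPTE_R_PP | HPTE_R_N;
	*bits = ((flags << 55) & HPTE_R_PP0) |
		(flags & (HPTE_R_PP | HPTE_R_N));
}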

Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
---
 arch/powerpc/kvm/book3s_hv_rm_mmu.c | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c 
b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 157a5f35edfa..f908845f7379 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -658,13 +658,17 @@ long kvmppc_h_protect(struct kvm_vcpu *vcpu, unsigned 
long flags,
}
 
v = pte;
+   /*
+* We ignore key bits here. We use class 31 and 30 for
+* hypervisor purpose. We still don't track the page
+* class seperately. Until then don't allow h_protect
+* to change key bits.
+*/
bits = (flags  55)  HPTE_R_PP0;
-   bits |= (flags  48)  HPTE_R_KEY_HI;
-   bits |= flags  (HPTE_R_PP | HPTE_R_N | HPTE_R_KEY_LO);
+   bits |= flags  (HPTE_R_PP | HPTE_R_N);
 
/* Update guest view of 2nd HPTE dword */
-   mask = HPTE_R_PP0 | HPTE_R_PP | HPTE_R_N |
-   HPTE_R_KEY_HI | HPTE_R_KEY_LO;
+   mask = HPTE_R_PP0 | HPTE_R_PP | HPTE_R_N;
rev = real_vmalloc_addr(kvm-arch.revmap[pte_index]);
if (rev) {
r = (rev-guest_rpte  ~mask) | bits;
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] KVM: PPC: BOOK3S: HV: Update compute_tlbie_rb to handle 16MB base page

2014-06-29 Thread Aneesh Kumar K.V
When calculating the lower bits of AVA field, use the shift
count based on the base page size. Also add the missing segment
size and remove stale comment.

Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/kvm_book3s_64.h | 6 --
 arch/powerpc/kvm/book3s_hv.c | 6 --
 2 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h 
b/arch/powerpc/include/asm/kvm_book3s_64.h
index 66a0a44b62a8..ca7c1688a7b6 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -158,6 +158,8 @@ static inline unsigned long compute_tlbie_rb(unsigned long 
v, unsigned long r,
 */
/* This covers 14..54 bits of va*/
rb = (v  ~0x7fUL)  16;   /* AVA field */
+
+   rb |= v  (62 - 8);/*  B field */
/*
 * AVA in v had cleared lower 23 bits. We need to derive
 * that from pteg index
@@ -188,10 +190,10 @@ static inline unsigned long compute_tlbie_rb(unsigned 
long v, unsigned long r,
{
int aval_shift;
/*
-* remaining 7bits of AVA/LP fields
+* remaining bits of AVA/LP fields
 * Also contain the rr bits of LP
 */
-   rb |= (va_low  0x7f)  16;
+   rb |= (va_low  mmu_psize_defs[b_psize].shift)  0x7ff000;
/*
 * Now clear not needed LP bits based on actual psize
 */
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index cbf46eb3f59c..328416f28a55 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1917,12 +1917,6 @@ static void kvmppc_add_seg_page_size(struct 
kvm_ppc_one_seg_page_size **sps,
(*sps)-page_shift = def-shift;
(*sps)-slb_enc = def-sllp;
(*sps)-enc[0].page_shift = def-shift;
-   /*
-* Only return base page encoding. We don't want to return
-* all the supporting pte_enc, because our H_ENTER doesn't
-* support MPSS yet. Once they do, we can start passing all
-* support pte_enc here
-*/
(*sps)-enc[0].pte_enc = def-penc[linux_psize];
/*
 * Add 16MB MPSS support if host supports it
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 5/6] KVM: PPC: BOOK3S: Use hpte_update_in_progress to track invalid hpte during an hpte update

2014-06-29 Thread Aneesh Kumar K.V
As per the ISA, we first need to mark the hpte invalid (V=0) before we
update the hpte lower half bits. With the virtual page class key protection
mechanism we want to send any fault other than a key fault directly to the
guest without searching the hash page table. But then we can get a NO_HPTE
fault while we are updating the hpte. To track that, add a VM-specific
atomic variable that we check in the fault path so that such faults are
always sent to the host.

Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/kvm_book3s_64.h |  1 +
 arch/powerpc/include/asm/kvm_host.h  |  1 +
 arch/powerpc/kernel/asm-offsets.c|  1 +
 arch/powerpc/kvm/book3s_64_mmu_hv.c  | 19 ++
 arch/powerpc/kvm/book3s_hv.c |  1 +
 arch/powerpc/kvm/book3s_hv_rm_mmu.c  | 40 +++--
 arch/powerpc/kvm/book3s_hv_rmhandlers.S  | 60 +---
 7 files changed, 109 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h 
b/arch/powerpc/include/asm/kvm_book3s_64.h
index da00b1f05ea1..a6bf41865a66 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -416,6 +416,7 @@ static inline void kvmppc_unmap_host_hpte(struct kvm *kvm, 
__be64 *hptep)
 * We will never call this for MMIO
 */
hptep[0] |= cpu_to_be64(HPTE_V_ABSENT);
+   atomic_dec(kvm-arch.hpte_update_in_progress);
 }
 
 static inline void kvmppc_map_host_hpte(struct kvm *kvm, unsigned long *hpte_v,
diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index f9ae69682ce1..0a9ff60fae4c 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -254,6 +254,7 @@ struct kvm_arch {
atomic_t hpte_mod_interest;
spinlock_t slot_phys_lock;
cpumask_t need_tlb_flush;
+   atomic_t hpte_update_in_progress;
struct kvmppc_vcore *vcores[KVM_MAX_VCORES];
int hpt_cma_alloc;
 #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index f5995a912213..54a36110f8f2 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -496,6 +496,7 @@ int main(void)
DEFINE(KVM_LPCR, offsetof(struct kvm, arch.lpcr));
DEFINE(KVM_RMOR, offsetof(struct kvm, arch.rmor));
DEFINE(KVM_VRMA_SLB_V, offsetof(struct kvm, arch.vrma_slb_v));
+   DEFINE(KVM_HPTE_UPDATE, offsetof(struct kvm, 
arch.hpte_update_in_progress));
DEFINE(VCPU_DSISR, offsetof(struct kvm_vcpu, arch.shregs.dsisr));
DEFINE(VCPU_DAR, offsetof(struct kvm_vcpu, arch.shregs.dar));
DEFINE(VCPU_VPA, offsetof(struct kvm_vcpu, arch.vpa.pinned_addr));
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 8ce5e95613f8..cb7a616aacb1 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -592,6 +592,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct 
kvm_vcpu *vcpu,
unsigned int writing, write_ok;
struct vm_area_struct *vma;
unsigned long rcbits;
+   bool hpte_invalidated = false;
 
/*
 * Real-mode code has already searched the HPT and found the
@@ -750,13 +751,15 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, 
struct kvm_vcpu *vcpu,
r = rcbits | ~(HPTE_R_R | HPTE_R_C);
 
if (be64_to_cpu(hptep[0])  HPTE_V_VALID) {
-   /* HPTE was previously valid, so we need to invalidate it */
+   /*
+* If we had mapped this hpte before, we now need to
+* invalidate that.
+*/
unlock_rmap(rmap);
-   /* Always mark HPTE_V_ABSENT before invalidating */
-   kvmppc_unmap_host_hpte(kvm, hptep);
kvmppc_invalidate_hpte(kvm, hptep, index);
/* don't lose previous R and C bits */
r |= be64_to_cpu(hptep[1])  (HPTE_R_R | HPTE_R_C);
+   hpte_invalidated = true;
} else {
kvmppc_add_revmap_chain(kvm, rev, rmap, index, 0);
}
@@ -765,6 +768,9 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct 
kvm_vcpu *vcpu,
eieio();
hptep[0] = cpu_to_be64(hpte[0]);
asm volatile(ptesync : : : memory);
+   if (hpte_invalidated)
+   atomic_dec(kvm-arch.hpte_update_in_progress);
+
preempt_enable();
if (page  hpte_is_writable(r))
SetPageDirty(page);
@@ -1128,10 +1134,9 @@ static int kvm_test_clear_dirty_npages(struct kvm *kvm, 
unsigned long *rmapp)
/*
 * need to make it temporarily absent so C is stable
 */
-   kvmppc_unmap_host_hpte(kvm, hptep);
-   kvmppc_invalidate_hpte(kvm, hptep, i);
v = be64_to_cpu(hptep[0]);
r = be64_to_cpu(hptep

[PATCH] KVM: PPC: BOOK3S: HV: Update compute_tlbie_rb to handle 16MB base page

2014-06-27 Thread Aneesh Kumar K.V
When calculating the lower bits of AVA field, use the shift
count based on the base page size. Also add the missing segment
size and remove stale comment.

Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/kvm_book3s_64.h | 6 --
 arch/powerpc/kvm/book3s_hv.c | 6 --
 2 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h 
b/arch/powerpc/include/asm/kvm_book3s_64.h
index 66a0a44b62a8..ca7c1688a7b6 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -158,6 +158,8 @@ static inline unsigned long compute_tlbie_rb(unsigned long 
v, unsigned long r,
 */
/* This covers 14..54 bits of va*/
rb = (v  ~0x7fUL)  16;   /* AVA field */
+
+   rb |= v  (62 - 8);/*  B field */
/*
 * AVA in v had cleared lower 23 bits. We need to derive
 * that from pteg index
@@ -188,10 +190,10 @@ static inline unsigned long compute_tlbie_rb(unsigned 
long v, unsigned long r,
{
int aval_shift;
/*
-* remaining 7bits of AVA/LP fields
+* remaining bits of AVA/LP fields
 * Also contain the rr bits of LP
 */
-   rb |= (va_low  0x7f)  16;
+   rb |= (va_low  mmu_psize_defs[b_psize].shift)  0x7ff000;
/*
 * Now clear not needed LP bits based on actual psize
 */
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index cbf46eb3f59c..328416f28a55 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1917,12 +1917,6 @@ static void kvmppc_add_seg_page_size(struct 
kvm_ppc_one_seg_page_size **sps,
(*sps)-page_shift = def-shift;
(*sps)-slb_enc = def-sllp;
(*sps)-enc[0].page_shift = def-shift;
-   /*
-* Only return base page encoding. We don't want to return
-* all the supporting pte_enc, because our H_ENTER doesn't
-* support MPSS yet. Once they do, we can start passing all
-* support pte_enc here
-*/
(*sps)-enc[0].pte_enc = def-penc[linux_psize];
/*
 * Add 16MB MPSS support if host supports it
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] KVM: PPC: BOOK3S: HV: Update compute_tlbie_rb to handle 16MB base page

2014-06-27 Thread Aneesh Kumar K.V
When calculating the lower bits of AVA field, use the shift
count based on the base page size. Also add the missing segment
size and remove stale comment.

Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/kvm_book3s_64.h | 6 --
 arch/powerpc/kvm/book3s_hv.c | 6 --
 2 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h 
b/arch/powerpc/include/asm/kvm_book3s_64.h
index 66a0a44b62a8..ca7c1688a7b6 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -158,6 +158,8 @@ static inline unsigned long compute_tlbie_rb(unsigned long 
v, unsigned long r,
 */
/* This covers 14..54 bits of va*/
rb = (v  ~0x7fUL)  16;   /* AVA field */
+
+   rb |= v  (62 - 8);/*  B field */
/*
 * AVA in v had cleared lower 23 bits. We need to derive
 * that from pteg index
@@ -188,10 +190,10 @@ static inline unsigned long compute_tlbie_rb(unsigned 
long v, unsigned long r,
{
int aval_shift;
/*
-* remaining 7bits of AVA/LP fields
+* remaining bits of AVA/LP fields
 * Also contain the rr bits of LP
 */
-   rb |= (va_low  0x7f)  16;
+   rb |= (va_low  mmu_psize_defs[b_psize].shift)  0x7ff000;
/*
 * Now clear not needed LP bits based on actual psize
 */
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index cbf46eb3f59c..328416f28a55 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1917,12 +1917,6 @@ static void kvmppc_add_seg_page_size(struct 
kvm_ppc_one_seg_page_size **sps,
(*sps)-page_shift = def-shift;
(*sps)-slb_enc = def-sllp;
(*sps)-enc[0].page_shift = def-shift;
-   /*
-* Only return base page encoding. We don't want to return
-* all the supporting pte_enc, because our H_ENTER doesn't
-* support MPSS yet. Once they do, we can start passing all
-* support pte_enc here
-*/
(*sps)-enc[0].pte_enc = def-penc[linux_psize];
/*
 * Add 16MB MPSS support if host supports it
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 6/7] KVM: PPC: Book3S HV: Fix ABIv2 on LE

2014-06-18 Thread Aneesh Kumar K.V
Alexander Graf ag...@suse.de writes:

 We use ABIv2 on Little Endian systems which gets rid of the dotted function
 names. Branch to the actual functions when we see such a system.

 Signed-off-by: Alexander Graf ag...@suse.de

As per the patches sent by Anton, we don't need this. We can branch to the
function rather than the dot symbol:

http://article.gmane.org/gmane.linux.ports.ppc.embedded/68925
http://article.gmane.org/gmane.linux.ports.ppc.embedded/71005

-aneesh

 ---
  arch/powerpc/kvm/book3s_hv_rmhandlers.S | 22 ++
  1 file changed, 14 insertions(+), 8 deletions(-)

 diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S 
 b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
 index 1a71f60..1ff3ebd 100644
 --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
 +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
 @@ -36,6 +36,12 @@
  #define NAPPING_CEDE 1
  #define NAPPING_NOVCPU   2
  
 +#if defined(_CALL_ELF)  _CALL_ELF == 2
 +#define FUNC(name)   name
 +#else
 +#define FUNC(name)   GLUE(.,name)
 +#endif
 +
  /*
   * Call kvmppc_hv_entry in real mode.
   * Must be called with interrupts hard-disabled.
 @@ -668,9 +674,9 @@ END_FTR_SECTION_IFCLR(CPU_FTR_TM)
  
   mr  r31, r4
   addir3, r31, VCPU_FPRS_TM
 - bl  .load_fp_state
 + bl  FUNC(load_fp_state)
   addir3, r31, VCPU_VRS_TM
 - bl  .load_vr_state
 + bl  FUNC(load_vr_state)
   mr  r4, r31
   lwz r7, VCPU_VRSAVE_TM(r4)
   mtspr   SPRN_VRSAVE, r7
 @@ -1414,9 +1420,9 @@ END_FTR_SECTION_IFCLR(CPU_FTR_TM)
  
   /* Save FP/VSX. */
   addir3, r9, VCPU_FPRS_TM
 - bl  .store_fp_state
 + bl  FUNC(store_fp_state)
   addir3, r9, VCPU_VRS_TM
 - bl  .store_vr_state
 + bl  FUNC(store_vr_state)
   mfspr   r6, SPRN_VRSAVE
   stw r6, VCPU_VRSAVE_TM(r9)
  1:
 @@ -2405,11 +2411,11 @@ END_FTR_SECTION_IFSET(CPU_FTR_VSX)
   mtmsrd  r8
   isync
   addir3,r3,VCPU_FPRS
 - bl  .store_fp_state
 + bl  FUNC(store_fp_state)
  #ifdef CONFIG_ALTIVEC
  BEGIN_FTR_SECTION
   addir3,r31,VCPU_VRS
 - bl  .store_vr_state
 + bl  FUNC(store_vr_state)
  END_FTR_SECTION_IFSET(CPU_FTR_ALTIVEC)
  #endif
   mfspr   r6,SPRN_VRSAVE
 @@ -2441,11 +2447,11 @@ END_FTR_SECTION_IFSET(CPU_FTR_VSX)
   mtmsrd  r8
   isync
   addir3,r4,VCPU_FPRS
 - bl  .load_fp_state
 + bl  FUNC(load_fp_state)
  #ifdef CONFIG_ALTIVEC
  BEGIN_FTR_SECTION
   addir3,r31,VCPU_VRS
 - bl  .load_vr_state
 + bl  FUNC(load_vr_state)
  END_FTR_SECTION_IFSET(CPU_FTR_ALTIVEC)
  #endif
   lwz r7,VCPU_VRSAVE(r31)

-aneesh

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 07/10] PPC, KVM, CMA: use general CMA reserved area management framework

2014-06-16 Thread Aneesh Kumar K.V
Joonsoo Kim iamjoonsoo@lge.com writes:

 On Sat, Jun 14, 2014 at 02:23:59PM +0530, Aneesh Kumar K.V wrote:
 Joonsoo Kim iamjoonsoo@lge.com writes:
 
  Now, we have general CMA reserved area management framework,
  so use it for future maintainabilty. There is no functional change.
 
  Acked-by: Michal Nazarewicz min...@mina86.com
  Acked-by: Paolo Bonzini pbonz...@redhat.com
  Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com
 
 Need this. We may want to keep the VM_BUG_ON by moving
 KVM_CMA_CHUNK_ORDER around.
 
 diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
 b/arch/powerpc/kvm/book3s_64_mmu_hv.c
 index 8056107..1932e0e 100644
 --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
 +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
 @@ -37,8 +37,6 @@
  #include asm/ppc-opcode.h
  #include asm/cputable.h
  
 -#include book3s_hv_cma.h
 -
  /* POWER7 has 10-bit LPIDs, PPC970 has 6-bit LPIDs */
  #define MAX_LPID_970   63
  
 @@ -64,7 +62,6 @@ long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp)
 }
  
 kvm-arch.hpt_cma_alloc = 0;
 -   VM_BUG_ON(order  KVM_CMA_CHUNK_ORDER);
 page = kvm_alloc_hpt(1  (order - PAGE_SHIFT));
 if (page) {
 hpt = (unsigned long)pfn_to_kaddr(page_to_pfn(page));
 
 
 
 -aneesh

 Okay.
 So do you also want this?

 @@ -131,16 +135,18 @@ struct page *kvm_alloc_hpt(unsigned long nr_pages)
  {
 unsigned long align_pages = HPT_ALIGN_PAGES;

 +   VM_BUG_ON(get_order(nr_pages)  KVM_CMA_CHUNK_ORDER - PAGE_SHIFT);
 +
 /* Old CPUs require HPT aligned on a multiple of its size */
 if (!cpu_has_feature(CPU_FTR_ARCH_206))
 align_pages = nr_pages;
 -   return kvm_alloc_cma(nr_pages, align_pages);
 +   return cma_alloc(kvm_cma, nr_pages, get_order(align_pages));
  }

That would also work.
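
Putting the two quoted hunks together, the helper would end up roughly
like this (operators restored; kvm_cma, HPT_ALIGN_PAGES and
KVM_CMA_CHUNK_ORDER come from the existing ppc KVM CMA code):

struct page *kvm_alloc_hpt(unsigned long nr_pages)
{
	unsigned long align_pages = HPT_ALIGN_PAGES;

	VM_BUG_ON(get_order(nr_pages) > KVM_CMA_CHUNK_ORDER - PAGE_SHIFT);

	/* Old CPUs require the HPT aligned on a multiple of its size */
	if (!cpu_has_feature(CPU_FTR_ARCH_206))
		align_pages = nr_pages;
	return cma_alloc(kvm_cma, nr_pages, get_order(align_pages));
}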

Thanks
-aneesh

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 00/10] CMA: generalize CMA reserved area management code

2014-06-16 Thread Aneesh Kumar K.V
Joonsoo Kim iamjoonsoo@lge.com writes:

 On Sat, Jun 14, 2014 at 12:55:39PM +0530, Aneesh Kumar K.V wrote:
 Joonsoo Kim iamjoonsoo@lge.com writes:
 
  Currently, there are two users on CMA functionality, one is the DMA
  subsystem and the other is the kvm on powerpc. They have their own code
  to manage CMA reserved area even if they looks really similar.
  From my guess, it is caused by some needs on bitmap management. Kvm side
  wants to maintain bitmap not for 1 page, but for more size. Eventually it
  use bitmap where one bit represents 64 pages.
 
  When I implement CMA related patches, I should change those two places
  to apply my change and it seem to be painful to me. I want to change
  this situation and reduce future code management overhead through
  this patch.
 
  This change could also help developer who want to use CMA in their
  new feature development, since they can use CMA easily without
  copying & pasting this reserved area management code.
 
  v2:
Although this patchset looks very different with v1, the end result,
that is, mm/cma.c is same with v1's one. So I carry Ack to patch 6-7.
 
  Patch 1-5 prepare some features to cover ppc kvm's requirements.
  Patch 6-7 generalize CMA reserved area management code and change users
  to use it.
  Patch 8-10 clean-up minor things.
 
 
 I wanted to test the ppc changes and found that the patch series doesn't
 apply against v3.15. Do you have a kernel tree which I can clone to test
 this series?

 This is based on linux-next -next-20140610.
 And my tree is on following link.

 https://github.com/JoonsooKim/linux/tree/cma-general-v2.0-next-20140610

 But, I think I'm late, because you have already added a Tested-by tag.

linux-next kexec is broken on ppc64, hence I hand-picked a few of the
dependent patches for DMA CMA on top of 3.15 and used that for testing.

-aneesh

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH V2] KVM: PPC: BOOK3S: HV: Use base page size when comparing against slb value

2014-06-15 Thread Aneesh Kumar K.V
With guests supporting Multiple Page Size per Segment (MPSS),
hpte_page_size() returns the actual page size used by the mapping. Add a
new function that returns the base page size and use that to compare
against the page size calculated from the SLB. Without this patch an HPTE
lookup can fail, since we are comparing the wrong page size in
kvmppc_hv_find_lock_hpte().

Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
---
Changes from V1:
* Remove obsolete comment from the code
* Update commit message

 arch/powerpc/include/asm/kvm_book3s_64.h | 19 +--
 arch/powerpc/kvm/book3s_64_mmu_hv.c  |  2 +-
 arch/powerpc/kvm/book3s_hv_rm_mmu.c  |  7 ++-
 3 files changed, 20 insertions(+), 8 deletions(-)
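
To make the failure mode concrete, here is a toy illustration (plain
user-space C with made-up sizes, not part of the patch): an MPSS segment
whose base page size is 4K can be backed by 64K pages, so comparing the
actual page size against the SLB-derived one makes the lookup miss.

#include <stdio.h>

int main(void)
{
	unsigned long slb_size    = 1ul << 12;	/* base page size derived from the SLB (4K) */
	unsigned long actual_size = 1ul << 16;	/* what hpte_page_size() reports for an MPSS HPTE (64K) */
	unsigned long base_size   = 1ul << 12;	/* what hpte_base_page_size() reports (4K) */

	printf("old check matches: %d\n", actual_size == slb_size);	/* 0 -> lookup fails */
	printf("new check matches: %d\n", base_size == slb_size);	/* 1 -> lookup succeeds */
	return 0;
}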

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h 
b/arch/powerpc/include/asm/kvm_book3s_64.h
index 34422be566ce..3d0f3fb9c6b6 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -202,8 +202,10 @@ static inline unsigned long compute_tlbie_rb(unsigned long 
v, unsigned long r,
return rb;
 }
 
-static inline unsigned long hpte_page_size(unsigned long h, unsigned long l)
+static inline unsigned long __hpte_page_size(unsigned long h, unsigned long l,
+bool is_base_size)
 {
+
int size, a_psize;
/* Look at the 8 bit LP value */
	unsigned int lp = (l >> LP_SHIFT) & ((1 << LP_BITS) - 1);
@@ -218,14 +220,27 @@ static inline unsigned long hpte_page_size(unsigned long 
h, unsigned long l)
continue;
 
a_psize = __hpte_actual_psize(lp, size);
-   if (a_psize != -1)
+   if (a_psize != -1) {
+   if (is_base_size)
+				return 1ul << mmu_psize_defs[size].shift;
 			return 1ul << mmu_psize_defs[a_psize].shift;
+   }
}
 
}
return 0;
 }
 
+static inline unsigned long hpte_page_size(unsigned long h, unsigned long l)
+{
+   return __hpte_page_size(h, l, 0);
+}
+
+static inline unsigned long hpte_base_page_size(unsigned long h, unsigned long 
l)
+{
+   return __hpte_page_size(h, l, 1);
+}
+
 static inline unsigned long hpte_rpn(unsigned long ptel, unsigned long psize)
 {
	return ((ptel & HPTE_R_RPN) & ~(psize - 1)) >> PAGE_SHIFT;
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index f53cf2eae36a..7ff45ed27c65 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -1567,7 +1567,7 @@ static ssize_t kvm_htab_write(struct file *file, const 
char __user *buf,
goto out;
}
		if (!rma_setup && is_vrma_hpte(v)) {
-   unsigned long psize = hpte_page_size(v, r);
+   unsigned long psize = hpte_base_page_size(v, r);
unsigned long senc = slb_pgsize_encoding(psize);
unsigned long lpcr;
 
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c 
b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 87624ab5ba82..d86356bfc970 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -833,13 +833,10 @@ long kvmppc_hv_find_lock_hpte(struct kvm *kvm, gva_t 
eaddr, unsigned long slb_v,
r = be64_to_cpu(hpte[i+1]);
 
/*
-* Check the HPTE again, including large page size
-* Since we don't currently allow any MPSS (mixed
-* page-size segment) page sizes, it is sufficient
-* to check against the actual page size.
+* Check the HPTE again, including base page size
 */
		if ((v & valid) && (v & mask) == val &&
-		    hpte_page_size(v, r) == (1ul << pshift))
+		    hpte_base_page_size(v, r) == (1ul << pshift))
/* Return with the HPTE still locked */
			return (hash << 3) + (i >> 1);
 
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/6] KVM: PPC: Book3S HV: Enable on little endian hosts

2014-06-15 Thread Aneesh Kumar K.V
Alexander Graf ag...@suse.de writes:

 So far we've been able to successfully run HV KVM on big endian hosts, but
 once you dive into little endian land things start to fall apart.

 This patch set enables HV KVM for little endian hosts. This should be the
 final piece left missing to get little endian systems fully on par with big
 endian ones in the KVM world.

Can we also use types like __be64 where we need them? That would also
make sure that tools like sparse can catch endianness errors.
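
For instance, a minimal kernel-context sketch of the kind of annotation
meant here (illustrative only, not taken from this series):

#include <linux/types.h>
#include <asm/byteorder.h>

/* A big-endian in-memory HPTE, annotated so that sparse (make C=1) warns
 * whenever the raw value is used without be64_to_cpu()/cpu_to_be64(). */
struct hpte_be {
	__be64 v;
	__be64 r;
};

static inline unsigned long hpte_v_native(const struct hpte_be *hpte)
{
	return be64_to_cpu(hpte->v);	/* explicit conversion keeps sparse happy */
}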

-aneesh

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 08/10] mm, cma: clean-up cma allocation error path

2014-06-14 Thread Aneesh Kumar K.V
Joonsoo Kim iamjoonsoo@lge.com writes:

 We can remove one call sites for clear_cma_bitmap() if we first
 call it before checking error number.

 Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

Reviewed-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com


 diff --git a/mm/cma.c b/mm/cma.c
 index 1e1b017..01a0713 100644
 --- a/mm/cma.c
 +++ b/mm/cma.c
 @@ -282,11 +282,12 @@ struct page *cma_alloc(struct cma *cma, int count, 
 unsigned int align)
   if (ret == 0) {
   page = pfn_to_page(pfn);
   break;
 - } else if (ret != -EBUSY) {
 - clear_cma_bitmap(cma, pfn, count);
 - break;
   }
 +
   clear_cma_bitmap(cma, pfn, count);
 + if (ret != -EBUSY)
 + break;
 +
  pr_debug("%s(): memory range at %p is busy, retrying\n",
__func__, pfn_to_page(pfn));
   /* try again with a bit different memory target */
 -- 
 1.7.9.5

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 09/10] mm, cma: move output param to the end of param list

2014-06-14 Thread Aneesh Kumar K.V
Joonsoo Kim iamjoonsoo@lge.com writes:

 Conventionally, we put output param to the end of param list.
 cma_declare_contiguous() doesn't look like that, so change it.

 Additionally, move down cma_areas reference code to the position
 where it is really needed.

 Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

Reviewed-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com


 diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c 
 b/arch/powerpc/kvm/book3s_hv_builtin.c
 index 28ec226..97613ea 100644
 --- a/arch/powerpc/kvm/book3s_hv_builtin.c
 +++ b/arch/powerpc/kvm/book3s_hv_builtin.c
 @@ -184,7 +184,7 @@ void __init kvm_cma_reserve(void)

  align_size = max(kvm_rma_pages << PAGE_SHIFT, align_size);
   cma_declare_contiguous(selected_size, 0, 0, align_size,
 - KVM_CMA_CHUNK_ORDER - PAGE_SHIFT, &kvm_cma, false);
 + KVM_CMA_CHUNK_ORDER - PAGE_SHIFT, false, &kvm_cma);
   }
  }

 diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c
 index f177f73..bfd4553 100644
 --- a/drivers/base/dma-contiguous.c
 +++ b/drivers/base/dma-contiguous.c
 @@ -149,7 +149,7 @@ int __init dma_contiguous_reserve_area(phys_addr_t size, 
 phys_addr_t base,
  {
   int ret;

 - ret = cma_declare_contiguous(size, base, limit, 0, 0, res_cma, fixed);
 + ret = cma_declare_contiguous(size, base, limit, 0, 0, fixed, res_cma);
   if (ret)
   return ret;

 diff --git a/include/linux/cma.h b/include/linux/cma.h
 index e38efe9..e53eead 100644
 --- a/include/linux/cma.h
 +++ b/include/linux/cma.h
 @@ -6,7 +6,7 @@ struct cma;
  extern int __init cma_declare_contiguous(phys_addr_t size,
   phys_addr_t base, phys_addr_t limit,
   phys_addr_t alignment, int order_per_bit,
 - struct cma **res_cma, bool fixed);
 + bool fixed, struct cma **res_cma);
  extern struct page *cma_alloc(struct cma *cma, int count, unsigned int 
 align);
  extern bool cma_release(struct cma *cma, struct page *pages, int count);
  #endif
 diff --git a/mm/cma.c b/mm/cma.c
 index 01a0713..22a5b23 100644
 --- a/mm/cma.c
 +++ b/mm/cma.c
 @@ -142,8 +142,8 @@ core_initcall(cma_init_reserved_areas);
   * @limit: End address of the reserved memory (optional, 0 for any).
   * @alignment: Alignment for the contiguous memory area, should be power of 2
   * @order_per_bit: Order of pages represented by one bit on bitmap.
 - * @res_cma: Pointer to store the created cma region.
   * @fixed: hint about where to place the reserved area
 + * @res_cma: Pointer to store the created cma region.
   *
   * This function reserves memory from early allocator. It should be
   * called by arch specific code once the early allocator (memblock or 
 bootmem)
 @@ -156,9 +156,9 @@ core_initcall(cma_init_reserved_areas);
  int __init cma_declare_contiguous(phys_addr_t size,
   phys_addr_t base, phys_addr_t limit,
   phys_addr_t alignment, int order_per_bit,
 - struct cma **res_cma, bool fixed)
 + bool fixed, struct cma **res_cma)
  {
 - struct cma *cma = &cma_areas[cma_area_count];
 + struct cma *cma;
   int ret = 0;

  pr_debug("%s(size %lx, base %08lx, limit %08lx alignment %08lx)\n",
 @@ -214,6 +214,7 @@ int __init cma_declare_contiguous(phys_addr_t size,
* Each reserved area must be initialised later, when more kernel
* subsystems (like slab allocator) are available.
*/
 + cma = &cma_areas[cma_area_count];
   cma->base_pfn = PFN_DOWN(base);
   cma->count = size >> PAGE_SHIFT;
   cma->order_per_bit = order_per_bit;
 -- 
 1.7.9.5

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 00/10] CMA: generalize CMA reserved area management code

2014-06-14 Thread Aneesh Kumar K.V
Joonsoo Kim iamjoonsoo@lge.com writes:

 Currently, there are two users on CMA functionality, one is the DMA
 subsystem and the other is the kvm on powerpc. They have their own code
 to manage CMA reserved area even if they looks really similar.
 From my guess, it is caused by some needs on bitmap management. Kvm side
 wants to maintain bitmap not for 1 page, but for more size. Eventually it
 use bitmap where one bit represents 64 pages.

 When I implement CMA related patches, I should change those two places
 to apply my change and it seem to be painful to me. I want to change
 this situation and reduce future code management overhead through
 this patch.

 This change could also help developer who want to use CMA in their
 new feature development, since they can use CMA easily without
  copying & pasting this reserved area management code.

 v2:
   Although this patchset looks very different with v1, the end result,
   that is, mm/cma.c is same with v1's one. So I carry Ack to patch 6-7.

 Patch 1-5 prepare some features to cover ppc kvm's requirements.
 Patch 6-7 generalize CMA reserved area management code and change users
 to use it.
 Patch 8-10 clean-up minor things.


I wanted to test the ppc changes and found that the patch series doesn't apply
against v3.15. Do you have a kernel tree which I can clone to test this
series?

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 07/10] PPC, KVM, CMA: use general CMA reserved area management framework

2014-06-14 Thread Aneesh Kumar K.V
Joonsoo Kim iamjoonsoo@lge.com writes:

 Now, we have general CMA reserved area management framework,
 so use it for future maintainabilty. There is no functional change.

 Acked-by: Michal Nazarewicz min...@mina86.com
 Acked-by: Paolo Bonzini pbonz...@redhat.com
 Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

Need this. We may want to keep the VM_BUG_ON by moving
KVM_CMA_CHUNK_ORDER around.

diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 8056107..1932e0e 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -37,8 +37,6 @@
 #include <asm/ppc-opcode.h>
 #include <asm/cputable.h>
 
-#include book3s_hv_cma.h
-
 /* POWER7 has 10-bit LPIDs, PPC970 has 6-bit LPIDs */
 #define MAX_LPID_970   63
 
@@ -64,7 +62,6 @@ long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp)
}
 
 kvm->arch.hpt_cma_alloc = 0;
-   VM_BUG_ON(order < KVM_CMA_CHUNK_ORDER);
 page = kvm_alloc_hpt(1 << (order - PAGE_SHIFT));
if (page) {
hpt = (unsigned long)pfn_to_kaddr(page_to_pfn(page));



-aneesh

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 07/10] PPC, KVM, CMA: use general CMA reserved area management framework

2014-06-14 Thread Aneesh Kumar K.V
Joonsoo Kim iamjoonsoo@lge.com writes:

 Now, we have general CMA reserved area management framework,
 so use it for future maintainabilty. There is no functional change.

 Acked-by: Michal Nazarewicz min...@mina86.com
 Acked-by: Paolo Bonzini pbonz...@redhat.com
 Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

 diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c 
 b/arch/powerpc/kvm/book3s_hv_builtin.c
 index 7cde8a6..28ec226 100644
 --- a/arch/powerpc/kvm/book3s_hv_builtin.c
 +++ b/arch/powerpc/kvm/book3s_hv_builtin.c
 @@ -16,12 +16,14 @@
  #include <linux/init.h>
  #include <linux/memblock.h>
  #include <linux/sizes.h>
 +#include <linux/cma.h>

  #include <asm/cputable.h>
  #include <asm/kvm_ppc.h>
  #include <asm/kvm_book3s.h>

 -#include "book3s_hv_cma.h"
 +#define KVM_CMA_CHUNK_ORDER	18
 +
  /*
   * Hash page table alignment on newer cpus(CPU_FTR_ARCH_206)
   * should be power of 2.
 @@ -43,6 +45,8 @@ static unsigned long kvm_cma_resv_ratio = 5;
  unsigned long kvm_rma_pages = (1 << 27) >> PAGE_SHIFT;   /* 128MB */
  EXPORT_SYMBOL_GPL(kvm_rma_pages);

 +static struct cma *kvm_cma;
 +
  /* Work out RMLS (real mode limit selector) field value for a given RMA size.
 Assumes POWER7 or PPC970. */
  static inline int lpcr_rmls(unsigned long rma_size)
 @@ -97,7 +101,7 @@ struct kvm_rma_info *kvm_alloc_rma()
   ri = kmalloc(sizeof(struct kvm_rma_info), GFP_KERNEL);
   if (!ri)
   return NULL;
 - page = kvm_alloc_cma(kvm_rma_pages, kvm_rma_pages);
 + page = cma_alloc(kvm_cma, kvm_rma_pages, get_order(kvm_rma_pages));
   if (!page)
   goto err_out;
  atomic_set(&ri->use_count, 1);
 @@ -112,7 +116,7 @@ EXPORT_SYMBOL_GPL(kvm_alloc_rma);
  void kvm_release_rma(struct kvm_rma_info *ri)
  {
  if (atomic_dec_and_test(&ri->use_count)) {
 - kvm_release_cma(pfn_to_page(ri->base_pfn), kvm_rma_pages);
 + cma_release(kvm_cma, pfn_to_page(ri->base_pfn), kvm_rma_pages);
   kfree(ri);
   }
  }
 @@ -134,13 +138,13 @@ struct page *kvm_alloc_hpt(unsigned long nr_pages)
   /* Old CPUs require HPT aligned on a multiple of its size */
   if (!cpu_has_feature(CPU_FTR_ARCH_206))
   align_pages = nr_pages;
 - return kvm_alloc_cma(nr_pages, align_pages);
 + return cma_alloc(kvm_cma, nr_pages, get_order(align_pages));
  }
  EXPORT_SYMBOL_GPL(kvm_alloc_hpt);

  void kvm_release_hpt(struct page *page, unsigned long nr_pages)
  {
 - kvm_release_cma(page, nr_pages);
 + cma_release(kvm_cma, page, nr_pages);
  }
  EXPORT_SYMBOL_GPL(kvm_release_hpt);

 @@ -179,7 +183,8 @@ void __init kvm_cma_reserve(void)
  align_size = HPT_ALIGN_PAGES << PAGE_SHIFT;

  align_size = max(kvm_rma_pages << PAGE_SHIFT, align_size);
 - kvm_cma_declare_contiguous(selected_size, align_size);
 + cma_declare_contiguous(selected_size, 0, 0, align_size,
 + KVM_CMA_CHUNK_ORDER - PAGE_SHIFT, &kvm_cma, false);
   }
  }

 diff --git a/arch/powerpc/kvm/book3s_hv_cma.c 
 b/arch/powerpc/kvm/book3s_hv_cma.c
 deleted file mode 100644
 index d9d3d85..000
 --- a/arch/powerpc/kvm/book3s_hv_cma.c
 +++ /dev/null
 @@ -1,240 +0,0 @@
 -/*
 - * Contiguous Memory Allocator for ppc KVM hash pagetable  based on CMA
 - * for DMA mapping framework
 - *
 - * Copyright IBM Corporation, 2013
 - * Author Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
 - *
 - * This program is free software; you can redistribute it and/or
 - * modify it under the terms of the GNU General Public License as
 - * published by the Free Software Foundation; either version 2 of the
 - * License or (at your optional) any later version of the license.
 - *
 - */
 -#define pr_fmt(fmt) "kvm_cma: " fmt
 -
 -#ifdef CONFIG_CMA_DEBUG
 -#ifndef DEBUG
 -#  define DEBUG
 -#endif
 -#endif
 -
 -#include <linux/memblock.h>
 -#include <linux/mutex.h>
 -#include <linux/sizes.h>
 -#include <linux/slab.h>
 -
 -#include "book3s_hv_cma.h"
 -
 -struct kvm_cma {
 - unsigned long   base_pfn;
 - unsigned long   count;
 - unsigned long   *bitmap;
 -};
 -
 -static DEFINE_MUTEX(kvm_cma_mutex);
 -static struct kvm_cma kvm_cma_area;
 -
 -/**
 - * kvm_cma_declare_contiguous() - reserve area for contiguous memory handling
 - * for kvm hash pagetable
 - * @size:  Size of the reserved memory.
 - * @alignment:  Alignment for the contiguous memory area
 - *
 - * This function reserves memory for kvm cma area. It should be
 - * called by arch code when early allocator (memblock or bootmem)
 - * is still activate.
 - */
 -long __init kvm_cma_declare_contiguous(phys_addr_t size, phys_addr_t 
 alignment)
 -{
 - long base_pfn;
 - phys_addr_t addr;
 - struct kvm_cma *cma = &kvm_cma_area;
 -
 - pr_debug("%s(size %lx)\n", __func__, (unsigned long)size);
 -
 - if (!size)
 - return -EINVAL;
 - /*
 -  * Sanitise input arguments.
 -  * We should be pageblock aligned for CMA

Re: [PATCH v2 06/10] CMA: generalize CMA reserved area management functionality

2014-06-14 Thread Aneesh Kumar K.V
Joonsoo Kim iamjoonsoo@lge.com writes:

 Currently, there are two users on CMA functionality, one is the DMA
 subsystem and the other is the kvm on powerpc. They have their own code
 to manage CMA reserved area even if they looks really similar.
 From my guess, it is caused by some needs on bitmap management. Kvm side
 wants to maintain bitmap not for 1 page, but for more size. Eventually it
 use bitmap where one bit represents 64 pages.

 When I implement CMA related patches, I should change those two places
 to apply my change and it seem to be painful to me. I want to change
 this situation and reduce future code management overhead through
 this patch.

 This change could also help developer who want to use CMA in their
 new feature development, since they can use CMA easily without
  copying & pasting this reserved area management code.

 In previous patches, we have prepared some features to generalize
 CMA reserved area management and now it's time to do it. This patch
 moves core functions to mm/cma.c and change DMA APIs to use
 these functions.

 There is no functional change in DMA APIs.

 v2: There is no big change from v1 in mm/cma.c. Mostly renaming.

 Acked-by: Michal Nazarewicz min...@mina86.com
 Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

Reviewed-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
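
For reference, this is the shape of the generalized API a converted user
ends up with (a kernel-context sketch condensed from the kvm conversion
elsewhere in this thread; my_cma and the wrapper names are made up, and
patch 09 later moves the output parameter to the end of the list):

#include <linux/init.h>
#include <linux/mm.h>
#include <linux/cma.h>

static struct cma *my_cma;	/* hypothetical CMA area for illustration */

void __init my_cma_reserve(phys_addr_t size, phys_addr_t align)
{
	/* early boot, while memblock is still active; one bitmap bit per 2^6 pages */
	cma_declare_contiguous(size, 0, 0, align, 6, &my_cma, false);
}

struct page *my_cma_get(unsigned long nr_pages, unsigned long align_pages)
{
	/* runtime allocation against the reserved area */
	return cma_alloc(my_cma, nr_pages, get_order(align_pages));
}

void my_cma_put(struct page *page, unsigned long nr_pages)
{
	cma_release(my_cma, page, nr_pages);
}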


 diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
 index 00e13ce..4eac559 100644
 --- a/drivers/base/Kconfig
 +++ b/drivers/base/Kconfig
 @@ -283,16 +283,6 @@ config CMA_ALIGNMENT

 If unsure, leave the default value 8.

 -config CMA_AREAS
 - int Maximum count of the CMA device-private areas
 - default 7
 - help
 -   CMA allows to create CMA areas for particular devices. This parameter
 -   sets the maximum number of such device private CMA areas in the
 -   system.
 -
 -   If unsure, leave the default value 7.
 -
  endif

  endmenu
 diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c
 index 9bc9340..f177f73 100644
 --- a/drivers/base/dma-contiguous.c
 +++ b/drivers/base/dma-contiguous.c
 @@ -24,25 +24,10 @@

  #include <linux/memblock.h>
  #include <linux/err.h>
 -#include <linux/mm.h>
 -#include <linux/mutex.h>
 -#include <linux/page-isolation.h>
  #include <linux/sizes.h>
 -#include <linux/slab.h>
 -#include <linux/swap.h>
 -#include <linux/mm_types.h>
  #include <linux/dma-contiguous.h>
  #include <linux/log2.h>
 -
 -struct cma {
 -	unsigned long	base_pfn;
 -	unsigned long	count;
 -	unsigned long	*bitmap;
 -	int order_per_bit; /* Order of pages represented by one bit */
 -	struct mutex	lock;
 -};
 -
 -struct cma *dma_contiguous_default_area;
 +#include <linux/cma.h>

  #ifdef CONFIG_CMA_SIZE_MBYTES
  #define CMA_SIZE_MBYTES CONFIG_CMA_SIZE_MBYTES
 @@ -50,6 +35,8 @@ struct cma *dma_contiguous_default_area;
  #define CMA_SIZE_MBYTES 0
  #endif

 +struct cma *dma_contiguous_default_area;
 +
  /*
   * Default global CMA area size can be defined in kernel's .config.
   * This is useful mainly for distro maintainers to create a kernel
 @@ -156,199 +143,13 @@ void __init dma_contiguous_reserve(phys_addr_t limit)
   }
  }

 -static DEFINE_MUTEX(cma_mutex);
 -
 -static unsigned long cma_bitmap_aligned_mask(struct cma *cma, int 
 align_order)
 -{
 - return (1 << (align_order >> cma->order_per_bit)) - 1;
 -}
 -
 -static unsigned long cma_bitmap_maxno(struct cma *cma)
 -{
 - return cma->count >> cma->order_per_bit;
 -}
 -
 -static unsigned long cma_bitmap_pages_to_bits(struct cma *cma,
 - unsigned long pages)
 -{
 - return ALIGN(pages, 1 << cma->order_per_bit) >> cma->order_per_bit;
 -}
 -
 -static void clear_cma_bitmap(struct cma *cma, unsigned long pfn, int count)
 -{
 - unsigned long bitmapno, nr_bits;
 -
 - bitmapno = (pfn - cma->base_pfn) >> cma->order_per_bit;
 - nr_bits = cma_bitmap_pages_to_bits(cma, count);
 -
 - mutex_lock(&cma->lock);
 - bitmap_clear(cma->bitmap, bitmapno, nr_bits);
 - mutex_unlock(&cma->lock);
 -}
 -
 -static int __init cma_activate_area(struct cma *cma)
 -{
 - int bitmap_maxno = cma_bitmap_maxno(cma);
 - int bitmap_size = BITS_TO_LONGS(bitmap_maxno) * sizeof(long);
 - unsigned long base_pfn = cma->base_pfn, pfn = base_pfn;
 - unsigned i = cma->count >> pageblock_order;
 - struct zone *zone;
 -
 - pr_debug("%s()\n", __func__);
 -
 - cma->bitmap = kzalloc(bitmap_size, GFP_KERNEL);
 - if (!cma->bitmap)
 - return -ENOMEM;
 -
 - WARN_ON_ONCE(!pfn_valid(pfn));
 - zone = page_zone(pfn_to_page(pfn));
 -
 - do {
 - unsigned j;
 - base_pfn = pfn;
 - for (j = pageblock_nr_pages; j; --j, pfn++) {
 - WARN_ON_ONCE(!pfn_valid(pfn));
 - /*
 -  * alloc_contig_range requires the pfn range
 -  * specified to be in the same zone. Make this
 -  * simple by forcing

Re: [PATCH v2 05/10] DMA, CMA: support arbitrary bitmap granularity

2014-06-14 Thread Aneesh Kumar K.V
Joonsoo Kim iamjoonsoo@lge.com writes:

 ppc kvm's cma region management requires arbitrary bitmap granularity,
 since they want to reserve very large memory and manage this region
 with bitmap that one bit for several pages to reduce management overheads.
 So support arbitrary bitmap granularity for following generalization.

 Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

Reviewed-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
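
A quick worked example of the granularity this enables (plain user-space
C with assumed sizes; the kvm user described in the cover letter keeps one
bitmap bit per 64 pages, i.e. order_per_bit = 6):

#include <stdio.h>

int main(void)
{
	unsigned long order_per_bit = 6;			/* one bit per 2^6 pages */
	unsigned long pages = (16UL << 30) >> 12;		/* a 16 GB area of 4 KB pages */
	unsigned long bits  = pages >> order_per_bit;		/* bitmap bits actually needed */

	printf("%lu pages -> %lu bitmap bits\n", pages, bits);	/* 4194304 -> 65536 */
	return 0;
}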


 diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c
 index bc4c171..9bc9340 100644
 --- a/drivers/base/dma-contiguous.c
 +++ b/drivers/base/dma-contiguous.c
 @@ -38,6 +38,7 @@ struct cma {
   unsigned long   base_pfn;
   unsigned long   count;
   unsigned long   *bitmap;
 + int order_per_bit; /* Order of pages represented by one bit */
   struct mutex	lock;
  };

 @@ -157,9 +158,38 @@ void __init dma_contiguous_reserve(phys_addr_t limit)

  static DEFINE_MUTEX(cma_mutex);

 +static unsigned long cma_bitmap_aligned_mask(struct cma *cma, int 
 align_order)
 +{
 + return (1 << (align_order >> cma->order_per_bit)) - 1;
 +}
 +
 +static unsigned long cma_bitmap_maxno(struct cma *cma)
 +{
 + return cma->count >> cma->order_per_bit;
 +}
 +
 +static unsigned long cma_bitmap_pages_to_bits(struct cma *cma,
 + unsigned long pages)
 +{
 + return ALIGN(pages, 1 << cma->order_per_bit) >> cma->order_per_bit;
 +}
 +
 +static void clear_cma_bitmap(struct cma *cma, unsigned long pfn, int count)
 +{
 + unsigned long bitmapno, nr_bits;
 +
 + bitmapno = (pfn - cma->base_pfn) >> cma->order_per_bit;
 + nr_bits = cma_bitmap_pages_to_bits(cma, count);
 +
 + mutex_lock(&cma->lock);
 + bitmap_clear(cma->bitmap, bitmapno, nr_bits);
 + mutex_unlock(&cma->lock);
 +}
 +
  static int __init cma_activate_area(struct cma *cma)
  {
 - int bitmap_size = BITS_TO_LONGS(cma->count) * sizeof(long);
 + int bitmap_maxno = cma_bitmap_maxno(cma);
 + int bitmap_size = BITS_TO_LONGS(bitmap_maxno) * sizeof(long);
   unsigned long base_pfn = cma->base_pfn, pfn = base_pfn;
   unsigned i = cma->count >> pageblock_order;
   struct zone *zone;
 @@ -221,6 +251,7 @@ core_initcall(cma_init_reserved_areas);
   * @base: Base address of the reserved area optional, use 0 for any
   * @limit: End address of the reserved memory (optional, 0 for any).
   * @alignment: Alignment for the contiguous memory area, should be power of 2
 + * @order_per_bit: Order of pages represented by one bit on bitmap.
   * @res_cma: Pointer to store the created cma region.
   * @fixed: hint about where to place the reserved area
   *
 @@ -235,7 +266,7 @@ core_initcall(cma_init_reserved_areas);
   */
  static int __init __dma_contiguous_reserve_area(phys_addr_t size,
   phys_addr_t base, phys_addr_t limit,
 - phys_addr_t alignment,
 + phys_addr_t alignment, int order_per_bit,
   struct cma **res_cma, bool fixed)
  {
   struct cma *cma = cma_areas[cma_area_count];
 @@ -269,6 +300,8 @@ static int __init 
 __dma_contiguous_reserve_area(phys_addr_t size,
   base = ALIGN(base, alignment);
   size = ALIGN(size, alignment);
   limit &= ~(alignment - 1);
 + /* size should be aligned with order_per_bit */
 + BUG_ON(!IS_ALIGNED(size >> PAGE_SHIFT, 1 << order_per_bit));

   /* Reserve memory */
   if (base && fixed) {
 @@ -294,6 +327,7 @@ static int __init 
 __dma_contiguous_reserve_area(phys_addr_t size,
*/
   cma->base_pfn = PFN_DOWN(base);
   cma->count = size >> PAGE_SHIFT;
 + cma->order_per_bit = order_per_bit;
   *res_cma = cma;
   cma_area_count++;

 @@ -313,7 +347,7 @@ int __init dma_contiguous_reserve_area(phys_addr_t size, 
 phys_addr_t base,
  {
   int ret;

 - ret = __dma_contiguous_reserve_area(size, base, limit, 0,
 + ret = __dma_contiguous_reserve_area(size, base, limit, 0, 0,
   res_cma, fixed);
   if (ret)
   return ret;
 @@ -324,13 +358,6 @@ int __init dma_contiguous_reserve_area(phys_addr_t size, 
 phys_addr_t base,
   return 0;
  }

 -static void clear_cma_bitmap(struct cma *cma, unsigned long pfn, int count)
 -{
 - mutex_lock(&cma->lock);
 - bitmap_clear(cma->bitmap, pfn - cma->base_pfn, count);
 - mutex_unlock(&cma->lock);
 -}
 -
  /**
   * dma_alloc_from_contiguous() - allocate pages from contiguous area
   * @dev:   Pointer to device for which the allocation is performed.
 @@ -345,7 +372,8 @@ static void clear_cma_bitmap(struct cma *cma, unsigned 
 long pfn, int count)
  static struct page *__dma_alloc_from_contiguous(struct cma *cma, int count,
  unsigned int align)
  {
 - unsigned long mask, pfn, pageno, start = 0;
 + unsigned long mask, pfn, start = 0;
 + unsigned long bitmap_maxno, bitmapno, nr_bits;
   struct page *page

Re: [PATCH v2 06/10] CMA: generalize CMA reserved area management functionality

2014-06-14 Thread Aneesh Kumar K.V
Joonsoo Kim iamjoonsoo@lge.com writes:

 Currently, there are two users on CMA functionality, one is the DMA
 subsystem and the other is the kvm on powerpc. They have their own code
 to manage CMA reserved area even if they looks really similar.
 From my guess, it is caused by some needs on bitmap management. Kvm side
 wants to maintain bitmap not for 1 page, but for more size. Eventually it
 use bitmap where one bit represents 64 pages.

 When I implement CMA related patches, I should change those two places
 to apply my change and it seem to be painful to me. I want to change
 this situation and reduce future code management overhead through
 this patch.

 This change could also help developer who want to use CMA in their
 new feature development, since they can use CMA easily without
  copying & pasting this reserved area management code.

 In previous patches, we have prepared some features to generalize
 CMA reserved area management and now it's time to do it. This patch
 moves core functions to mm/cma.c and change DMA APIs to use
 these functions.

 There is no functional change in DMA APIs.

 v2: There is no big change from v1 in mm/cma.c. Mostly renaming.

 Acked-by: Michal Nazarewicz min...@mina86.com
 Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com


.

 +
 + mask = cma_bitmap_aligned_mask(cma, align);
 + bitmap_maxno = cma_bitmap_maxno(cma);
 + nr_bits = cma_bitmap_pages_to_bits(cma, count);
 +
 + for (;;) {
 + mutex_lock(&cma->lock);
 + bitmapno = bitmap_find_next_zero_area(cma->bitmap,
 + bitmap_maxno, start, nr_bits, mask);
 + if (bitmapno >= bitmap_maxno) {
 + mutex_unlock(&cma->lock);
 + break;
 + }
 + bitmap_set(cma->bitmap, bitmapno, nr_bits);
 + /*
 +  * It's safe to drop the lock here. We've marked this region for
 +  * our exclusive use. If the migration fails we will take the
 +  * lock again and unmark it.
 +  */
 + mutex_unlock(&cma->lock);
 +
 + pfn = cma->base_pfn + (bitmapno << cma->order_per_bit);
 + mutex_lock(&cma_mutex);
 + ret = alloc_contig_range(pfn, pfn + count, MIGRATE_CMA);
 + mutex_unlock(&cma_mutex);
 + if (ret == 0) {
 + page = pfn_to_page(pfn);
 + break;
 + } else if (ret != -EBUSY) {
 + clear_cma_bitmap(cma, pfn, count);
 + break;
 + }
 + 


For setting the bitmap we do

	bitmap_set(cma->bitmap, bitmapno, nr_bits);
	alloc_contig_range(...);
	if (error)
		clear_cma_bitmap(cma, pfn, count);

Why the asymmetry? Why not bitmap_clear()?

-aneesh

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 08/10] mm, cma: clean-up cma allocation error path

2014-06-14 Thread Aneesh Kumar K.V
Joonsoo Kim iamjoonsoo@lge.com writes:

 We can remove one call sites for clear_cma_bitmap() if we first
 call it before checking error number.

 Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

Reviewed-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com


 diff --git a/mm/cma.c b/mm/cma.c
 index 1e1b017..01a0713 100644
 --- a/mm/cma.c
 +++ b/mm/cma.c
 @@ -282,11 +282,12 @@ struct page *cma_alloc(struct cma *cma, int count, 
 unsigned int align)
   if (ret == 0) {
   page = pfn_to_page(pfn);
   break;
 - } else if (ret != -EBUSY) {
 - clear_cma_bitmap(cma, pfn, count);
 - break;
   }
 +
   clear_cma_bitmap(cma, pfn, count);
 + if (ret != -EBUSY)
 + break;
 +
   pr_debug(%s(): memory range at %p is busy, retrying\n,
__func__, pfn_to_page(pfn));
   /* try again with a bit different memory target */
 -- 
 1.7.9.5

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 09/10] mm, cma: move output param to the end of param list

2014-06-14 Thread Aneesh Kumar K.V
Joonsoo Kim iamjoonsoo@lge.com writes:

 Conventionally, we put output param to the end of param list.
 cma_declare_contiguous() doesn't look like that, so change it.

 Additionally, move down cma_areas reference code to the position
 where it is really needed.

 Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

Reviewed-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com


 diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c 
 b/arch/powerpc/kvm/book3s_hv_builtin.c
 index 28ec226..97613ea 100644
 --- a/arch/powerpc/kvm/book3s_hv_builtin.c
 +++ b/arch/powerpc/kvm/book3s_hv_builtin.c
 @@ -184,7 +184,7 @@ void __init kvm_cma_reserve(void)

   align_size = max(kvm_rma_pages  PAGE_SHIFT, align_size);
   cma_declare_contiguous(selected_size, 0, 0, align_size,
 - KVM_CMA_CHUNK_ORDER - PAGE_SHIFT, kvm_cma, false);
 + KVM_CMA_CHUNK_ORDER - PAGE_SHIFT, false, kvm_cma);
   }
  }

 diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c
 index f177f73..bfd4553 100644
 --- a/drivers/base/dma-contiguous.c
 +++ b/drivers/base/dma-contiguous.c
 @@ -149,7 +149,7 @@ int __init dma_contiguous_reserve_area(phys_addr_t size, 
 phys_addr_t base,
  {
   int ret;

 - ret = cma_declare_contiguous(size, base, limit, 0, 0, res_cma, fixed);
 + ret = cma_declare_contiguous(size, base, limit, 0, 0, fixed, res_cma);
   if (ret)
   return ret;

 diff --git a/include/linux/cma.h b/include/linux/cma.h
 index e38efe9..e53eead 100644
 --- a/include/linux/cma.h
 +++ b/include/linux/cma.h
 @@ -6,7 +6,7 @@ struct cma;
  extern int __init cma_declare_contiguous(phys_addr_t size,
   phys_addr_t base, phys_addr_t limit,
   phys_addr_t alignment, int order_per_bit,
 - struct cma **res_cma, bool fixed);
 + bool fixed, struct cma **res_cma);
  extern struct page *cma_alloc(struct cma *cma, int count, unsigned int 
 align);
  extern bool cma_release(struct cma *cma, struct page *pages, int count);
  #endif
 diff --git a/mm/cma.c b/mm/cma.c
 index 01a0713..22a5b23 100644
 --- a/mm/cma.c
 +++ b/mm/cma.c
 @@ -142,8 +142,8 @@ core_initcall(cma_init_reserved_areas);
   * @limit: End address of the reserved memory (optional, 0 for any).
   * @alignment: Alignment for the contiguous memory area, should be power of 2
   * @order_per_bit: Order of pages represented by one bit on bitmap.
 - * @res_cma: Pointer to store the created cma region.
   * @fixed: hint about where to place the reserved area
 + * @res_cma: Pointer to store the created cma region.
   *
   * This function reserves memory from early allocator. It should be
   * called by arch specific code once the early allocator (memblock or 
 bootmem)
 @@ -156,9 +156,9 @@ core_initcall(cma_init_reserved_areas);
  int __init cma_declare_contiguous(phys_addr_t size,
   phys_addr_t base, phys_addr_t limit,
   phys_addr_t alignment, int order_per_bit,
 - struct cma **res_cma, bool fixed)
 + bool fixed, struct cma **res_cma)
  {
 - struct cma *cma = cma_areas[cma_area_count];
 + struct cma *cma;
   int ret = 0;

   pr_debug(%s(size %lx, base %08lx, limit %08lx alignment %08lx)\n,
 @@ -214,6 +214,7 @@ int __init cma_declare_contiguous(phys_addr_t size,
* Each reserved area must be initialised later, when more kernel
* subsystems (like slab allocator) are available.
*/
 + cma = cma_areas[cma_area_count];
   cma-base_pfn = PFN_DOWN(base);
   cma-count = size  PAGE_SHIFT;
   cma-order_per_bit = order_per_bit;
 -- 
 1.7.9.5

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 00/10] CMA: generalize CMA reserved area management code

2014-06-14 Thread Aneesh Kumar K.V
Joonsoo Kim iamjoonsoo@lge.com writes:

 Currently, there are two users on CMA functionality, one is the DMA
 subsystem and the other is the kvm on powerpc. They have their own code
 to manage CMA reserved area even if they looks really similar.
 From my guess, it is caused by some needs on bitmap management. Kvm side
 wants to maintain bitmap not for 1 page, but for more size. Eventually it
 use bitmap where one bit represents 64 pages.

 When I implement CMA related patches, I should change those two places
 to apply my change and it seem to be painful to me. I want to change
 this situation and reduce future code management overhead through
 this patch.

 This change could also help developer who want to use CMA in their
 new feature development, since they can use CMA easily without
 copying  pasting this reserved area management code.

 v2:
   Although this patchset looks very different with v1, the end result,
   that is, mm/cma.c is same with v1's one. So I carry Ack to patch 6-7.

 Patch 1-5 prepare some features to cover ppc kvm's requirements.
 Patch 6-7 generalize CMA reserved area management code and change users
 to use it.
 Patch 8-10 clean-up minor things.


I wanted to test the ppc changes and found that the patch series doesn't apply
against v3.15 . Do you have a kernel tree which I can clone to test this
series ?

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 07/10] PPC, KVM, CMA: use general CMA reserved area management framework

2014-06-14 Thread Aneesh Kumar K.V
Joonsoo Kim iamjoonsoo@lge.com writes:

 Now, we have general CMA reserved area management framework,
 so use it for future maintainabilty. There is no functional change.

 Acked-by: Michal Nazarewicz min...@mina86.com
 Acked-by: Paolo Bonzini pbonz...@redhat.com
 Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

Need this. We may want to keep the VM_BUG_ON by moving
KVM_CMA_CHUNK_ORDER around.

diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 8056107..1932e0e 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -37,8 +37,6 @@
 #include asm/ppc-opcode.h
 #include asm/cputable.h
 
-#include book3s_hv_cma.h
-
 /* POWER7 has 10-bit LPIDs, PPC970 has 6-bit LPIDs */
 #define MAX_LPID_970   63
 
@@ -64,7 +62,6 @@ long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp)
}
 
kvm-arch.hpt_cma_alloc = 0;
-   VM_BUG_ON(order  KVM_CMA_CHUNK_ORDER);
page = kvm_alloc_hpt(1  (order - PAGE_SHIFT));
if (page) {
hpt = (unsigned long)pfn_to_kaddr(page_to_pfn(page));



-aneesh

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 07/10] PPC, KVM, CMA: use general CMA reserved area management framework

2014-06-14 Thread Aneesh Kumar K.V
Joonsoo Kim iamjoonsoo@lge.com writes:

 Now, we have general CMA reserved area management framework,
 so use it for future maintainabilty. There is no functional change.

 Acked-by: Michal Nazarewicz min...@mina86.com
 Acked-by: Paolo Bonzini pbonz...@redhat.com
 Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

 diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c 
 b/arch/powerpc/kvm/book3s_hv_builtin.c
 index 7cde8a6..28ec226 100644
 --- a/arch/powerpc/kvm/book3s_hv_builtin.c
 +++ b/arch/powerpc/kvm/book3s_hv_builtin.c
 @@ -16,12 +16,14 @@
  #include linux/init.h
  #include linux/memblock.h
  #include linux/sizes.h
 +#include linux/cma.h

  #include asm/cputable.h
  #include asm/kvm_ppc.h
  #include asm/kvm_book3s.h

 -#include book3s_hv_cma.h
 +#define KVM_CMA_CHUNK_ORDER  18
 +
  /*
   * Hash page table alignment on newer cpus(CPU_FTR_ARCH_206)
   * should be power of 2.
 @@ -43,6 +45,8 @@ static unsigned long kvm_cma_resv_ratio = 5;
  unsigned long kvm_rma_pages = (1  27)  PAGE_SHIFT;   /* 128MB */
  EXPORT_SYMBOL_GPL(kvm_rma_pages);

 +static struct cma *kvm_cma;
 +
  /* Work out RMLS (real mode limit selector) field value for a given RMA size.
 Assumes POWER7 or PPC970. */
  static inline int lpcr_rmls(unsigned long rma_size)
 @@ -97,7 +101,7 @@ struct kvm_rma_info *kvm_alloc_rma()
   ri = kmalloc(sizeof(struct kvm_rma_info), GFP_KERNEL);
   if (!ri)
   return NULL;
 - page = kvm_alloc_cma(kvm_rma_pages, kvm_rma_pages);
 + page = cma_alloc(kvm_cma, kvm_rma_pages, get_order(kvm_rma_pages));
   if (!page)
   goto err_out;
   atomic_set(ri-use_count, 1);
 @@ -112,7 +116,7 @@ EXPORT_SYMBOL_GPL(kvm_alloc_rma);
  void kvm_release_rma(struct kvm_rma_info *ri)
  {
   if (atomic_dec_and_test(ri-use_count)) {
 - kvm_release_cma(pfn_to_page(ri-base_pfn), kvm_rma_pages);
 + cma_release(kvm_cma, pfn_to_page(ri-base_pfn), kvm_rma_pages);
   kfree(ri);
   }
  }
 @@ -134,13 +138,13 @@ struct page *kvm_alloc_hpt(unsigned long nr_pages)
   /* Old CPUs require HPT aligned on a multiple of its size */
   if (!cpu_has_feature(CPU_FTR_ARCH_206))
   align_pages = nr_pages;
 - return kvm_alloc_cma(nr_pages, align_pages);
 + return cma_alloc(kvm_cma, nr_pages, get_order(align_pages));
  }
  EXPORT_SYMBOL_GPL(kvm_alloc_hpt);

  void kvm_release_hpt(struct page *page, unsigned long nr_pages)
  {
 - kvm_release_cma(page, nr_pages);
 + cma_release(kvm_cma, page, nr_pages);
  }
  EXPORT_SYMBOL_GPL(kvm_release_hpt);

 @@ -179,7 +183,8 @@ void __init kvm_cma_reserve(void)
   align_size = HPT_ALIGN_PAGES  PAGE_SHIFT;

   align_size = max(kvm_rma_pages  PAGE_SHIFT, align_size);
 - kvm_cma_declare_contiguous(selected_size, align_size);
 + cma_declare_contiguous(selected_size, 0, 0, align_size,
 + KVM_CMA_CHUNK_ORDER - PAGE_SHIFT, kvm_cma, false);
   }
  }

 diff --git a/arch/powerpc/kvm/book3s_hv_cma.c 
 b/arch/powerpc/kvm/book3s_hv_cma.c
 deleted file mode 100644
 index d9d3d85..000
 --- a/arch/powerpc/kvm/book3s_hv_cma.c
 +++ /dev/null
 @@ -1,240 +0,0 @@
 -/*
 - * Contiguous Memory Allocator for ppc KVM hash pagetable  based on CMA
 - * for DMA mapping framework
 - *
 - * Copyright IBM Corporation, 2013
 - * Author Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
 - *
 - * This program is free software; you can redistribute it and/or
 - * modify it under the terms of the GNU General Public License as
 - * published by the Free Software Foundation; either version 2 of the
 - * License or (at your optional) any later version of the license.
 - *
 - */
 -#define pr_fmt(fmt) kvm_cma:  fmt
 -
 -#ifdef CONFIG_CMA_DEBUG
 -#ifndef DEBUG
 -#  define DEBUG
 -#endif
 -#endif
 -
 -#include linux/memblock.h
 -#include linux/mutex.h
 -#include linux/sizes.h
 -#include linux/slab.h
 -
 -#include book3s_hv_cma.h
 -
 -struct kvm_cma {
 - unsigned long   base_pfn;
 - unsigned long   count;
 - unsigned long   *bitmap;
 -};
 -
 -static DEFINE_MUTEX(kvm_cma_mutex);
 -static struct kvm_cma kvm_cma_area;
 -
 -/**
 - * kvm_cma_declare_contiguous() - reserve area for contiguous memory handling
 - * for kvm hash pagetable
 - * @size:  Size of the reserved memory.
 - * @alignment:  Alignment for the contiguous memory area
 - *
 - * This function reserves memory for kvm cma area. It should be
 - * called by arch code when early allocator (memblock or bootmem)
 - * is still activate.
 - */
 -long __init kvm_cma_declare_contiguous(phys_addr_t size, phys_addr_t 
 alignment)
 -{
 - long base_pfn;
 - phys_addr_t addr;
 - struct kvm_cma *cma = kvm_cma_area;
 -
 - pr_debug(%s(size %lx)\n, __func__, (unsigned long)size);
 -
 - if (!size)
 - return -EINVAL;
 - /*
 -  * Sanitise input arguments.
 -  * We should be pageblock aligned for CMA

Re: [PATCH v2 06/10] CMA: generalize CMA reserved area management functionality

2014-06-14 Thread Aneesh Kumar K.V
Joonsoo Kim iamjoonsoo@lge.com writes:

 Currently, there are two users on CMA functionality, one is the DMA
 subsystem and the other is the kvm on powerpc. They have their own code
 to manage CMA reserved area even if they looks really similar.
 From my guess, it is caused by some needs on bitmap management. Kvm side
 wants to maintain bitmap not for 1 page, but for more size. Eventually it
 use bitmap where one bit represents 64 pages.

 When I implement CMA related patches, I should change those two places
 to apply my change and it seem to be painful to me. I want to change
 this situation and reduce future code management overhead through
 this patch.

 This change could also help developer who want to use CMA in their
 new feature development, since they can use CMA easily without
 copying & pasting this reserved area management code.

 In previous patches, we have prepared some features to generalize
 CMA reserved area management and now it's time to do it. This patch
 moves core functions to mm/cma.c and change DMA APIs to use
 these functions.

 There is no functional change in DMA APIs.

 v2: There is no big change from v1 in mm/cma.c. Mostly renaming.

 Acked-by: Michal Nazarewicz min...@mina86.com
 Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

Reviewed-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com


 diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
 index 00e13ce..4eac559 100644
 --- a/drivers/base/Kconfig
 +++ b/drivers/base/Kconfig
 @@ -283,16 +283,6 @@ config CMA_ALIGNMENT

 If unsure, leave the default value 8.

 -config CMA_AREAS
 - int Maximum count of the CMA device-private areas
 - default 7
 - help
 -   CMA allows to create CMA areas for particular devices. This parameter
 -   sets the maximum number of such device private CMA areas in the
 -   system.
 -
 -   If unsure, leave the default value 7.
 -
  endif

  endmenu
 diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c
 index 9bc9340..f177f73 100644
 --- a/drivers/base/dma-contiguous.c
 +++ b/drivers/base/dma-contiguous.c
 @@ -24,25 +24,10 @@

  #include <linux/memblock.h>
  #include <linux/err.h>
 -#include <linux/mm.h>
 -#include <linux/mutex.h>
 -#include <linux/page-isolation.h>
  #include <linux/sizes.h>
 -#include <linux/slab.h>
 -#include <linux/swap.h>
 -#include <linux/mm_types.h>
  #include <linux/dma-contiguous.h>
  #include <linux/log2.h>
 -
 -struct cma {
 - unsigned long   base_pfn;
 - unsigned long   count;
 - unsigned long   *bitmap;
 - int order_per_bit; /* Order of pages represented by one bit */
 - struct mutex    lock;
 -};
 -
 -struct cma *dma_contiguous_default_area;
 +#include <linux/cma.h>

  #ifdef CONFIG_CMA_SIZE_MBYTES
  #define CMA_SIZE_MBYTES CONFIG_CMA_SIZE_MBYTES
 @@ -50,6 +35,8 @@ struct cma *dma_contiguous_default_area;
  #define CMA_SIZE_MBYTES 0
  #endif

 +struct cma *dma_contiguous_default_area;
 +
  /*
   * Default global CMA area size can be defined in kernel's .config.
   * This is useful mainly for distro maintainers to create a kernel
 @@ -156,199 +143,13 @@ void __init dma_contiguous_reserve(phys_addr_t limit)
   }
  }

 -static DEFINE_MUTEX(cma_mutex);
 -
 -static unsigned long cma_bitmap_aligned_mask(struct cma *cma, int align_order)
 -{
 - return (1 << (align_order >> cma->order_per_bit)) - 1;
 -}
 -
 -static unsigned long cma_bitmap_maxno(struct cma *cma)
 -{
 - return cma->count >> cma->order_per_bit;
 -}
 -
 -static unsigned long cma_bitmap_pages_to_bits(struct cma *cma,
 - unsigned long pages)
 -{
 - return ALIGN(pages, 1 << cma->order_per_bit) >> cma->order_per_bit;
 -}
 -
 -static void clear_cma_bitmap(struct cma *cma, unsigned long pfn, int count)
 -{
 - unsigned long bitmapno, nr_bits;
 -
 - bitmapno = (pfn - cma->base_pfn) >> cma->order_per_bit;
 - nr_bits = cma_bitmap_pages_to_bits(cma, count);
 -
 - mutex_lock(&cma->lock);
 - bitmap_clear(cma->bitmap, bitmapno, nr_bits);
 - mutex_unlock(&cma->lock);
 -}
 -
 -static int __init cma_activate_area(struct cma *cma)
 -{
 - int bitmap_maxno = cma_bitmap_maxno(cma);
 - int bitmap_size = BITS_TO_LONGS(bitmap_maxno) * sizeof(long);
 - unsigned long base_pfn = cma->base_pfn, pfn = base_pfn;
 - unsigned i = cma->count >> pageblock_order;
 - struct zone *zone;
 -
 - pr_debug("%s()\n", __func__);
 -
 - cma->bitmap = kzalloc(bitmap_size, GFP_KERNEL);
 - if (!cma->bitmap)
 - return -ENOMEM;
 -
 - WARN_ON_ONCE(!pfn_valid(pfn));
 - zone = page_zone(pfn_to_page(pfn));
 -
 - do {
 - unsigned j;
 - base_pfn = pfn;
 - for (j = pageblock_nr_pages; j; --j, pfn++) {
 - WARN_ON_ONCE(!pfn_valid(pfn));
 - /*
 -  * alloc_contig_range requires the pfn range
 -  * specified to be in the same zone. Make this
 -  * simple by forcing

Re: [PATCH v2 05/10] DMA, CMA: support arbitrary bitmap granularity

2014-06-14 Thread Aneesh Kumar K.V
Joonsoo Kim iamjoonsoo@lge.com writes:

 ppc kvm's cma region management requires arbitrary bitmap granularity,
 since they want to reserve very large memory and manage this region
 with bitmap that one bit for several pages to reduce management overheads.
 So support arbitrary bitmap granularity for following generalization.

 Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

Reviewed-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com


 diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c
 index bc4c171..9bc9340 100644
 --- a/drivers/base/dma-contiguous.c
 +++ b/drivers/base/dma-contiguous.c
 @@ -38,6 +38,7 @@ struct cma {
   unsigned long   base_pfn;
   unsigned long   count;
   unsigned long   *bitmap;
 + int order_per_bit; /* Order of pages represented by one bit */
   struct mutexlock;
  };

 @@ -157,9 +158,38 @@ void __init dma_contiguous_reserve(phys_addr_t limit)

  static DEFINE_MUTEX(cma_mutex);

 +static unsigned long cma_bitmap_aligned_mask(struct cma *cma, int align_order)
 +{
 + return (1 << (align_order >> cma->order_per_bit)) - 1;
 +}
 +
 +static unsigned long cma_bitmap_maxno(struct cma *cma)
 +{
 + return cma->count >> cma->order_per_bit;
 +}
 +
 +static unsigned long cma_bitmap_pages_to_bits(struct cma *cma,
 + unsigned long pages)
 +{
 + return ALIGN(pages, 1 << cma->order_per_bit) >> cma->order_per_bit;
 +}
 +
 +static void clear_cma_bitmap(struct cma *cma, unsigned long pfn, int count)
 +{
 + unsigned long bitmapno, nr_bits;
 +
 + bitmapno = (pfn - cma->base_pfn) >> cma->order_per_bit;
 + nr_bits = cma_bitmap_pages_to_bits(cma, count);
 +
 + mutex_lock(&cma->lock);
 + bitmap_clear(cma->bitmap, bitmapno, nr_bits);
 + mutex_unlock(&cma->lock);
 +}
 +
  static int __init cma_activate_area(struct cma *cma)
  {
 - int bitmap_size = BITS_TO_LONGS(cma->count) * sizeof(long);
 + int bitmap_maxno = cma_bitmap_maxno(cma);
 + int bitmap_size = BITS_TO_LONGS(bitmap_maxno) * sizeof(long);
   unsigned long base_pfn = cma->base_pfn, pfn = base_pfn;
   unsigned i = cma->count >> pageblock_order;
   struct zone *zone;
 @@ -221,6 +251,7 @@ core_initcall(cma_init_reserved_areas);
   * @base: Base address of the reserved area optional, use 0 for any
   * @limit: End address of the reserved memory (optional, 0 for any).
   * @alignment: Alignment for the contiguous memory area, should be power of 2
 + * @order_per_bit: Order of pages represented by one bit on bitmap.
   * @res_cma: Pointer to store the created cma region.
   * @fixed: hint about where to place the reserved area
   *
 @@ -235,7 +266,7 @@ core_initcall(cma_init_reserved_areas);
   */
  static int __init __dma_contiguous_reserve_area(phys_addr_t size,
   phys_addr_t base, phys_addr_t limit,
 - phys_addr_t alignment,
 + phys_addr_t alignment, int order_per_bit,
   struct cma **res_cma, bool fixed)
  {
   struct cma *cma = cma_areas[cma_area_count];
 @@ -269,6 +300,8 @@ static int __init 
 __dma_contiguous_reserve_area(phys_addr_t size,
   base = ALIGN(base, alignment);
   size = ALIGN(size, alignment);
   limit = ~(alignment - 1);
 + /* size should be aligned with order_per_bit */
 + BUG_ON(!IS_ALIGNED(size  PAGE_SHIFT, 1  order_per_bit));

   /* Reserve memory */
   if (base  fixed) {
 @@ -294,6 +327,7 @@ static int __init 
 __dma_contiguous_reserve_area(phys_addr_t size,
*/
   cma-base_pfn = PFN_DOWN(base);
   cma-count = size  PAGE_SHIFT;
 + cma-order_per_bit = order_per_bit;
   *res_cma = cma;
   cma_area_count++;

 @@ -313,7 +347,7 @@ int __init dma_contiguous_reserve_area(phys_addr_t size, 
 phys_addr_t base,
  {
   int ret;

 - ret = __dma_contiguous_reserve_area(size, base, limit, 0,
 + ret = __dma_contiguous_reserve_area(size, base, limit, 0, 0,
   res_cma, fixed);
   if (ret)
   return ret;
 @@ -324,13 +358,6 @@ int __init dma_contiguous_reserve_area(phys_addr_t size, 
 phys_addr_t base,
   return 0;
  }

 -static void clear_cma_bitmap(struct cma *cma, unsigned long pfn, int count)
 -{
 - mutex_lock(cma-lock);
 - bitmap_clear(cma-bitmap, pfn - cma-base_pfn, count);
 - mutex_unlock(cma-lock);
 -}
 -
  /**
   * dma_alloc_from_contiguous() - allocate pages from contiguous area
   * @dev:   Pointer to device for which the allocation is performed.
 @@ -345,7 +372,8 @@ static void clear_cma_bitmap(struct cma *cma, unsigned 
 long pfn, int count)
  static struct page *__dma_alloc_from_contiguous(struct cma *cma, int count,
  unsigned int align)
  {
 - unsigned long mask, pfn, pageno, start = 0;
 + unsigned long mask, pfn, start = 0;
 + unsigned long bitmap_maxno, bitmapno, nr_bits;
   struct page *page
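
To make the granularity concrete, here is a small, self-contained userspace model of the arithmetic above. order_per_bit = 6 matches the one-bit-per-64-pages kvm case mentioned in the thread; the 16GB region and 64K page size are example numbers only, not taken from the patch.

#include <stdio.h>

struct cma_model {
    unsigned long count;    /* region size in pages */
    int order_per_bit;      /* one bit covers 1 << order_per_bit pages */
};

/* bits needed to cover 'pages' pages, rounded up to the bit granularity */
static unsigned long pages_to_bits(const struct cma_model *cma, unsigned long pages)
{
    unsigned long per_bit = 1UL << cma->order_per_bit;

    return (pages + per_bit - 1) / per_bit;
}

/* total number of bits in the region's bitmap */
static unsigned long bitmap_maxno(const struct cma_model *cma)
{
    return cma->count >> cma->order_per_bit;
}

int main(void)
{
    unsigned long pages_16g = (16UL << 30) >> 16;   /* 16GB of 64K pages */
    struct cma_model kvm = { .count = pages_16g, .order_per_bit = 6 };
    struct cma_model dma = { .count = pages_16g, .order_per_bit = 0 };

    printf("kvm style: %lu bits total, %lu bit(s) for a 32-page chunk\n",
           bitmap_maxno(&kvm), pages_to_bits(&kvm, 32));
    printf("dma style: %lu bits total, %lu bit(s) for the same chunk\n",
           bitmap_maxno(&dma), pages_to_bits(&dma, 32));
    return 0;
}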

Re: [PATCH v2 06/10] CMA: generalize CMA reserved area management functionality

2014-06-14 Thread Aneesh Kumar K.V
Joonsoo Kim iamjoonsoo@lge.com writes:

 Currently, there are two users on CMA functionality, one is the DMA
 subsystem and the other is the kvm on powerpc. They have their own code
 to manage CMA reserved area even if they looks really similar.
 From my guess, it is caused by some needs on bitmap management. Kvm side
 wants to maintain bitmap not for 1 page, but for more size. Eventually it
 use bitmap where one bit represents 64 pages.

 When I implement CMA related patches, I should change those two places
 to apply my change and it seem to be painful to me. I want to change
 this situation and reduce future code management overhead through
 this patch.

 This change could also help developer who want to use CMA in their
 new feature development, since they can use CMA easily without
 copying & pasting this reserved area management code.

 In previous patches, we have prepared some features to generalize
 CMA reserved area management and now it's time to do it. This patch
 moves core functions to mm/cma.c and change DMA APIs to use
 these functions.

 There is no functional change in DMA APIs.

 v2: There is no big change from v1 in mm/cma.c. Mostly renaming.

 Acked-by: Michal Nazarewicz min...@mina86.com
 Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com


.

 +
 + mask = cma_bitmap_aligned_mask(cma, align);
 + bitmap_maxno = cma_bitmap_maxno(cma);
 + nr_bits = cma_bitmap_pages_to_bits(cma, count);
 +
 + for (;;) {
 + mutex_lock(&cma->lock);
 + bitmapno = bitmap_find_next_zero_area(cma->bitmap,
 + bitmap_maxno, start, nr_bits, mask);
 + if (bitmapno >= bitmap_maxno) {
 + mutex_unlock(&cma->lock);
 + break;
 + }
 + bitmap_set(cma->bitmap, bitmapno, nr_bits);
 + /*
 +  * It's safe to drop the lock here. We've marked this region for
 +  * our exclusive use. If the migration fails we will take the
 +  * lock again and unmark it.
 +  */
 + mutex_unlock(&cma->lock);
 +
 + pfn = cma->base_pfn + (bitmapno << cma->order_per_bit);
 + mutex_lock(&cma_mutex);
 + ret = alloc_contig_range(pfn, pfn + count, MIGRATE_CMA);
 + mutex_unlock(&cma_mutex);
 + if (ret == 0) {
 + page = pfn_to_page(pfn);
 + break;
 + } else if (ret != -EBUSY) {
 + clear_cma_bitmap(cma, pfn, count);
 + break;
 + }
 + 


For setting the bitmap we do
bitmap_set(cma->bitmap, bitmapno, nr_bits);
alloc_contig()..
if (error)
clear_cma_bitmap(cma, pfn, count);

Why?

Why not bitmap_clear()?

-aneesh

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
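
For readers following the bitmap_set()/clear_cma_bitmap() question above, a self-contained model of the allocate-retry control flow only: the fake try_alloc_contig() stands in for alloc_contig_range(), locking is elided, and a plain byte array replaces the real bitmap. The point it illustrates is that bits are reserved optimistically before migration and released again when migration fails.

#include <stdio.h>
#include <string.h>

#define MAX_BITS 64

static unsigned char bitmap[MAX_BITS];  /* one byte per "bit", for simplicity */

static long find_next_zero_area(unsigned long start, unsigned long nr)
{
    for (unsigned long i = start; i + nr <= MAX_BITS; i++) {
        unsigned long j;

        for (j = 0; j < nr && !bitmap[i + j]; j++)
            ;
        if (j == nr)
            return (long)i;
    }
    return -1;
}

/* stand-in for alloc_contig_range(): pretend the first range is busy */
static int try_alloc_contig(long bitno)
{
    static int attempts;

    return attempts++ == 0 ? -1 : 0;    /* -1 models -EBUSY */
}

int main(void)
{
    unsigned long nr_bits = 4, start = 0;

    for (;;) {
        long bitno = find_next_zero_area(start, nr_bits);

        if (bitno < 0)
            break;
        memset(&bitmap[bitno], 1, nr_bits);     /* bitmap_set() under the bitmap lock */
        if (try_alloc_contig(bitno) == 0) {
            printf("allocated at bit %ld\n", bitno);
            break;
        }
        /* migration failed: drop the reservation and retry further on */
        memset(&bitmap[bitno], 0, nr_bits);     /* clear_cma_bitmap() */
        start = (unsigned long)(bitno + nr_bits);
        printf("range at bit %ld busy, retrying from %lu\n", bitno, start);
    }
    return 0;
}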


[PATCH] KVM: PPC: BOOK3S: HV: Use base page size when comparing against slb value

2014-06-13 Thread Aneesh Kumar K.V
With guest supporting Multiple page size per segment (MPSS),
hpte_page_size returns actual page size used. Add a new function to
return base page size and use that to compare against the the page size
calculated from SLB

Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/kvm_book3s_64.h | 19 +--
 arch/powerpc/kvm/book3s_64_mmu_hv.c  |  2 +-
 arch/powerpc/kvm/book3s_hv_rm_mmu.c  |  2 +-
 3 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h 
b/arch/powerpc/include/asm/kvm_book3s_64.h
index 34422be566ce..3d0f3fb9c6b6 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -202,8 +202,10 @@ static inline unsigned long compute_tlbie_rb(unsigned long 
v, unsigned long r,
return rb;
 }
 
-static inline unsigned long hpte_page_size(unsigned long h, unsigned long l)
+static inline unsigned long __hpte_page_size(unsigned long h, unsigned long l,
+bool is_base_size)
 {
+
int size, a_psize;
/* Look at the 8 bit LP value */
	unsigned int lp = (l >> LP_SHIFT) & ((1 << LP_BITS) - 1);
@@ -218,14 +220,27 @@ static inline unsigned long hpte_page_size(unsigned long 
h, unsigned long l)
continue;
 
a_psize = __hpte_actual_psize(lp, size);
-   if (a_psize != -1)
+   if (a_psize != -1) {
+   if (is_base_size)
+   return 1ul << mmu_psize_defs[size].shift;
	return 1ul << mmu_psize_defs[a_psize].shift;
+   }
}
 
}
return 0;
 }
 
+static inline unsigned long hpte_page_size(unsigned long h, unsigned long l)
+{
+   return __hpte_page_size(h, l, 0);
+}
+
+static inline unsigned long hpte_base_page_size(unsigned long h, unsigned long 
l)
+{
+   return __hpte_page_size(h, l, 1);
+}
+
 static inline unsigned long hpte_rpn(unsigned long ptel, unsigned long psize)
 {
	return ((ptel & HPTE_R_RPN) & ~(psize - 1)) >> PAGE_SHIFT;
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index f53cf2eae36a..7ff45ed27c65 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -1567,7 +1567,7 @@ static ssize_t kvm_htab_write(struct file *file, const 
char __user *buf,
goto out;
}
if (!rma_setup  is_vrma_hpte(v)) {
-   unsigned long psize = hpte_page_size(v, r);
+   unsigned long psize = hpte_base_page_size(v, r);
unsigned long senc = slb_pgsize_encoding(psize);
unsigned long lpcr;
 
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c 
b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 87624ab5ba82..c6aca75b8376 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -839,7 +839,7 @@ long kvmppc_hv_find_lock_hpte(struct kvm *kvm, gva_t eaddr, 
unsigned long slb_v,
 * to check against the actual page size.
 */
	if ((v & valid) && (v & mask) == val &&
-	    hpte_page_size(v, r) == (1ul << pshift))
+	    hpte_base_page_size(v, r) == (1ul << pshift))
/* Return with the HPTE still locked */
return (hash  3) + (i  1);
 
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: PPC: BOOK3S: HV: Use base page size when comparing against slb value

2014-06-13 Thread Aneesh Kumar K.V
Alexander Graf ag...@suse.de writes:

 On 13.06.14 09:23, Aneesh Kumar K.V wrote:
 With guest supporting Multiple page size per segment (MPSS),
 hpte_page_size returns actual page size used. Add a new function to
 return base page size and use that to compare against the the page size
 calculated from SLB

 Why? What does this fix? Is this a bug fix, an enhancement? Don't 
 describe only what you do, but also why you do it.



This could result in page fault failures (unhandled page faults): even
though we have a valid hpte entry mapping a 16MB page, because we were
comparing the actual page size against the page size calculated from the
SLB bits, kvmppc_hv_find_lock_hpte will fail and return -1. I did not
observe a failure in practice; the bug was found during code audit. That
could be because with THP we have guest ram backed by hugetlbfs and we
always find the page in the host linux page table. That results in
do_h_enter always inserting an HPTE_V_VALID entry, and hence we might not
really end up calling kvmppc_hv_find_lock_hpte.

-aneesh

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
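
To illustrate the failure mode described above with concrete numbers (illustrative values, not taken from the patch): an MPSS segment whose base page size is 64K can be backed by a 16MB actual page, so comparing the HPTE's actual size against the SLB-derived base size can never match.

#include <stdio.h>

int main(void)
{
    unsigned long slb_pshift = 16;              /* SLB encoding: 64K base page size */
    unsigned long hpte_actual = 1UL << 24;      /* HPTE maps an actual 16MB page  */
    unsigned long hpte_base = 1UL << 16;        /* base page size encoded in HPTE */

    /* old check: actual page size vs SLB-derived base size */
    printf("actual size check: %s\n",
           hpte_actual == (1UL << slb_pshift) ? "match" : "no match, lookup fails");

    /* new check: base page size vs SLB-derived base size */
    printf("base size check:   %s\n",
           hpte_base == (1UL << slb_pshift) ? "match" : "no match");
    return 0;
}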


Re: [PATCH] KVM: PPC: BOOK3S: HV: Use base page size when comparing against slb value

2014-06-13 Thread Aneesh Kumar K.V
Alexander Graf ag...@suse.de writes:

 On 13.06.14 16:28, Aneesh Kumar K.V wrote:
 Alexander Graf ag...@suse.de writes:

 On 13.06.14 09:23, Aneesh Kumar K.V wrote:
 With guest supporting Multiple page size per segment (MPSS),
 hpte_page_size returns actual page size used. Add a new function to
 return base page size and use that to compare against the the page size
 calculated from SLB
 Why? What does this fix? Is this a bug fix, an enhancement? Don't
 describe only what you do, but also why you do it.


 This could result in page fault failures (unhandled page fault) because
 even though we have a valid hpte entry mapping a 16MB page, since we
 were comparing actual page size against page size calculated from SLB
 bits kvmppc_hv_find_lock_hpte will fail and return -1. I did not observe
 a failure in real and the bug was found during code audit. That could be
 because with THP we have guest ram backed by hugetlbfs and we always
 find the page in the host linux page table. The will result in do_h_enter 
 always
 inserting HPTE_V_VALID entry and hence we might not really end up calling
 kvmppc_hv_find_lock_hpte.

 So why do we need to override to base page size for the VRMA region?

slb encoding should be derived based on base page size. 

 Also I think you want to change the comment above the line in 
 find_lock_hpte you're changing.


Will do that.

-aneesh

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 01/10] DMA, CMA: clean-up log message

2014-06-11 Thread Aneesh Kumar K.V
Joonsoo Kim iamjoonsoo@lge.com writes:

 We don't need explicit 'CMA:' prefix, since we already define prefix
 'cma:' in pr_fmt. So remove it.

 And, some logs print function name and others doesn't. This looks
 bad to me, so I unify log format to print function name consistently.

 Lastly, I add one more debug log on cma_activate_area().

 Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

 diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c
 index 83969f8..bd0bb81 100644
 --- a/drivers/base/dma-contiguous.c
 +++ b/drivers/base/dma-contiguous.c
 @@ -144,7 +144,7 @@ void __init dma_contiguous_reserve(phys_addr_t limit)
   }

   if (selected_size  !dma_contiguous_default_area) {
 - pr_debug("%s: reserving %ld MiB for global area\n", __func__,
 + pr_debug("%s(): reserving %ld MiB for global area\n", __func__,
	 (unsigned long)selected_size / SZ_1M);

Do we need to do "function():", or just "function:"? I have seen the latter
usage in other parts of the kernel.


   dma_contiguous_reserve_area(selected_size, selected_base,
 @@ -163,8 +163,9 @@ static int __init cma_activate_area(struct cma *cma)
   unsigned i = cma-count  pageblock_order;
   struct zone *zone;

 - cma-bitmap = kzalloc(bitmap_size, GFP_KERNEL);
 + pr_debug("%s()\n", __func__);

why ?


 + cma-bitmap = kzalloc(bitmap_size, GFP_KERNEL);
   if (!cma-bitmap)
   return -ENOMEM;

 @@ -234,7 +235,8 @@ int __init dma_contiguous_reserve_area(phys_addr_t size, 
 phys_addr_t base,

   /* Sanity checks */
   if (cma_area_count == ARRAY_SIZE(cma_areas)) {
 - pr_err(Not enough slots for CMA reserved regions!\n);
 + pr_err(%s(): Not enough slots for CMA reserved regions!\n,
 + __func__);
   return -ENOSPC;
   }

 @@ -274,14 +276,15 @@ int __init dma_contiguous_reserve_area(phys_addr_t 
 size, phys_addr_t base,
   *res_cma = cma;
   cma_area_count++;

 - pr_info(CMA: reserved %ld MiB at %08lx\n, (unsigned long)size / SZ_1M,
 - (unsigned long)base);
 + pr_info(%s(): reserved %ld MiB at %08lx\n,
 + __func__, (unsigned long)size / SZ_1M, (unsigned long)base);

   /* Architecture specific contiguous memory fixup. */
   dma_contiguous_early_fixup(base, size);
   return 0;
  err:
 - pr_err(CMA: failed to reserve %ld MiB\n, (unsigned long)size / SZ_1M);
 + pr_err(%s(): failed to reserve %ld MiB\n,
 + __func__, (unsigned long)size / SZ_1M);
   return ret;
  }

 -- 
 1.7.9.5

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
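
A tiny userspace illustration of the logging style being discussed: pr_fmt() prepends a constant subsystem prefix, and the question is only whether the format string should say "%s():" or "%s:". Everything below is a mock-up (printf instead of the kernel's pr_debug), kept just to show the resulting output.

#include <stdio.h>

/* models the kernel's pr_fmt() trick: every message gets the same prefix */
#define pr_fmt(fmt) "cma: " fmt
#define pr_debug(fmt, ...) printf(pr_fmt(fmt), ##__VA_ARGS__)

static void reserve(unsigned long mib)
{
    /* "%s():" style, as used in the patch ... */
    pr_debug("%s(): reserving %lu MiB for global area\n", __func__, mib);
    /* ... versus the "%s:" style seen elsewhere in the kernel */
    pr_debug("%s: reserving %lu MiB for global area\n", __func__, mib);
}

int main(void)
{
    reserve(16);
    return 0;
}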


Re: [PATCH v2 02/10] DMA, CMA: fix possible memory leak

2014-06-11 Thread Aneesh Kumar K.V
Joonsoo Kim iamjoonsoo@lge.com writes:

 We should free memory for bitmap when we find zone mis-match,
 otherwise this memory will leak.

 Additionally, I copy code comment from ppc kvm's cma code to notify
 why we need to check zone mis-match.

 Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

Reviewed-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com


 diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c
 index bd0bb81..fb0cdce 100644
 --- a/drivers/base/dma-contiguous.c
 +++ b/drivers/base/dma-contiguous.c
 @@ -177,14 +177,24 @@ static int __init cma_activate_area(struct cma *cma)
   base_pfn = pfn;
   for (j = pageblock_nr_pages; j; --j, pfn++) {
   WARN_ON_ONCE(!pfn_valid(pfn));
 + /*
 +  * alloc_contig_range requires the pfn range
 +  * specified to be in the same zone. Make this
 +  * simple by forcing the entire CMA resv range
 +  * to be in the same zone.
 +  */
   if (page_zone(pfn_to_page(pfn)) != zone)
 - return -EINVAL;
 + goto err;
   }
   init_cma_reserved_pageblock(pfn_to_page(base_pfn));
   } while (--i);

   mutex_init(cma-lock);
   return 0;
 +
 +err:
 + kfree(cma->bitmap);
 + return -EINVAL;
  }

  static struct cma cma_areas[MAX_CMA_AREAS];
 -- 
 1.7.9.5

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 03/10] DMA, CMA: separate core cma management codes from DMA APIs

2014-06-11 Thread Aneesh Kumar K.V
Joonsoo Kim iamjoonsoo@lge.com writes:

 To prepare future generalization work on cma area management code,
 we need to separate core cma management codes from DMA APIs.
 We will extend these core functions to cover requirements of
 ppc kvm's cma area management functionality in following patches.
 This separation helps us not to touch DMA APIs while extending
 core functions.

 Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

Reviewed-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com


 diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c
 index fb0cdce..8a44c82 100644
 --- a/drivers/base/dma-contiguous.c
 +++ b/drivers/base/dma-contiguous.c
 @@ -231,9 +231,9 @@ core_initcall(cma_init_reserved_areas);
   * If @fixed is true, reserve contiguous area at exactly @base.  If false,
   * reserve in range from @base to @limit.
   */
 -int __init dma_contiguous_reserve_area(phys_addr_t size, phys_addr_t base,
 -phys_addr_t limit, struct cma **res_cma,
 -bool fixed)
 +static int __init __dma_contiguous_reserve_area(phys_addr_t size,
 + phys_addr_t base, phys_addr_t limit,
 + struct cma **res_cma, bool fixed)
  {
   struct cma *cma = cma_areas[cma_area_count];
   phys_addr_t alignment;
 @@ -288,16 +288,30 @@ int __init dma_contiguous_reserve_area(phys_addr_t 
 size, phys_addr_t base,

   pr_info(%s(): reserved %ld MiB at %08lx\n,
   __func__, (unsigned long)size / SZ_1M, (unsigned long)base);
 -
 - /* Architecture specific contiguous memory fixup. */
 - dma_contiguous_early_fixup(base, size);
   return 0;
 +
  err:
   pr_err(%s(): failed to reserve %ld MiB\n,
   __func__, (unsigned long)size / SZ_1M);
   return ret;
  }

 +int __init dma_contiguous_reserve_area(phys_addr_t size, phys_addr_t base,
 +phys_addr_t limit, struct cma **res_cma,
 +bool fixed)
 +{
 + int ret;
 +
 + ret = __dma_contiguous_reserve_area(size, base, limit, res_cma, fixed);
 + if (ret)
 + return ret;
 +
 + /* Architecture specific contiguous memory fixup. */
 + dma_contiguous_early_fixup(base, size);
 +
 + return 0;
 +}
 +
  static void clear_cma_bitmap(struct cma *cma, unsigned long pfn, int count)
  {
   mutex_lock(cma-lock);
 @@ -316,20 +330,16 @@ static void clear_cma_bitmap(struct cma *cma, unsigned 
 long pfn, int count)
   * global one. Requires architecture specific dev_get_cma_area() helper
   * function.
   */
 -struct page *dma_alloc_from_contiguous(struct device *dev, int count,
 +static struct page *__dma_alloc_from_contiguous(struct cma *cma, int count,
  unsigned int align)
  {
   unsigned long mask, pfn, pageno, start = 0;
 - struct cma *cma = dev_get_cma_area(dev);
   struct page *page = NULL;
   int ret;

   if (!cma || !cma->count)
   return NULL;

 - if (align > CONFIG_CMA_ALIGNMENT)
 - align = CONFIG_CMA_ALIGNMENT;
 -
   pr_debug("%s(cma %p, count %d, align %d)\n", __func__, (void *)cma,
	count, align);

 @@ -377,6 +387,17 @@ struct page *dma_alloc_from_contiguous(struct device 
 *dev, int count,
   return page;
  }

 +struct page *dma_alloc_from_contiguous(struct device *dev, int count,
 +unsigned int align)
 +{
 + struct cma *cma = dev_get_cma_area(dev);
 +
 + if (align > CONFIG_CMA_ALIGNMENT)
 + align = CONFIG_CMA_ALIGNMENT;
 +
 + return __dma_alloc_from_contiguous(cma, count, align);
 +}
 +
  /**
   * dma_release_from_contiguous() - release allocated pages
   * @dev:   Pointer to device for which the pages were allocated.
 @@ -387,10 +408,9 @@ struct page *dma_alloc_from_contiguous(struct device 
 *dev, int count,
   * It returns false when provided pages do not belong to contiguous area and
   * true otherwise.
   */
 -bool dma_release_from_contiguous(struct device *dev, struct page *pages,
 +static bool __dma_release_from_contiguous(struct cma *cma, struct page 
 *pages,
int count)
  {
 - struct cma *cma = dev_get_cma_area(dev);
   unsigned long pfn;

   if (!cma || !pages)
 @@ -410,3 +430,11 @@ bool dma_release_from_contiguous(struct device *dev, 
 struct page *pages,

   return true;
  }
 +
 +bool dma_release_from_contiguous(struct device *dev, struct page *pages,
 +  int count)
 +{
 + struct cma *cma = dev_get_cma_area(dev);
 +
 + return __dma_release_from_contiguous(cma, pages, count);
 +}
 -- 
 1.7.9.5

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
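
The refactoring above is a plain wrapper split: the DMA-facing entry point keeps the device lookup and the CONFIG_CMA_ALIGNMENT clamp, while a core helper works purely on a struct cma and can be reused by non-DMA callers. A compilable sketch of that shape, with made-up types and numbers:

#include <stdio.h>

#define MAX_ALIGN_ORDER 8       /* stands in for CONFIG_CMA_ALIGNMENT */

struct cma { unsigned long base_pfn, count; };
struct device { struct cma *cma_area; };

/* core helper: knows nothing about devices */
static long core_alloc(struct cma *cma, int count, unsigned int align_order)
{
    if (!cma || !cma->count)
        return -1;
    printf("alloc %d pages from pfn %lu, align order %u\n",
           count, cma->base_pfn, align_order);
    return (long)cma->base_pfn;
}

/* thin DMA-facing wrapper: device lookup plus alignment clamp only */
static long dma_alloc(struct device *dev, int count, unsigned int align_order)
{
    if (align_order > MAX_ALIGN_ORDER)
        align_order = MAX_ALIGN_ORDER;
    return core_alloc(dev->cma_area, count, align_order);
}

int main(void)
{
    struct cma area = { .base_pfn = 0x80000, .count = 1024 };
    struct device dev = { .cma_area = &area };

    return dma_alloc(&dev, 16, 12) < 0;
}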


Re: [PATCH v2 04/10] DMA, CMA: support alignment constraint on cma region

2014-06-11 Thread Aneesh Kumar K.V
Joonsoo Kim iamjoonsoo@lge.com writes:

 ppc kvm's cma area management needs alignment constraint on
 cma region. So support it to prepare generalization of cma area
 management functionality.

 Additionally, add some comments which tell us why alignment
 constraint is needed on cma region.

 Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

Reviewed-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com


 diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c
 index 8a44c82..bc4c171 100644
 --- a/drivers/base/dma-contiguous.c
 +++ b/drivers/base/dma-contiguous.c
 @@ -32,6 +32,7 @@
  #include linux/swap.h
  #include linux/mm_types.h
  #include linux/dma-contiguous.h
 +#include linux/log2.h

  struct cma {
   unsigned long   base_pfn;
 @@ -219,6 +220,7 @@ core_initcall(cma_init_reserved_areas);
   * @size: Size of the reserved area (in bytes),
   * @base: Base address of the reserved area optional, use 0 for any
   * @limit: End address of the reserved memory (optional, 0 for any).
 + * @alignment: Alignment for the contiguous memory area, should be power of 2
   * @res_cma: Pointer to store the created cma region.
   * @fixed: hint about where to place the reserved area
   *
 @@ -233,15 +235,15 @@ core_initcall(cma_init_reserved_areas);
   */
  static int __init __dma_contiguous_reserve_area(phys_addr_t size,
   phys_addr_t base, phys_addr_t limit,
 + phys_addr_t alignment,
   struct cma **res_cma, bool fixed)
  {
   struct cma *cma = cma_areas[cma_area_count];
 - phys_addr_t alignment;
   int ret = 0;

 - pr_debug("%s(size %lx, base %08lx, limit %08lx)\n", __func__,
 -  (unsigned long)size, (unsigned long)base,
 -  (unsigned long)limit);
 + pr_debug("%s(size %lx, base %08lx, limit %08lx align_order %08lx)\n",
 + __func__, (unsigned long)size, (unsigned long)base,
 + (unsigned long)limit, (unsigned long)alignment);

   /* Sanity checks */
   if (cma_area_count == ARRAY_SIZE(cma_areas)) {
 @@ -253,8 +255,17 @@ static int __init 
 __dma_contiguous_reserve_area(phys_addr_t size,
   if (!size)
   return -EINVAL;

 - /* Sanitise input arguments */
 - alignment = PAGE_SIZE << max(MAX_ORDER - 1, pageblock_order);
 + if (alignment && !is_power_of_2(alignment))
 + return -EINVAL;
 +
 + /*
 +  * Sanitise input arguments.
 +  * CMA area should be at least MAX_ORDER - 1 aligned. Otherwise,
 +  * CMA area could be merged into other MIGRATE_TYPE by buddy mechanism
 +  * and CMA property will be broken.
 +  */
 + alignment = max(alignment,
 + (phys_addr_t)PAGE_SIZE << max(MAX_ORDER - 1, pageblock_order));
   base = ALIGN(base, alignment);
   size = ALIGN(size, alignment);
   limit &= ~(alignment - 1);
 @@ -302,7 +313,8 @@ int __init dma_contiguous_reserve_area(phys_addr_t size, 
 phys_addr_t base,
  {
   int ret;

 - ret = __dma_contiguous_reserve_area(size, base, limit, res_cma, fixed);
 + ret = __dma_contiguous_reserve_area(size, base, limit, 0,
 + res_cma, fixed);
   if (ret)
   return ret;

 -- 
 1.7.9.5

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
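
What the sanitisation above works out to, as a runnable example (the PAGE_SIZE, MAX_ORDER and pageblock_order values are example numbers, not from any particular config):

#include <stdio.h>

#define PAGE_SIZE        4096UL
#define MAX_ORDER        11         /* example buddy allocator limit */
#define PAGEBLOCK_ORDER  9          /* example pageblock order */

static unsigned long max_ul(unsigned long a, unsigned long b)
{
    return a > b ? a : b;
}

static int is_power_of_2(unsigned long x)
{
    return x && !(x & (x - 1));
}

int main(void)
{
    unsigned long requested = 1UL << 24;    /* caller asks for 16MB alignment */
    unsigned long minimum, align;

    if (requested && !is_power_of_2(requested))
        return 1;                           /* rejected, as in the patch */

    /* CMA regions must be at least MAX_ORDER - 1 / pageblock aligned. */
    minimum = PAGE_SIZE << max_ul(MAX_ORDER - 1, PAGEBLOCK_ORDER);
    align = max_ul(requested, minimum);

    printf("minimum alignment: %lu bytes, effective alignment: %lu bytes\n",
           minimum, align);
    return 0;
}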


Re: [PATCH 2/4] KVM: PPC: BOOK3S: PR: Doorbell support

2014-06-06 Thread Aneesh Kumar K.V
Alexander Graf ag...@suse.de writes:

 On 05.06.14 14:08, Aneesh Kumar K.V wrote:
 We don't have SMT support yet, hence we should not find a doorbell
 message generated

 Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
 ---
   arch/powerpc/kvm/book3s_emulate.c | 18 ++
   1 file changed, 18 insertions(+)

 diff --git a/arch/powerpc/kvm/book3s_emulate.c 
 b/arch/powerpc/kvm/book3s_emulate.c
 index 1bb16a59dcbc..d6c87d085182 100644
 --- a/arch/powerpc/kvm/book3s_emulate.c
 +++ b/arch/powerpc/kvm/book3s_emulate.c
 @@ -28,7 +28,9 @@
   #define OP_19_XOP_RFI  50
   
   #define OP_31_XOP_MFMSR83
 +#define OP_31_XOP_MSGSNDP   142
   #define OP_31_XOP_MTMSR146
 +#define OP_31_XOP_MSGCLRP   174
   #define OP_31_XOP_MTMSRD   178
   #define OP_31_XOP_MTSR 210
   #define OP_31_XOP_MTSRIN   242
 @@ -303,6 +305,22 @@ int kvmppc_core_emulate_op_pr(struct kvm_run *run, 
 struct kvm_vcpu *vcpu,
   
  break;
  }
 +case OP_31_XOP_MSGSNDP:
 +{
 +/*
 + * PR KVM still don't support SMT mode. So we should

 still?

 + * not see a MSGSNDP/MSGCLRP used with PR KVM
 + */
 +pr_info(KVM: MSGSNDP used in non SMT case\n);
 +emulated = EMULATE_FAIL;

 What would happen on an HV guest with only 1 thread that MSGSNDs to 
 thread 0? Would the guest get an illegal instruction trap, a 
 self-interrupt or would this be a simple nop?


We do get a self-interrupt. I tried the below:

tag = mfspr(SPRN_TIR) & 0x7f;
ppc_msgsnd(5, 0, tag);

And that results in a doorbell exception. That implies we will have to
have a full implementation of doorbell emulation. You can drop patch 2
and 3 from this series. I will rework them.

NOTE: This is not an issue for a Linux guest, because we don't send an
IPI to self. But to complete the emulation of msgsndp we will need to
emulate it properly.

-aneesh

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html



  1   2   3   4   >