Re: [PATCH 1/3] KVM: PPC: Book3S HV: Fix race in reading change bit when removing HPTE
Paul Mackerras <pau...@samba.org> writes:

The reference (R) and change (C) bits in a HPT entry can be set by hardware at any time up until the HPTE is invalidated and the TLB invalidation sequence has completed. This means that when removing a HPTE, we need to read the HPTE after the invalidation sequence has completed in order to obtain reliable values of R and C. The code in kvmppc_do_h_remove() used to do this. However, commit 6f22bd3265fb ("KVM: PPC: Book3S HV: Make HTAB code LE host aware") removed the read after invalidation as a side effect of other changes. This restores the read of the HPTE after invalidation.

The user-visible effect of this bug would be that when migrating a guest, there is a small probability that a page modified by the guest and then unmapped by the guest might not get re-transmitted and thus the destination might end up with a stale copy of the page.

Fixes: 6f22bd3265fb ("KVM: PPC: Book3S HV: Make HTAB code LE host aware")
Cc: sta...@vger.kernel.org # v3.17+
Signed-off-by: Paul Mackerras <pau...@samba.org>
---
 arch/powerpc/kvm/book3s_hv_rm_mmu.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index f6bf0b1..5c1737f 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -413,14 +413,12 @@ long kvmppc_do_h_remove(struct kvm *kvm, unsigned long flags,
 	rev = real_vmalloc_addr(&kvm->arch.revmap[pte_index]);
 	v = pte & ~HPTE_V_HVLOCK;
 	if (v & HPTE_V_VALID) {
-		u64 pte1;
-
-		pte1 = be64_to_cpu(hpte[1]);
 		hpte[0] &= ~cpu_to_be64(HPTE_V_VALID);
-		rb = compute_tlbie_rb(v, pte1, pte_index);
+		rb = compute_tlbie_rb(v, be64_to_cpu(hpte[1]), pte_index);
 		do_tlbies(kvm, &rb, 1, global_invalidates(kvm, flags), true);
 		/* Read PTE low word after tlbie to get final R/C values */
-		remove_revmap_chain(kvm, pte_index, rev, v, pte1);
+		remove_revmap_chain(kvm, pte_index, rev, v,
+				    be64_to_cpu(hpte[1]));
 	}

Maybe add the above commit message as a code comment?
 	r = rev->guest_rpte & ~HPTE_GR_RESERVED;
 	note_hpte_modification(kvm, rev);
-- 
2.1.4
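For readers outside the kernel tree, here is a stand-alone C sketch of the ordering the patch above restores. Everything is invented for illustration (flag names, bit positions, and the stub standing in for do_tlbies()); the real HPT format and tlbie sequence differ. The point is only that the low word must be re-read after the invalidation completes, since hardware may set R/C up to that moment.

```c
#include <assert.h>
#include <stdint.h>

#define FAKE_V_VALID (1ULL << 0)   /* invented "valid" bit, not the real layout */
#define FAKE_R_C     (1ULL << 7)   /* invented "change" bit */

/* Stand-in for the tlbie sequence: models hardware setting the change
 * bit at any time up until the invalidation completes. */
static void fake_tlb_invalidate(uint64_t *pte_lo)
{
	*pte_lo |= FAKE_R_C;
}

/* Clear the valid bit, complete the invalidation, and only then read
 * the low word; a copy taken before the tlbie (as the buggy code did
 * via pte1) could miss a late C update. */
static uint64_t fake_remove_hpte(uint64_t *hpte_hi, uint64_t *hpte_lo)
{
	*hpte_hi &= ~FAKE_V_VALID;     /* invalidate the entry */
	fake_tlb_invalidate(hpte_lo);  /* complete the TLB invalidation */
	return *hpte_lo;               /* re-read: now stable */
}
```

Reading `*hpte_lo` before `fake_tlb_invalidate()` would return 0 here and lose the change bit, which is exactly the stale-page migration hazard the commit message describes.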
[PATCH 1/2] KVM: PPC: Use READ_ONCE when dereferencing pte_t pointer
pte can get updated from other CPUs as part of multiple activities like THP split, huge page collapse, unmap. We need to make sure we don't reload the pte value again and again for different checks.

Signed-off-by: Aneesh Kumar K.V <aneesh.ku...@linux.vnet.ibm.com>
---
Note: This is posted previously as part of
http://article.gmane.org/gmane.linux.ports.ppc.embedded/79278

 arch/powerpc/include/asm/kvm_book3s_64.h |  5 ++++-
 arch/powerpc/kvm/e500_mmu_host.c         | 20 ++++++++++++--------
 2 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index cc073a7ac2b7..f06820c67175 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -290,7 +290,10 @@ static inline pte_t kvmppc_read_update_linux_pte(pte_t *ptep, int writing,
 	pte_t old_pte, new_pte = __pte(0);
 
 	while (1) {
-		old_pte = *ptep;
+		/*
+		 * Make sure we don't reload from ptep
+		 */
+		old_pte = READ_ONCE(*ptep);
 		/*
 		 * wait until _PAGE_BUSY is clear then set it atomically
 		 */
diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c
index cc536d4a75ef..5840d546aa03 100644
--- a/arch/powerpc/kvm/e500_mmu_host.c
+++ b/arch/powerpc/kvm/e500_mmu_host.c
@@ -469,14 +469,18 @@ static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500,
 
 	pgdir = vcpu_e500->vcpu.arch.pgdir;
 	ptep = lookup_linux_ptep(pgdir, hva, &tsize_pages);
-	if (pte_present(*ptep))
-		wimg = (*ptep >> PTE_WIMGE_SHIFT) & MAS2_WIMGE_MASK;
-	else {
-		if (printk_ratelimit())
-			pr_err("%s: pte not present: gfn %lx, pfn %lx\n",
-				__func__, (long)gfn, pfn);
-		ret = -EINVAL;
-		goto out;
+	if (ptep) {
+		pte_t pte = READ_ONCE(*ptep);
+
+		if (pte_present(pte))
+			wimg = (pte_val(pte) >> PTE_WIMGE_SHIFT) &
+				MAS2_WIMGE_MASK;
+		else {
+			pr_err_ratelimited("%s: pte not present: gfn %lx,pfn %lx\n",
+					   __func__, (long)gfn, pfn);
+			ret = -EINVAL;
+			goto out;
+		}
 	}
 
 	kvmppc_e500_ref_setup(ref, gtlbe, pfn, wimg);
-- 
2.1.0
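A minimal user-space sketch of what READ_ONCE() buys here, with an invented flag layout (not the real ppc PTE bits). The kernel's READ_ONCE() is more elaborate, but for a word-sized object it boils down to a single load through a volatile-qualified pointer, so the compiler cannot split it or re-read `*ptep` once per check:

```c
#include <assert.h>

/* Simplified stand-in for the kernel's READ_ONCE(): force exactly one
 * load via a volatile cast so the compiler cannot reload from memory. */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

#define PTE_PRESENT 0x1UL   /* illustrative flag bits only */
#define PTE_WRITE   0x2UL

/* Both checks are made against the same snapshot. If *ptep were
 * dereferenced twice, a concurrent THP split / collapse / unmap could
 * change it between the two tests and they would see different ptes. */
static int pte_ok_for_write(unsigned long *ptep)
{
	unsigned long pte = READ_ONCE(*ptep);   /* one snapshot */

	return (pte & PTE_PRESENT) && (pte & PTE_WRITE);
}
```

This mirrors the e500 hunk above: take `pte = READ_ONCE(*ptep)` once, then run `pte_present()` and the WIMGE extraction on that local copy.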
[PATCH 2/2] KVM: PPC: Remove page table walk helpers
This patch removes helpers which we had used only once in the code. Limiting page table walk variants helps in ensuring that we won't end up with code walking the page table with wrong assumptions.

Signed-off-by: Aneesh Kumar K.V <aneesh.ku...@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/pgtable.h  | 21 ----------
 arch/powerpc/kvm/book3s_hv_rm_mmu.c | 62 ++++++++++++-------------
 arch/powerpc/kvm/e500_mmu_host.c    |  2 +-
 3 files changed, 28 insertions(+), 57 deletions(-)

diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h
index 9835ac4173b7..92fe01c355a9 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -249,27 +249,6 @@ extern int gup_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr,
 #endif
 pte_t *find_linux_pte_or_hugepte(pgd_t *pgdir, unsigned long ea,
 				 unsigned *shift);
-
-static inline pte_t *lookup_linux_ptep(pgd_t *pgdir, unsigned long hva,
-				       unsigned long *pte_sizep)
-{
-	pte_t *ptep;
-	unsigned long ps = *pte_sizep;
-	unsigned int shift;
-
-	ptep = find_linux_pte_or_hugepte(pgdir, hva, &shift);
-	if (!ptep)
-		return NULL;
-	if (shift)
-		*pte_sizep = 1ul << shift;
-	else
-		*pte_sizep = PAGE_SIZE;
-
-	if (ps > *pte_sizep)
-		return NULL;
-
-	return ptep;
-}
 #endif /* __ASSEMBLY__ */
 #endif /* __KERNEL__ */
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 625407e4d3b0..73e083cb9f7e 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -131,25 +131,6 @@ static void remove_revmap_chain(struct kvm *kvm, long pte_index,
 	unlock_rmap(rmap);
 }
 
-static pte_t lookup_linux_pte_and_update(pgd_t *pgdir, unsigned long hva,
-					 int writing, unsigned long *pte_sizep)
-{
-	pte_t *ptep;
-	unsigned long ps = *pte_sizep;
-	unsigned int hugepage_shift;
-
-	ptep = find_linux_pte_or_hugepte(pgdir, hva, &hugepage_shift);
-	if (!ptep)
-		return __pte(0);
-	if (hugepage_shift)
-		*pte_sizep = 1ul << hugepage_shift;
-	else
-		*pte_sizep = PAGE_SIZE;
-	if (ps > *pte_sizep)
-		return __pte(0);
-	return kvmppc_read_update_linux_pte(ptep, writing, hugepage_shift);
-}
-
 static inline void unlock_hpte(__be64 *hpte, unsigned long hpte_v)
 {
 	asm volatile(PPC_RELEASE_BARRIER : : : "memory");
@@ -166,10 +147,10 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
 	struct revmap_entry *rev;
 	unsigned long g_ptel;
 	struct kvm_memory_slot *memslot;
-	unsigned long pte_size;
+	unsigned hpage_shift;
 	unsigned long is_io;
 	unsigned long *rmap;
-	pte_t pte;
+	pte_t *ptep;
 	unsigned int writing;
 	unsigned long mmu_seq;
 	unsigned long rcbits;
@@ -208,22 +189,33 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
 
 	/* Translate to host virtual address */
 	hva = __gfn_to_hva_memslot(memslot, gfn);
+	ptep = find_linux_pte_or_hugepte(pgdir, hva, &hpage_shift);
+	if (ptep) {
+		pte_t pte;
+		unsigned int host_pte_size;
 
-	/* Look up the Linux PTE for the backing page */
-	pte_size = psize;
-	pte = lookup_linux_pte_and_update(pgdir, hva, writing, &pte_size);
-	if (pte_present(pte) && !pte_protnone(pte)) {
-		if (writing && !pte_write(pte))
-			/* make the actual HPTE be read-only */
-			ptel = hpte_make_readonly(ptel);
-		is_io = hpte_cache_bits(pte_val(pte));
-		pa = pte_pfn(pte) << PAGE_SHIFT;
-		pa |= hva & (pte_size - 1);
-		pa |= gpa & ~PAGE_MASK;
-	}
+		if (hpage_shift)
+			host_pte_size = 1ul << hpage_shift;
+		else
+			host_pte_size = PAGE_SIZE;
+		/*
+		 * We should always find the guest page size
+		 * to <= host page size, if host is using hugepage
+		 */
+		if (host_pte_size < psize)
+			return H_PARAMETER;
 
-	if (pte_size < psize)
-		return H_PARAMETER;
+		pte = kvmppc_read_update_linux_pte(ptep, writing, hpage_shift);
+		if (pte_present(pte) && !pte_protnone(pte)) {
+			if (writing && !pte_write(pte))
+				/* make the actual HPTE be read-only */
+				ptel = hpte_make_readonly(ptel);
+			is_io = hpte_cache_bits(pte_val(pte));
+			pa = pte_pfn(pte) << PAGE_SHIFT;
+			pa |= hva & (host_pte_size - 1);
+			pa |= gpa & ~PAGE_MASK;
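The shift-to-size conversion that the removed helpers hid, and the size check that the patch open-codes, can be sketched in isolation (constants are illustrative; `PAGE_SIZE` on ppc64 is commonly 64K, not 4K, and `H_PARAMETER` is replaced by a plain -1 here):

```c
#include <assert.h>

#define PAGE_SIZE 4096UL   /* illustrative; real ppc64 configs often use 64K */

/* A non-zero shift from the page-table walker means the mapping is a
 * hugepage of size 1 << shift; zero means a normal base page. */
static unsigned long host_pte_size(unsigned int hpage_shift)
{
	return hpage_shift ? (1UL << hpage_shift) : PAGE_SIZE;
}

/* Mirrors the open-coded check in kvmppc_do_h_enter(): the guest page
 * being mapped must fit inside the host-side mapping, otherwise the
 * hcall fails (-1 standing in for H_PARAMETER). */
static long check_guest_psize(unsigned int hpage_shift, unsigned long psize)
{
	return host_pte_size(hpage_shift) < psize ? -1 : 0;
}
```

Keeping this logic inline at the single call site, rather than behind `lookup_linux_pte_and_update()`, is exactly the "fewer walk variants" point of the commit message.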
[PATCH] KVM: PPC: BOOK3S: HV: remove rma related variables from code.
We don't support real-mode areas now that 970 support is removed. Remove the remaining details of rma from the code. Also rename rma_setup_done to hpte_setup_done to better reflect the changes.

Signed-off-by: Aneesh Kumar K.V <aneesh.ku...@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/kvm_host.h |  3 +--
 arch/powerpc/kvm/book3s_64_mmu_hv.c | 28 ++++++++++++++--------------
 arch/powerpc/kvm/book3s_hv.c        | 10 +++++-----
 3 files changed, 20 insertions(+), 21 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 7efd666a3fa7..833486a5734a 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -227,9 +227,8 @@ struct kvm_arch {
 	int tlbie_lock;
 	unsigned long lpcr;
 	unsigned long rmor;
-	struct kvm_rma_info *rma;
 	unsigned long vrma_slb_v;
-	int rma_setup_done;
+	int hpte_setup_done;
 	u32 hpt_order;
 	atomic_t vcpus_running;
 	u32 online_vcores;
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 534acb3c6c3d..dbf127168ca4 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -116,12 +116,12 @@ long kvmppc_alloc_reset_hpt(struct kvm *kvm, u32 *htab_orderp)
 	long order;
 
 	mutex_lock(&kvm->lock);
-	if (kvm->arch.rma_setup_done) {
-		kvm->arch.rma_setup_done = 0;
-		/* order rma_setup_done vs. vcpus_running */
+	if (kvm->arch.hpte_setup_done) {
+		kvm->arch.hpte_setup_done = 0;
+		/* order hpte_setup_done vs. vcpus_running */
 		smp_mb();
 		if (atomic_read(&kvm->arch.vcpus_running)) {
-			kvm->arch.rma_setup_done = 1;
+			kvm->arch.hpte_setup_done = 1;
 			goto out;
 		}
 	}
@@ -1339,20 +1339,20 @@ static ssize_t kvm_htab_write(struct file *file, const char __user *buf,
 	unsigned long tmp[2];
 	ssize_t nb;
 	long int err, ret;
-	int rma_setup;
+	int hpte_setup;
 
 	if (!access_ok(VERIFY_READ, buf, count))
 		return -EFAULT;
 
 	/* lock out vcpus from running while we're doing this */
 	mutex_lock(&kvm->lock);
-	rma_setup = kvm->arch.rma_setup_done;
-	if (rma_setup) {
-		kvm->arch.rma_setup_done = 0;	/* temporarily */
-		/* order rma_setup_done vs. vcpus_running */
+	hpte_setup = kvm->arch.hpte_setup_done;
+	if (hpte_setup) {
+		kvm->arch.hpte_setup_done = 0;	/* temporarily */
+		/* order hpte_setup_done vs. vcpus_running */
 		smp_mb();
 		if (atomic_read(&kvm->arch.vcpus_running)) {
-			kvm->arch.rma_setup_done = 1;
+			kvm->arch.hpte_setup_done = 1;
 			mutex_unlock(&kvm->lock);
 			return -EBUSY;
 		}
@@ -1405,7 +1405,7 @@ static ssize_t kvm_htab_write(struct file *file, const char __user *buf,
 					       "r=%lx\n", ret, i, v, r);
 				goto out;
 			}
-			if (!rma_setup && is_vrma_hpte(v)) {
+			if (!hpte_setup && is_vrma_hpte(v)) {
 				unsigned long psize = hpte_base_page_size(v, r);
 				unsigned long senc = slb_pgsize_encoding(psize);
 				unsigned long lpcr;
@@ -1414,7 +1414,7 @@ static ssize_t kvm_htab_write(struct file *file, const char __user *buf,
 					(VRMA_VSID << SLB_VSID_SHIFT_1T);
 				lpcr = senc << (LPCR_VRMASD_SH - 4);
 				kvmppc_update_lpcr(kvm, lpcr, LPCR_VRMASD);
-				rma_setup = 1;
+				hpte_setup = 1;
 			}
 			++i;
 			hptp += 2;
@@ -1430,9 +1430,9 @@ static ssize_t kvm_htab_write(struct file *file, const char __user *buf,
 	}
 
  out:
-	/* Order HPTE updates vs. rma_setup_done */
+	/* Order HPTE updates vs. hpte_setup_done */
 	smp_wmb();
-	kvm->arch.rma_setup_done = rma_setup;
+	kvm->arch.hpte_setup_done = hpte_setup;
 	mutex_unlock(&kvm->lock);
 
 	if (err)
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index de4018a1bc4b..34e79b8e855c 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -2032,11 +2032,11 @@ static int kvmppc_vcpu_run_hv(struct kvm_run *run, struct kvm_vcpu *vcpu)
 	}
 
 	atomic_inc(&vcpu->kvm->arch.vcpus_running);
-	/* Order vcpus_running vs. rma_setup_done, see kvmppc_alloc_reset_hpt */
+	/* Order vcpus_running vs. hpte_setup_done, see kvmppc_alloc_reset_hpt */
 	smp_mb();
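The `hpte_setup_done` / `vcpus_running` handshake above can be sketched in user space with C11 atomics standing in for the kernel's `smp_mb()`. Names and the -1 (for -EBUSY) return are illustrative; a full-fence `atomic_thread_fence` approximates `smp_mb()` under assumptions that don't capture the kernel's exact barrier semantics:

```c
#include <assert.h>
#include <stdatomic.h>

static atomic_int hpte_setup_done = 1;   /* illustrative stand-ins for the */
static atomic_int vcpus_running;         /* kvm_arch fields in the patch   */

/* Clear the flag, fence, then check for running vcpus. The fence keeps
 * the store to hpte_setup_done ordered before the load of vcpus_running,
 * pairing with the vcpu-entry side (atomic_inc; smp_mb; read flag).
 * Returns 0 on success, -1 (standing in for -EBUSY) if a vcpu raced in. */
static int begin_hpt_update(void)
{
	atomic_store(&hpte_setup_done, 0);
	atomic_thread_fence(memory_order_seq_cst);  /* ~ smp_mb() */
	if (atomic_load(&vcpus_running)) {
		atomic_store(&hpte_setup_done, 1);  /* back out */
		return -1;
	}
	return 0;
}
```

Without the fence, the flag clear and the counter read could be reordered, letting a vcpu enter the guest while the HPT is being rewritten, which is the race the kernel comment warns about.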
[PATCH V2 2/2] KVM: PPC: BOOK3S: HV: Use unlock variant with memory barrier
We switch to the unlock variant with memory barriers in the error path and also in code paths where we had an implicit dependency on previous functions calling lwsync/ptesync. In most of the cases we don't really need an explicit barrier, but using the variant makes sure we don't make mistakes later with code movements. We also document why a non-barrier variant is ok in the performance critical path.

Signed-off-by: Aneesh Kumar K.V <aneesh.ku...@linux.vnet.ibm.com>
---
Changes from V1:
* Rebase to latest upstream

 arch/powerpc/kvm/book3s_64_mmu_hv.c | 10 +++++-----
 arch/powerpc/kvm/book3s_hv_rm_mmu.c | 15 ++++++++++-----
 2 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 551dabb9551b..0fd91f54d1a7 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -639,7 +639,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 	return ret;
 
  out_unlock:
-	__unlock_hpte(hptep, be64_to_cpu(hptep[0]));
+	unlock_hpte(hptep, be64_to_cpu(hptep[0]));
 	preempt_enable();
 	goto out_put;
 }
@@ -767,8 +767,8 @@ static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp,
 				note_hpte_modification(kvm, &rev[i]);
 			}
 		}
+		unlock_hpte(hptep, be64_to_cpu(hptep[0]));
 		unlock_rmap(rmapp);
-		__unlock_hpte(hptep, be64_to_cpu(hptep[0]));
 	}
 	return 0;
 }
@@ -854,7 +854,7 @@ static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
 			}
 			ret = 1;
 		}
-		__unlock_hpte(hptep, be64_to_cpu(hptep[0]));
+		unlock_hpte(hptep, be64_to_cpu(hptep[0]));
 	} while ((i = j) != head);
 
 	unlock_rmap(rmapp);
@@ -971,7 +971,7 @@ static int kvm_test_clear_dirty_npages(struct kvm *kvm, unsigned long *rmapp)
 		/* Now check and modify the HPTE */
 		if (!(hptep[0] & cpu_to_be64(HPTE_V_VALID))) {
-			__unlock_hpte(hptep, be64_to_cpu(hptep[0]));
+			unlock_hpte(hptep, be64_to_cpu(hptep[0]));
 			continue;
 		}
@@ -994,7 +994,7 @@ static int kvm_test_clear_dirty_npages(struct kvm *kvm, unsigned long *rmapp)
 		}
 		v &= ~HPTE_V_ABSENT;
 		v |= HPTE_V_VALID;
-		__unlock_hpte(hptep, v);
+		unlock_hpte(hptep, v);
 	} while ((i = j) != head);
 
 	unlock_rmap(rmapp);
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 9123132b3053..2e45bd57d4e8 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -268,6 +268,9 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
 			pte = be64_to_cpu(hpte[0]);
 			if (!(pte & (HPTE_V_VALID | HPTE_V_ABSENT)))
 				break;
+			/*
+			 * Data dependency will avoid re-ordering
+			 */
 			__unlock_hpte(hpte, pte);
 			hpte += 2;
 		}
@@ -286,7 +289,7 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
 				cpu_relax();
 			pte = be64_to_cpu(hpte[0]);
 			if (pte & (HPTE_V_VALID | HPTE_V_ABSENT)) {
-				__unlock_hpte(hpte, pte);
+				unlock_hpte(hpte, pte);
 				return H_PTEG_FULL;
 			}
 		}
@@ -406,7 +409,7 @@ long kvmppc_do_h_remove(struct kvm *kvm, unsigned long flags,
 	if ((pte & (HPTE_V_ABSENT | HPTE_V_VALID)) == 0 ||
 	    ((flags & H_AVPN) && (pte & ~0x7fUL) != avpn) ||
 	    ((flags & H_ANDCOND) && (pte & avpn) != 0)) {
-		__unlock_hpte(hpte, pte);
+		unlock_hpte(hpte, pte);
 		return H_NOT_FOUND;
 	}
@@ -542,7 +545,7 @@ long kvmppc_h_bulk_remove(struct kvm_vcpu *vcpu)
 				be64_to_cpu(hp[0]), be64_to_cpu(hp[1]));
 			rcbits = rev->guest_rpte & (HPTE_R_R|HPTE_R_C);
 			args[j] |= rcbits << (56 - 5);
-			__unlock_hpte(hp, 0);
+			unlock_hpte(hp, 0);
 		}
 	}
@@ -568,7 +571,7 @@ long kvmppc_h_protect(struct kvm_vcpu *vcpu, unsigned long flags,
 	pte = be64_to_cpu(hpte[0]);
 	if ((pte & (HPTE_V_ABSENT | HPTE_V_VALID)) == 0 ||
 	    ((flags & H_AVPN) && (pte & ~0x7fUL) != avpn)) {
-		__unlock_hpte(hpte, pte);
+		unlock_hpte(hpte, pte);
 		return H_NOT_FOUND;
 	}
@@ -748,7 +751,9
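The barrier vs. non-barrier unlock distinction can be modeled in portable C11 atomics. This is a sketch only: the lock-bit position is invented, and `memory_order_release` approximates the effect of `PPC_RELEASE_BARRIER` (lwsync) rather than reproducing it exactly:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define HPTE_V_HVLOCK (1ULL << 1)   /* illustrative lock-bit position */

/* Barrier variant: release ordering ensures all prior stores to the
 * entry are visible before the lock bit is seen clear by other CPUs. */
static void unlock_hpte(_Atomic uint64_t *hpte, uint64_t v)
{
	atomic_store_explicit(hpte, v & ~HPTE_V_HVLOCK,
			      memory_order_release);
}

/* Non-barrier variant: safe only where ordering already comes from a
 * preceding ptesync/eieio or from a data dependency, as documented in
 * the kvmppc_do_h_enter() fast path above. */
static void fast_unlock_hpte(_Atomic uint64_t *hpte, uint64_t v)
{
	atomic_store_explicit(hpte, v & ~HPTE_V_HVLOCK,
			      memory_order_relaxed);
}
```

Both stores clear the same bit; the patch's point is that the relaxed form is a latent bug if later code movement removes the implicit ordering it relied on, which is why the error paths switch to the release form.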
[PATCH V2 1/2] KVM: PPC: BOOK3S: HV: Add helpers for lock/unlock hpte
This patch adds helper routines to lock and unlock the hpte and uses them in the rest of the code. We don't change any locking rules in this patch. In the next patch we switch some of the unlock usage to the variant with a barrier and also document the usage without barriers.

Signed-off-by: Aneesh Kumar K.V <aneesh.ku...@linux.vnet.ibm.com>
---
Changes from V1:
* Rebase to latest upstream

 arch/powerpc/include/asm/kvm_book3s_64.h | 14 ++++++++++++++
 arch/powerpc/kvm/book3s_64_mmu_hv.c      | 25 +++++++++----------------
 arch/powerpc/kvm/book3s_hv_rm_mmu.c      | 25 ++++++++++++-------------
 3 files changed, 33 insertions(+), 31 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index 2d81e202bdcc..0789a0f50969 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -85,6 +85,20 @@ static inline long try_lock_hpte(__be64 *hpte, unsigned long bits)
 	return old == 0;
 }
 
+static inline void unlock_hpte(__be64 *hpte, unsigned long hpte_v)
+{
+	hpte_v &= ~HPTE_V_HVLOCK;
+	asm volatile(PPC_RELEASE_BARRIER : : : "memory");
+	hpte[0] = cpu_to_be64(hpte_v);
+}
+
+/* Without barrier */
+static inline void __unlock_hpte(__be64 *hpte, unsigned long hpte_v)
+{
+	hpte_v &= ~HPTE_V_HVLOCK;
+	hpte[0] = cpu_to_be64(hpte_v);
+}
+
 static inline int __hpte_actual_psize(unsigned int lp, int psize)
 {
 	int i, shift;
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 534acb3c6c3d..551dabb9551b 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -338,9 +338,7 @@ static int kvmppc_mmu_book3s_64_hv_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
 	v = be64_to_cpu(hptep[0]) & ~HPTE_V_HVLOCK;
 	gr = kvm->arch.revmap[index].guest_rpte;
 
-	/* Unlock the HPTE */
-	asm volatile("lwsync" : : : "memory");
-	hptep[0] = cpu_to_be64(v);
+	unlock_hpte(hptep, v);
 	preempt_enable();
 
 	gpte->eaddr = eaddr;
@@ -469,8 +467,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 	hpte[0] = be64_to_cpu(hptep[0]) & ~HPTE_V_HVLOCK;
 	hpte[1] = be64_to_cpu(hptep[1]);
 	hpte[2] = r = rev->guest_rpte;
-	asm volatile("lwsync" : : : "memory");
-	hptep[0] = cpu_to_be64(hpte[0]);
+	unlock_hpte(hptep, hpte[0]);
 	preempt_enable();
 
 	if (hpte[0] != vcpu->arch.pgfault_hpte[0] ||
@@ -621,7 +618,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 	hptep[1] = cpu_to_be64(r);
 	eieio();
-	hptep[0] = cpu_to_be64(hpte[0]);
+	__unlock_hpte(hptep, hpte[0]);
 	asm volatile("ptesync" : : : "memory");
 	preempt_enable();
 	if (page && hpte_is_writable(r))
@@ -642,7 +639,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 	return ret;
 
  out_unlock:
-	hptep[0] &= ~cpu_to_be64(HPTE_V_HVLOCK);
+	__unlock_hpte(hptep, be64_to_cpu(hptep[0]));
 	preempt_enable();
 	goto out_put;
 }
@@ -771,7 +768,7 @@ static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp,
 			}
 		}
 		unlock_rmap(rmapp);
-		hptep[0] &= ~cpu_to_be64(HPTE_V_HVLOCK);
+		__unlock_hpte(hptep, be64_to_cpu(hptep[0]));
 	}
 	return 0;
 }
@@ -857,7 +854,7 @@ static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
 			}
 			ret = 1;
 		}
-		hptep[0] &= ~cpu_to_be64(HPTE_V_HVLOCK);
+		__unlock_hpte(hptep, be64_to_cpu(hptep[0]));
 	} while ((i = j) != head);
 
 	unlock_rmap(rmapp);
@@ -974,8 +971,7 @@ static int kvm_test_clear_dirty_npages(struct kvm *kvm, unsigned long *rmapp)
 		/* Now check and modify the HPTE */
 		if (!(hptep[0] & cpu_to_be64(HPTE_V_VALID))) {
-			/* unlock and continue */
-			hptep[0] &= ~cpu_to_be64(HPTE_V_HVLOCK);
+			__unlock_hpte(hptep, be64_to_cpu(hptep[0]));
 			continue;
 		}
@@ -996,9 +992,9 @@ static int kvm_test_clear_dirty_npages(struct kvm *kvm, unsigned long *rmapp)
 				npages_dirty = n;
 			eieio();
 		}
-		v &= ~(HPTE_V_ABSENT | HPTE_V_HVLOCK);
+		v &= ~HPTE_V_ABSENT;
 		v |= HPTE_V_VALID;
-		hptep[0] = cpu_to_be64(v);
+		__unlock_hpte(hptep, v);
 	} while ((i = j) != head);
 
 	unlock_rmap(rmapp);
@@ -1218,8 +1214,7 @@ static long record_hpte(unsigned long flags, __be64 *hptp,
 		r &= ~HPTE_GR_MODIFIED;
 		revp->guest_rpte = r;
 	}
-	asm volatile(PPC_RELEASE_BARRIER : : : "memory");
-	hptp[0
[PATCH V2 1/2] KVM: PPC: BOOK3S: HV: Add helpers for lock/unlock hpte
This patch adds helper routines for locking and unlocking HPTEs, and uses them in the rest of the code. We don't change any locking rules in this patch. In the next patch we switch some of the unlock usages to the API with a barrier, and also document the usages without barriers.

Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
---
Changes from V1:
* Rebase to latest upstream

 arch/powerpc/include/asm/kvm_book3s_64.h | 14 ++
 arch/powerpc/kvm/book3s_64_mmu_hv.c      | 25 ++---
 arch/powerpc/kvm/book3s_hv_rm_mmu.c      | 25 +
 3 files changed, 33 insertions(+), 31 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index 2d81e202bdcc..0789a0f50969 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -85,6 +85,20 @@ static inline long try_lock_hpte(__be64 *hpte, unsigned long bits)
 	return old == 0;
 }
 
+static inline void unlock_hpte(__be64 *hpte, unsigned long hpte_v)
+{
+	hpte_v &= ~HPTE_V_HVLOCK;
+	asm volatile(PPC_RELEASE_BARRIER "" : : : "memory");
+	hpte[0] = cpu_to_be64(hpte_v);
+}
+
+/* Without barrier */
+static inline void __unlock_hpte(__be64 *hpte, unsigned long hpte_v)
+{
+	hpte_v &= ~HPTE_V_HVLOCK;
+	hpte[0] = cpu_to_be64(hpte_v);
+}
+
 static inline int __hpte_actual_psize(unsigned int lp, int psize)
 {
 	int i, shift;
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 534acb3c6c3d..551dabb9551b 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -338,9 +338,7 @@ static int kvmppc_mmu_book3s_64_hv_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
 	v = be64_to_cpu(hptep[0]) & ~HPTE_V_HVLOCK;
 	gr = kvm->arch.revmap[index].guest_rpte;
 
-	/* Unlock the HPTE */
-	asm volatile("lwsync" : : : "memory");
-	hptep[0] = cpu_to_be64(v);
+	unlock_hpte(hptep, v);
 	preempt_enable();
 
 	gpte->eaddr = eaddr;
@@ -469,8 +467,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 	hpte[0] = be64_to_cpu(hptep[0]) & ~HPTE_V_HVLOCK;
 	hpte[1] = be64_to_cpu(hptep[1]);
 	hpte[2] = r = rev->guest_rpte;
-	asm volatile("lwsync" : : : "memory");
-	hptep[0] = cpu_to_be64(hpte[0]);
+	unlock_hpte(hptep, hpte[0]);
 	preempt_enable();
 
 	if (hpte[0] != vcpu->arch.pgfault_hpte[0] ||
@@ -621,7 +618,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 	hptep[1] = cpu_to_be64(r);
 	eieio();
-	hptep[0] = cpu_to_be64(hpte[0]);
+	__unlock_hpte(hptep, hpte[0]);
 	asm volatile("ptesync" : : : "memory");
 	preempt_enable();
 	if (page && hpte_is_writable(r))
@@ -642,7 +639,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 	return ret;
 
 out_unlock:
-	hptep[0] &= ~cpu_to_be64(HPTE_V_HVLOCK);
+	__unlock_hpte(hptep, be64_to_cpu(hptep[0]));
 	preempt_enable();
 	goto out_put;
 }
@@ -771,7 +768,7 @@ static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp,
 			}
 		}
 		unlock_rmap(rmapp);
-		hptep[0] &= ~cpu_to_be64(HPTE_V_HVLOCK);
+		__unlock_hpte(hptep, be64_to_cpu(hptep[0]));
 	}
 	return 0;
 }
@@ -857,7 +854,7 @@ static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
 			}
 			ret = 1;
 		}
-		hptep[0] &= ~cpu_to_be64(HPTE_V_HVLOCK);
+		__unlock_hpte(hptep, be64_to_cpu(hptep[0]));
 	} while ((i = j) != head);
 
 	unlock_rmap(rmapp);
@@ -974,8 +971,7 @@ static int kvm_test_clear_dirty_npages(struct kvm *kvm, unsigned long *rmapp)
 		/* Now check and modify the HPTE */
 		if (!(hptep[0] & cpu_to_be64(HPTE_V_VALID))) {
-			/* unlock and continue */
-			hptep[0] &= ~cpu_to_be64(HPTE_V_HVLOCK);
+			__unlock_hpte(hptep, be64_to_cpu(hptep[0]));
 			continue;
 		}
@@ -996,9 +992,9 @@ static int kvm_test_clear_dirty_npages(struct kvm *kvm, unsigned long *rmapp)
 				npages_dirty = n;
 			eieio();
 		}
-		v &= ~(HPTE_V_ABSENT | HPTE_V_HVLOCK);
+		v &= ~HPTE_V_ABSENT;
 		v |= HPTE_V_VALID;
-		hptep[0] = cpu_to_be64(v);
+		__unlock_hpte(hptep, v);
 	} while ((i = j) != head);
 
 	unlock_rmap(rmapp);
@@ -1218,8 +1214,7 @@ static long record_hpte(unsigned long flags, __be64 *hptp,
 			r &= ~HPTE_GR_MODIFIED;
 			revp->guest_rpte = r;
 		}
-		asm volatile(PPC_RELEASE_BARRIER "" : : : "memory");
-		hptp[0
Re: [PATCH 1/3] KVM: PPC: BOOK3S: HV: Add helpers for lock/unlock hpte
Hi,

Any update on this patch? We could drop patch 3. Any feedback on patches 1 and 2?

-aneesh

Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com writes:

This patch adds helper routines for locking and unlocking HPTEs, and uses them in the rest of the code. We don't change any locking rules in this patch. In the next patch we switch some of the unlock usages to the API with a barrier, and also document the usages without barriers.

Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/kvm_book3s_64.h | 14 ++
 arch/powerpc/kvm/book3s_64_mmu_hv.c      | 25 ++---
 arch/powerpc/kvm/book3s_hv_rm_mmu.c      | 27 ++-
 3 files changed, 34 insertions(+), 32 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index 0aa817933e6a..ec9fb6085843 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -86,6 +86,20 @@ static inline long try_lock_hpte(__be64 *hpte, unsigned long bits)
 	return old == 0;
 }
 
+static inline void unlock_hpte(__be64 *hpte, unsigned long hpte_v)
+{
+	hpte_v &= ~HPTE_V_HVLOCK;
+	asm volatile(PPC_RELEASE_BARRIER "" : : : "memory");
+	hpte[0] = cpu_to_be64(hpte_v);
+}
+
+/* Without barrier */
+static inline void __unlock_hpte(__be64 *hpte, unsigned long hpte_v)
+{
+	hpte_v &= ~HPTE_V_HVLOCK;
+	hpte[0] = cpu_to_be64(hpte_v);
+}
+
 static inline int __hpte_actual_psize(unsigned int lp, int psize)
 {
 	int i, shift;
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index cebb86bc4a37..5ea4b2b6a157 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -475,9 +475,7 @@ static int kvmppc_mmu_book3s_64_hv_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
 	v = be64_to_cpu(hptep[0]) & ~HPTE_V_HVLOCK;
 	gr = kvm->arch.revmap[index].guest_rpte;
 
-	/* Unlock the HPTE */
-	asm volatile("lwsync" : : : "memory");
-	hptep[0] = cpu_to_be64(v);
+	unlock_hpte(hptep, v);
 	preempt_enable();
 
 	gpte->eaddr = eaddr;
@@ -606,8 +604,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 	hpte[0] = be64_to_cpu(hptep[0]) & ~HPTE_V_HVLOCK;
 	hpte[1] = be64_to_cpu(hptep[1]);
 	hpte[2] = r = rev->guest_rpte;
-	asm volatile("lwsync" : : : "memory");
-	hptep[0] = cpu_to_be64(hpte[0]);
+	unlock_hpte(hptep, hpte[0]);
 	preempt_enable();
 
 	if (hpte[0] != vcpu->arch.pgfault_hpte[0] ||
@@ -758,7 +755,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 	hptep[1] = cpu_to_be64(r);
 	eieio();
-	hptep[0] = cpu_to_be64(hpte[0]);
+	__unlock_hpte(hptep, hpte[0]);
 	asm volatile("ptesync" : : : "memory");
 	preempt_enable();
 	if (page && hpte_is_writable(r))
@@ -777,7 +774,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 	return ret;
 
 out_unlock:
-	hptep[0] &= ~cpu_to_be64(HPTE_V_HVLOCK);
+	__unlock_hpte(hptep, be64_to_cpu(hptep[0]));
 	preempt_enable();
 	goto out_put;
 }
@@ -907,7 +904,7 @@ static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp,
 			}
 		}
 		unlock_rmap(rmapp);
-		hptep[0] &= ~cpu_to_be64(HPTE_V_HVLOCK);
+		__unlock_hpte(hptep, be64_to_cpu(hptep[0]));
 	}
 	return 0;
 }
@@ -995,7 +992,7 @@ static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
 			}
 			ret = 1;
 		}
-		hptep[0] &= ~cpu_to_be64(HPTE_V_HVLOCK);
+		__unlock_hpte(hptep, be64_to_cpu(hptep[0]));
 	} while ((i = j) != head);
 
 	unlock_rmap(rmapp);
@@ -1118,8 +1115,7 @@ static int kvm_test_clear_dirty_npages(struct kvm *kvm, unsigned long *rmapp)
 		/* Now check and modify the HPTE */
 		if (!(hptep[0] & cpu_to_be64(HPTE_V_VALID))) {
-			/* unlock and continue */
-			hptep[0] &= ~cpu_to_be64(HPTE_V_HVLOCK);
+			__unlock_hpte(hptep, be64_to_cpu(hptep[0]));
 			continue;
 		}
 		/* need to make it temporarily absent so C is stable */
@@ -1139,9 +1135,9 @@ static int kvm_test_clear_dirty_npages(struct kvm *kvm, unsigned long *rmapp)
 				npages_dirty = n;
 			eieio();
 		}
-		v &= ~(HPTE_V_ABSENT | HPTE_V_HVLOCK);
+		v &= ~HPTE_V_ABSENT;
 		v |= HPTE_V_VALID;
-		hptep[0] = cpu_to_be64(v);
+		__unlock_hpte(hptep, v);
 	} while ((i = j) != head);
 
 	unlock_rmap(rmapp);
@@ -1379,8 +1375,7 @@ static long record_hpte(unsigned long flags, __be64 *hptp
Re: [PATCH] KVM: PPC: Book3S HV: Tracepoints for KVM HV guest interactions
Suresh E. Warrier warr...@linux.vnet.ibm.com writes:

This patch adds trace points in the guest entry and exit code and also for exceptions handled by the host in kernel mode - hypercalls and page faults. The new events are added to /sys/kernel/debug/tracing/events under a new subsystem called kvm_hv.

 	/* Set this explicitly in case thread 0 doesn't have a vcpu */
@@ -1687,6 +1691,9 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc)
 
 	vc->vcore_state = VCORE_RUNNING;
 	preempt_disable();
+
+	trace_kvmppc_run_core(vc, 0);
+
 	spin_unlock(&vc->lock);

Do we really want to call a tracepoint with a spin lock held? Is that a good thing to do?

-aneesh
--
To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/5] KVM: PPC: Book3S HV: Fix computation of tlbie operand
Paul Mackerras pau...@samba.org writes:

The B (segment size) field in the RB operand for the tlbie instruction is two bits, which we get from the top two bits of the first doubleword of the HPT entry to be invalidated. These bits go in bits 8 and 9 of the RB operand (bits 54 and 55 in IBM bit numbering).

The compute_tlbie_rb() function gets these bits as v >> (62 - 8), which is not correct as it will bring in the top 10 bits, not just the top two. These extra bits could corrupt the AP, AVAL and L fields in the RB value. To fix this we shift right 62 bits and then shift left 8 bits, so we only get the two bits of the B field.

Good catch.

The first doubleword of the HPT entry is under the control of the guest kernel. In fact, Linux guests will always put zeroes in bits 54-61 (IBM bits 2-9), but we should not rely on guests doing this.

Cc: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
Signed-off-by: Paul Mackerras pau...@samba.org

Reviewed-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com

---
 arch/powerpc/include/asm/kvm_book3s_64.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index 0aa8179..a37f1a4 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -148,7 +148,7 @@ static inline unsigned long compute_tlbie_rb(unsigned long v, unsigned long r,
 
 	/* This covers 14..54 bits of va*/
 	rb = (v & ~0x7fUL) << 16;		/* AVA field */
-	rb |= v >> (62 - 8);			/*  B field */
+	rb |= (v >> HPTE_V_SSIZE_SHIFT) << 8;	/*  B field */
 	/*
 	 * AVA in v had cleared lower 23 bits. We need to derive
 	 * that from pteg index
-- 
2.1.1
Re: [PATCH 1/5] KVM: PPC: Book3S HV: Fix computation of tlbie operand
Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com writes:

Paul Mackerras pau...@samba.org writes:

The B (segment size) field in the RB operand for the tlbie instruction is two bits, which we get from the top two bits of the first doubleword of the HPT entry to be invalidated. These bits go in bits 8 and 9 of the RB operand (bits 54 and 55 in IBM bit numbering).

The compute_tlbie_rb() function gets these bits as v >> (62 - 8), which is not correct as it will bring in the top 10 bits, not just the top two. These extra bits could corrupt the AP, AVAL and L fields in the RB value. To fix this we shift right 62 bits and then shift left 8 bits, so we only get the two bits of the B field.

Good catch.

The first doubleword of the HPT entry is under the control of the guest kernel. In fact, Linux guests will always put zeroes in bits 54-61 (IBM bits 2-9), but we should not rely on guests doing this.

Cc: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
Signed-off-by: Paul Mackerras pau...@samba.org
Reviewed-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/kvm_book3s_64.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index 0aa8179..a37f1a4 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -148,7 +148,7 @@ static inline unsigned long compute_tlbie_rb(unsigned long v, unsigned long r,
 
 	/* This covers 14..54 bits of va*/
 	rb = (v & ~0x7fUL) << 16;		/* AVA field */
-	rb |= v >> (62 - 8);			/*  B field */
+	rb |= (v >> HPTE_V_SSIZE_SHIFT) << 8;	/*  B field */

Or should we do the below? I guess it is closer to what we have in the rest of the code:

	rb |= ((v >> (HPTE_V_SSIZE_SHIFT - 8)) & ~0xffUL);

 	/*
 	 * AVA in v had cleared lower 23 bits. We need to derive
 	 * that from pteg index
-- 
2.1.1
[PATCH] KVM: PPC: Book3S HV: Add missing HPTE unlock
In kvm_test_clear_dirty_npages(), if we find an invalid HPTE we move on to the next HPTE without unlocking the invalid one. In fact we should never find an invalid and unlocked HPTE in the rmap chain, but for robustness we should unlock it. This adds the missing unlock.

Reported-by: Benjamin Herrenschmidt b...@kernel.crashing.org
Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
---
 arch/powerpc/kvm/book3s_64_mmu_hv.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index d40770248b6a..cebb86bc4a37 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -1117,9 +1117,11 @@ static int kvm_test_clear_dirty_npages(struct kvm *kvm, unsigned long *rmapp)
 		}
 
 		/* Now check and modify the HPTE */
-		if (!(hptep[0] & cpu_to_be64(HPTE_V_VALID)))
+		if (!(hptep[0] & cpu_to_be64(HPTE_V_VALID))) {
+			/* unlock and continue */
+			hptep[0] &= ~cpu_to_be64(HPTE_V_HVLOCK);
 			continue;
-
+		}
 		/* need to make it temporarily absent so C is stable */
 		hptep[0] |= cpu_to_be64(HPTE_V_ABSENT);
 		kvmppc_invalidate_hpte(kvm, hptep, i);
-- 
1.9.1
[PATCH 2/3] KVM: PPC: BOOK3S: HV: Use unlock variant with memory barrier
We switch to the unlock variant with memory barriers in the error path and also in code paths where we had an implicit dependency on previous functions calling lwsync/ptesync. In most of these cases we don't really need an explicit barrier, but using the barrier variant makes sure we don't make mistakes later with code movement. We also document why the non-barrier variant is OK in the performance-critical path.

Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
---
 arch/powerpc/kvm/book3s_64_mmu_hv.c | 10 +-
 arch/powerpc/kvm/book3s_hv_rm_mmu.c | 15 ++-
 2 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 5ea4b2b6a157..c97690ffb5f6 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -774,7 +774,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 	return ret;
 
 out_unlock:
-	__unlock_hpte(hptep, be64_to_cpu(hptep[0]));
+	unlock_hpte(hptep, be64_to_cpu(hptep[0]));
 	preempt_enable();
 	goto out_put;
 }
@@ -903,8 +903,8 @@ static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp,
 			note_hpte_modification(kvm, rev[i]);
 			}
 		}
+		unlock_hpte(hptep, be64_to_cpu(hptep[0]));
 		unlock_rmap(rmapp);
-		__unlock_hpte(hptep, be64_to_cpu(hptep[0]));
 	}
 	return 0;
 }
@@ -992,7 +992,7 @@ static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
 			}
 			ret = 1;
 		}
-		__unlock_hpte(hptep, be64_to_cpu(hptep[0]));
+		unlock_hpte(hptep, be64_to_cpu(hptep[0]));
 	} while ((i = j) != head);
 
 	unlock_rmap(rmapp);
@@ -1115,7 +1115,7 @@ static int kvm_test_clear_dirty_npages(struct kvm *kvm, unsigned long *rmapp)
 		/* Now check and modify the HPTE */
 		if (!(hptep[0] & cpu_to_be64(HPTE_V_VALID))) {
-			__unlock_hpte(hptep, be64_to_cpu(hptep[0]));
+			unlock_hpte(hptep, be64_to_cpu(hptep[0]));
 			continue;
 		}
 		/* need to make it temporarily absent so C is stable */
@@ -1137,7 +1137,7 @@ static int kvm_test_clear_dirty_npages(struct kvm *kvm, unsigned long *rmapp)
 		}
 		v &= ~HPTE_V_ABSENT;
 		v |= HPTE_V_VALID;
-		__unlock_hpte(hptep, v);
+		unlock_hpte(hptep, v);
 	} while ((i = j) != head);
 
 	unlock_rmap(rmapp);
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 769a5d4c0430..78e689b066f1 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -292,6 +292,9 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
 			pte = be64_to_cpu(hpte[0]);
 			if (!(pte & (HPTE_V_VALID | HPTE_V_ABSENT)))
 				break;
+			/*
+			 * Data dependency will avoid re-ordering
+			 */
 			__unlock_hpte(hpte, pte);
 			hpte += 2;
 		}
@@ -310,7 +313,7 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
 				cpu_relax();
 			pte = be64_to_cpu(hpte[0]);
 			if (pte & (HPTE_V_VALID | HPTE_V_ABSENT)) {
-				__unlock_hpte(hpte, pte);
+				unlock_hpte(hpte, pte);
 				return H_PTEG_FULL;
 			}
 		}
@@ -481,7 +484,7 @@ long kvmppc_do_h_remove(struct kvm *kvm, unsigned long flags,
 	if ((pte & (HPTE_V_ABSENT | HPTE_V_VALID)) == 0 ||
 	    ((flags & H_AVPN) && (pte & ~0x7fUL) != avpn) ||
 	    ((flags & H_ANDCOND) && (pte & avpn) != 0)) {
-		__unlock_hpte(hpte, pte);
+		unlock_hpte(hpte, pte);
 		return H_NOT_FOUND;
 	}
@@ -617,7 +620,7 @@ long kvmppc_h_bulk_remove(struct kvm_vcpu *vcpu)
 				be64_to_cpu(hp[0]), be64_to_cpu(hp[1]));
 			rcbits = rev->guest_rpte & (HPTE_R_R|HPTE_R_C);
 			args[j] |= rcbits << (56 - 5);
-			__unlock_hpte(hp, 0);
+			unlock_hpte(hp, 0);
 		}
 	}
@@ -643,7 +646,7 @@ long kvmppc_h_protect(struct kvm_vcpu *vcpu, unsigned long flags,
 	pte = be64_to_cpu(hpte[0]);
 	if ((pte & (HPTE_V_ABSENT | HPTE_V_VALID)) == 0 ||
 	    ((flags & H_AVPN) && (pte & ~0x7fUL) != avpn)) {
-		__unlock_hpte(hpte, pte);
+		unlock_hpte(hpte, pte);
 		return H_NOT_FOUND
[PATCH 3/3] KVM: PPC: BOOK3S: HV: Rename variable for better readability
Minor cleanup: rename the batch counter n to collected_hpte for better readability. No functional change.

Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
---
 arch/powerpc/kvm/book3s_hv_rm_mmu.c | 25 +
 1 file changed, 13 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 78e689b066f1..2922f8d127ff 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -523,7 +523,7 @@ long kvmppc_h_bulk_remove(struct kvm_vcpu *vcpu)
 	unsigned long *args = &vcpu->arch.gpr[4];
 	__be64 *hp, *hptes[4];
 	unsigned long tlbrb[4];
-	long int i, j, k, n, found, indexes[4];
+	long int i, j, k, collected_hpte, found, indexes[4];
 	unsigned long flags, req, pte_index, rcbits;
 	int global;
 	long int ret = H_SUCCESS;
@@ -532,7 +532,7 @@ long kvmppc_h_bulk_remove(struct kvm_vcpu *vcpu)
 	global = global_invalidates(kvm, 0);
 	for (i = 0; i < 4 && ret == H_SUCCESS; ) {
-		n = 0;
+		collected_hpte = 0;
 		for (; i < 4; ++i) {
 			j = i * 2;
 			pte_index = args[j];
@@ -554,7 +554,7 @@ long kvmppc_h_bulk_remove(struct kvm_vcpu *vcpu)
 			hp = (__be64 *) (kvm->arch.hpt_virt + (pte_index << 4));
 			/* to avoid deadlock, don't spin except for first */
 			if (!try_lock_hpte(hp, HPTE_V_HVLOCK)) {
-				if (n)
+				if (collected_hpte)
 					break;
 				while (!try_lock_hpte(hp, HPTE_V_HVLOCK))
 					cpu_relax();
 			}
@@ -596,22 +596,23 @@ long kvmppc_h_bulk_remove(struct kvm_vcpu *vcpu)
 
 			/* leave it locked */
 			hp[0] &= ~cpu_to_be64(HPTE_V_VALID);
-			tlbrb[n] = compute_tlbie_rb(be64_to_cpu(hp[0]),
-				be64_to_cpu(hp[1]), pte_index);
-			indexes[n] = j;
-			hptes[n] = hp;
-			revs[n] = rev;
-			++n;
+			tlbrb[collected_hpte] = compute_tlbie_rb(be64_to_cpu(hp[0]),
+						be64_to_cpu(hp[1]),
+						pte_index);
+			indexes[collected_hpte] = j;
+			hptes[collected_hpte] = hp;
+			revs[collected_hpte] = rev;
+			++collected_hpte;
 		}
 
-		if (!n)
+		if (!collected_hpte)
 			break;
 
 		/* Now that we've collected a batch, do the tlbies */
-		do_tlbies(kvm, tlbrb, n, global, true);
+		do_tlbies(kvm, tlbrb, collected_hpte, global, true);
 
 		/* Read PTE low words after tlbie to get final R/C values */
-		for (k = 0; k < n; ++k) {
+		for (k = 0; k < collected_hpte; ++k) {
 			j = indexes[k];
 			pte_index = args[j] & ((1ul << 56) - 1);
 			hp = hptes[k];
-- 
1.9.1
[PATCH 1/3] KVM: PPC: BOOK3S: HV: Add helpers for lock/unlock hpte
This patch adds helper routines to lock and unlock HPTEs and uses them in the rest of the code. We don't change any locking rules in this patch. In the next patch we switch some of the unlock usages to the API with a barrier and also document the usages without barriers. Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com --- arch/powerpc/include/asm/kvm_book3s_64.h | 14 ++ arch/powerpc/kvm/book3s_64_mmu_hv.c | 25 ++--- arch/powerpc/kvm/book3s_hv_rm_mmu.c | 27 ++- 3 files changed, 34 insertions(+), 32 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h index 0aa817933e6a..ec9fb6085843 100644 --- a/arch/powerpc/include/asm/kvm_book3s_64.h +++ b/arch/powerpc/include/asm/kvm_book3s_64.h @@ -86,6 +86,20 @@ static inline long try_lock_hpte(__be64 *hpte, unsigned long bits) return old == 0; } +static inline void unlock_hpte(__be64 *hpte, unsigned long hpte_v) +{ + hpte_v &= ~HPTE_V_HVLOCK; + asm volatile(PPC_RELEASE_BARRIER : : : "memory"); + hpte[0] = cpu_to_be64(hpte_v); +} + +/* Without barrier */ +static inline void __unlock_hpte(__be64 *hpte, unsigned long hpte_v) +{ + hpte_v &= ~HPTE_V_HVLOCK; + hpte[0] = cpu_to_be64(hpte_v); +} + static inline int __hpte_actual_psize(unsigned int lp, int psize) { int i, shift; diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c index cebb86bc4a37..5ea4b2b6a157 100644 --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c @@ -475,9 +475,7 @@ static int kvmppc_mmu_book3s_64_hv_xlate(struct kvm_vcpu *vcpu, gva_t eaddr, v = be64_to_cpu(hptep[0]) & ~HPTE_V_HVLOCK; gr = kvm->arch.revmap[index].guest_rpte; - /* Unlock the HPTE */ - asm volatile("lwsync" : : : "memory"); - hptep[0] = cpu_to_be64(v); + unlock_hpte(hptep, v); preempt_enable(); gpte->eaddr = eaddr; @@ -606,8 +604,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu, hpte[0] = be64_to_cpu(hptep[0]) & ~HPTE_V_HVLOCK; hpte[1] = be64_to_cpu(hptep[1]); hpte[2] = r = rev->guest_rpte; - asm volatile("lwsync" : : : "memory"); - hptep[0] = cpu_to_be64(hpte[0]); + unlock_hpte(hptep, hpte[0]); preempt_enable(); if (hpte[0] != vcpu->arch.pgfault_hpte[0] || @@ -758,7 +755,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu, hptep[1] = cpu_to_be64(r); eieio(); - hptep[0] = cpu_to_be64(hpte[0]); + __unlock_hpte(hptep, hpte[0]); asm volatile("ptesync" : : : "memory"); preempt_enable(); if (page && hpte_is_writable(r)) @@ -777,7 +774,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu, return ret; out_unlock: - hptep[0] &= ~cpu_to_be64(HPTE_V_HVLOCK); + __unlock_hpte(hptep, be64_to_cpu(hptep[0])); preempt_enable(); goto out_put; } @@ -907,7 +904,7 @@ static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp, } } unlock_rmap(rmapp); - hptep[0] &= ~cpu_to_be64(HPTE_V_HVLOCK); + __unlock_hpte(hptep, be64_to_cpu(hptep[0])); } return 0; } @@ -995,7 +992,7 @@ static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp, } ret = 1; } - hptep[0] &= ~cpu_to_be64(HPTE_V_HVLOCK); + __unlock_hpte(hptep, be64_to_cpu(hptep[0])); } while ((i = j) != head); unlock_rmap(rmapp); @@ -1118,8 +1115,7 @@ static int kvm_test_clear_dirty_npages(struct kvm *kvm, unsigned long *rmapp) /* Now check and modify the HPTE */ if (!(hptep[0] & cpu_to_be64(HPTE_V_VALID))) { - /* unlock and continue */ - hptep[0] &= ~cpu_to_be64(HPTE_V_HVLOCK); + __unlock_hpte(hptep, be64_to_cpu(hptep[0])); continue; } /* need to make it temporarily absent so C is stable */ @@ -1139,9 +1135,9 @@ static int kvm_test_clear_dirty_npages(struct kvm *kvm, unsigned long *rmapp) npages_dirty = n; eieio(); } - v &= ~(HPTE_V_ABSENT | HPTE_V_HVLOCK); + v &= ~HPTE_V_ABSENT; v |= HPTE_V_VALID; - hptep[0] = cpu_to_be64(v); + __unlock_hpte(hptep, v); } while ((i = j) != head); unlock_rmap(rmapp); @@ -1379,8 +1375,7 @@ static long record_hpte(unsigned long flags, __be64 *hptp, r &= ~HPTE_GR_MODIFIED; revp->guest_rpte = r; } -
asm volatile(PPC_RELEASE_BARRIER
[PATCH] KVM: PPC: Book3S HV: Add missing HPTE unlock
In kvm_test_clear_dirty_npages(), if we find an invalid HPTE we move on to the next HPTE without unlocking the invalid one. In fact we should never find an invalid and unlocked HPTE in the rmap chain, but for robustness we should unlock it. This adds the missing unlock. Reported-by: Benjamin Herrenschmidt b...@kernel.crashing.org Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com --- arch/powerpc/kvm/book3s_64_mmu_hv.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c index d40770248b6a..cebb86bc4a37 100644 --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c @@ -1117,9 +1117,11 @@ static int kvm_test_clear_dirty_npages(struct kvm *kvm, unsigned long *rmapp) } /* Now check and modify the HPTE */ - if (!(hptep[0] & cpu_to_be64(HPTE_V_VALID))) + if (!(hptep[0] & cpu_to_be64(HPTE_V_VALID))) { + /* unlock and continue */ + hptep[0] &= ~cpu_to_be64(HPTE_V_HVLOCK); continue; - + } /* need to make it temporarily absent so C is stable */ hptep[0] |= cpu_to_be64(HPTE_V_ABSENT); kvmppc_invalidate_hpte(kvm, hptep, i); -- 1.9.1 -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/3] KVM: PPC: BOOK3S: HV: Rename variable for better readability
Minor cleanup. Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com --- arch/powerpc/kvm/book3s_hv_rm_mmu.c | 25 + 1 file changed, 13 insertions(+), 12 deletions(-) diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c index 78e689b066f1..2922f8d127ff 100644 --- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c +++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c @@ -523,7 +523,7 @@ long kvmppc_h_bulk_remove(struct kvm_vcpu *vcpu) unsigned long *args = &vcpu->arch.gpr[4]; __be64 *hp, *hptes[4]; unsigned long tlbrb[4]; - long int i, j, k, n, found, indexes[4]; + long int i, j, k, collected_hpte, found, indexes[4]; unsigned long flags, req, pte_index, rcbits; int global; long int ret = H_SUCCESS; @@ -532,7 +532,7 @@ long kvmppc_h_bulk_remove(struct kvm_vcpu *vcpu) global = global_invalidates(kvm, 0); for (i = 0; i < 4 && ret == H_SUCCESS; ) { - n = 0; + collected_hpte = 0; for (; i < 4; ++i) { j = i * 2; pte_index = args[j]; @@ -554,7 +554,7 @@ long kvmppc_h_bulk_remove(struct kvm_vcpu *vcpu) hp = (__be64 *) (kvm->arch.hpt_virt + (pte_index << 4)); /* to avoid deadlock, don't spin except for first */ if (!try_lock_hpte(hp, HPTE_V_HVLOCK)) { - if (n) + if (collected_hpte) break; while (!try_lock_hpte(hp, HPTE_V_HVLOCK)) cpu_relax(); @@ -596,22 +596,23 @@ long kvmppc_h_bulk_remove(struct kvm_vcpu *vcpu) /* leave it locked */ hp[0] &= ~cpu_to_be64(HPTE_V_VALID); - tlbrb[n] = compute_tlbie_rb(be64_to_cpu(hp[0]), - be64_to_cpu(hp[1]), pte_index); - indexes[n] = j; - hptes[n] = hp; - revs[n] = rev; - ++n; + tlbrb[collected_hpte] = compute_tlbie_rb(be64_to_cpu(hp[0]), + be64_to_cpu(hp[1]), + pte_index); + indexes[collected_hpte] = j; + hptes[collected_hpte] = hp; + revs[collected_hpte] = rev; + ++collected_hpte; } - if (!n) + if (!collected_hpte) break; /* Now that we've collected a batch, do the tlbies */ - do_tlbies(kvm, tlbrb, n, global, true); + do_tlbies(kvm, tlbrb, collected_hpte, global, true); /* Read PTE low words after tlbie to get final R/C values */ - for (k = 0; k < n; ++k) { + for (k = 0; k < collected_hpte; ++k) { j = indexes[k]; pte_index = args[j] & ((1ul << 56) - 1); hp = hptes[k]; -- 1.9.1
[PATCH 2/3] KVM: PPC: BOOK3S: HV: Use unlock variant with memory barrier
We switch to the unlock variant with memory barriers in the error path and also in code paths where we had an implicit dependency on previous functions calling lwsync/ptesync. In most of the cases we don't really need an explicit barrier, but using the variant makes sure we don't make mistakes later with code movement. We also document why a non-barrier variant is OK in the performance-critical path. Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com --- arch/powerpc/kvm/book3s_64_mmu_hv.c | 10 +- arch/powerpc/kvm/book3s_hv_rm_mmu.c | 15 ++- 2 files changed, 15 insertions(+), 10 deletions(-) diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c index 5ea4b2b6a157..c97690ffb5f6 100644 --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c @@ -774,7 +774,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu, return ret; out_unlock: - __unlock_hpte(hptep, be64_to_cpu(hptep[0])); + unlock_hpte(hptep, be64_to_cpu(hptep[0])); preempt_enable(); goto out_put; } @@ -903,8 +903,8 @@ static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp, note_hpte_modification(kvm, rev[i]); } } + unlock_hpte(hptep, be64_to_cpu(hptep[0])); unlock_rmap(rmapp); - __unlock_hpte(hptep, be64_to_cpu(hptep[0])); } return 0; } @@ -992,7 +992,7 @@ static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp, } ret = 1; } - __unlock_hpte(hptep, be64_to_cpu(hptep[0])); + unlock_hpte(hptep, be64_to_cpu(hptep[0])); } while ((i = j) != head); unlock_rmap(rmapp); @@ -1115,7 +1115,7 @@ static int kvm_test_clear_dirty_npages(struct kvm *kvm, unsigned long *rmapp) /* Now check and modify the HPTE */ if (!(hptep[0] & cpu_to_be64(HPTE_V_VALID))) { - __unlock_hpte(hptep, be64_to_cpu(hptep[0])); + unlock_hpte(hptep, be64_to_cpu(hptep[0])); continue; } /* need to make it temporarily absent so C is stable */ @@ -1137,7 +1137,7 @@ static int kvm_test_clear_dirty_npages(struct kvm *kvm, unsigned long *rmapp) } v &= ~HPTE_V_ABSENT; v |= HPTE_V_VALID; - __unlock_hpte(hptep, v); + unlock_hpte(hptep, v); } while ((i = j) != head); unlock_rmap(rmapp); diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c index 769a5d4c0430..78e689b066f1 100644 --- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c +++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c @@ -292,6 +292,9 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags, pte = be64_to_cpu(hpte[0]); if (!(pte & (HPTE_V_VALID | HPTE_V_ABSENT))) break; + /* + * Data dependency will avoid re-ordering + */ __unlock_hpte(hpte, pte); hpte += 2; } @@ -310,7 +313,7 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags, cpu_relax(); pte = be64_to_cpu(hpte[0]); if (pte & (HPTE_V_VALID | HPTE_V_ABSENT)) { - __unlock_hpte(hpte, pte); + unlock_hpte(hpte, pte); return H_PTEG_FULL; } } @@ -481,7 +484,7 @@ long kvmppc_do_h_remove(struct kvm *kvm, unsigned long flags, if ((pte & (HPTE_V_ABSENT | HPTE_V_VALID)) == 0 || ((flags & H_AVPN) && (pte & ~0x7fUL) != avpn) || ((flags & H_ANDCOND) && (pte & avpn) != 0)) { - __unlock_hpte(hpte, pte); + unlock_hpte(hpte, pte); return H_NOT_FOUND; } @@ -617,7 +620,7 @@ long kvmppc_h_bulk_remove(struct kvm_vcpu *vcpu) be64_to_cpu(hp[0]), be64_to_cpu(hp[1])); rcbits = rev->guest_rpte & (HPTE_R_R|HPTE_R_C); args[j] |= rcbits << (56 - 5); - __unlock_hpte(hp, 0); + unlock_hpte(hp, 0); } } @@ -643,7 +646,7 @@ long kvmppc_h_protect(struct kvm_vcpu *vcpu, unsigned long flags, pte = be64_to_cpu(hpte[0]); if ((pte & (HPTE_V_ABSENT | HPTE_V_VALID)) == 0 || ((flags & H_AVPN) && (pte & ~0x7fUL) != avpn)) { - __unlock_hpte(hpte, pte); + unlock_hpte(hpte, pte); return H_NOT_FOUND
[PATCH] KVM: PPC: BOOK3S: HV: CMA: Reserve cma region only in hypervisor mode
We use the CMA reserved area for creating the guest hash page table. Don't do the reservation in non-hypervisor mode. This avoids an unnecessary CMA reservation when booting with limited memory configs like fadump and kdump. Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com --- arch/powerpc/kvm/book3s_hv_builtin.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c b/arch/powerpc/kvm/book3s_hv_builtin.c index b9615ba5b083..4fdc27c80f4c 100644 --- a/arch/powerpc/kvm/book3s_hv_builtin.c +++ b/arch/powerpc/kvm/book3s_hv_builtin.c @@ -163,6 +163,12 @@ void __init kvm_cma_reserve(void) unsigned long align_size; struct memblock_region *reg; phys_addr_t selected_size = 0; + + /* + * We need CMA reservation only when we are in HV mode + */ + if (!cpu_has_feature(CPU_FTR_HVMODE)) + return; /* * We cannot use memblock_phys_mem_size() here, because * memblock_analyze() has not been called yet. -- 1.9.1
Re: [PATCH] PC, KVM, CMA: Fix regression caused by wrong get_order() use
Alexey Kardashevskiy a...@ozlabs.ru writes: Commit fc95ca7284bc54953165cba76c3228bd2cdb9591 claims that there is no functional change, but this is not true as it calls get_order() (which takes bytes) where it should have called ilog2(), and the kernel stops on VM_BUG_ON(). This replaces get_order() with ilog2(). Should we round it up? i.e., ilog2(kvm_rma_pages - 1) + 1 ? Suggested-by: Paul Mackerras pau...@samba.org Cc: Alexander Graf ag...@suse.de Cc: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com Cc: Joonsoo Kim iamjoonsoo@lge.com Cc: Benjamin Herrenschmidt b...@kernel.crashing.org Cc: sta...@vger.kernel.org Why stable? We merged it this merge window. Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru --- arch/powerpc/kvm/book3s_hv_builtin.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c b/arch/powerpc/kvm/book3s_hv_builtin.c index 329d7fd..bfe9f01 100644 --- a/arch/powerpc/kvm/book3s_hv_builtin.c +++ b/arch/powerpc/kvm/book3s_hv_builtin.c @@ -101,7 +101,7 @@ struct kvm_rma_info *kvm_alloc_rma() ri = kmalloc(sizeof(struct kvm_rma_info), GFP_KERNEL); if (!ri) return NULL; - page = cma_alloc(kvm_cma, kvm_rma_pages, get_order(kvm_rma_pages)); + page = cma_alloc(kvm_cma, kvm_rma_pages, ilog2(kvm_rma_pages)); if (!page) goto err_out; atomic_set(&ri->use_count, 1); @@ -135,12 +135,12 @@ struct page *kvm_alloc_hpt(unsigned long nr_pages) { unsigned long align_pages = HPT_ALIGN_PAGES; - VM_BUG_ON(get_order(nr_pages) > KVM_CMA_CHUNK_ORDER - PAGE_SHIFT); + VM_BUG_ON(ilog2(nr_pages) > KVM_CMA_CHUNK_ORDER - PAGE_SHIFT); /* Old CPUs require HPT aligned on a multiple of its size */ if (!cpu_has_feature(CPU_FTR_ARCH_206)) align_pages = nr_pages; - return cma_alloc(kvm_cma, nr_pages, get_order(align_pages)); + return cma_alloc(kvm_cma, nr_pages, ilog2(align_pages)); } EXPORT_SYMBOL_GPL(kvm_alloc_hpt); -- 2.0.0
Re: [PATCH v2] PC, KVM, CMA: Fix regression caused by wrong get_order() use
Alexey Kardashevskiy a...@ozlabs.ru writes: Commit fc95ca7284bc54953165cba76c3228bd2cdb9591 claims that there is no functional change, but this is not true as it calls get_order() (which takes bytes) where it should have called ilog2(), and the kernel stops on VM_BUG_ON(). This replaces get_order() with order_base_2() (the round-up version of ilog2). Suggested-by: Paul Mackerras pau...@samba.org Cc: Alexander Graf ag...@suse.de Cc: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com Cc: Joonsoo Kim iamjoonsoo@lge.com Cc: Benjamin Herrenschmidt b...@kernel.crashing.org Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru Reviewed-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com --- Changes: v2: * s/ilog2/order_base_2/ * removed cc: sta...@vger.kernel.org as I got the wrong impression that v3.16 is broken --- arch/powerpc/kvm/book3s_hv_builtin.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c b/arch/powerpc/kvm/book3s_hv_builtin.c index 329d7fd..b9615ba 100644 --- a/arch/powerpc/kvm/book3s_hv_builtin.c +++ b/arch/powerpc/kvm/book3s_hv_builtin.c @@ -101,7 +101,7 @@ struct kvm_rma_info *kvm_alloc_rma() ri = kmalloc(sizeof(struct kvm_rma_info), GFP_KERNEL); if (!ri) return NULL; - page = cma_alloc(kvm_cma, kvm_rma_pages, get_order(kvm_rma_pages)); + page = cma_alloc(kvm_cma, kvm_rma_pages, order_base_2(kvm_rma_pages)); if (!page) goto err_out; atomic_set(&ri->use_count, 1); @@ -135,12 +135,12 @@ struct page *kvm_alloc_hpt(unsigned long nr_pages) { unsigned long align_pages = HPT_ALIGN_PAGES; - VM_BUG_ON(get_order(nr_pages) > KVM_CMA_CHUNK_ORDER - PAGE_SHIFT); + VM_BUG_ON(order_base_2(nr_pages) > KVM_CMA_CHUNK_ORDER - PAGE_SHIFT); /* Old CPUs require HPT aligned on a multiple of its size */ if (!cpu_has_feature(CPU_FTR_ARCH_206)) align_pages = nr_pages; - return cma_alloc(kvm_cma, nr_pages, get_order(align_pages)); + return cma_alloc(kvm_cma, nr_pages, order_base_2(align_pages)); } EXPORT_SYMBOL_GPL(kvm_alloc_hpt); -- 2.0.0
Re: [PATCH 4/6] KVM: PPC: BOOK3S: HV: Use new functions for mapping/unmapping hpte in host
Paul Mackerras pau...@samba.org writes: On Sun, Jun 29, 2014 at 04:47:33PM +0530, Aneesh Kumar K.V wrote: We want to use the virtual page class key protection mechanism for indicating an MMIO mapped hpte entry or a guest hpte entry that is swapped out in the host. Those hptes will be marked valid, but have virtual page class key set to 30 or 31. These virtual page class numbers are configured in the AMR to deny read/write. To accommodate such a change, add new functions that map, unmap and check whether a hpte is mapped in the host. This patch still uses HPTE_V_VALID and HPTE_V_ABSENT and doesn't use virtual page class keys. But we want to differentiate in the code the places where we explicitly check for HPTE_V_VALID from the places where we want to check whether the hpte is host mapped. This patch enables a closer review of such a change. [...] /* Check for pending invalidations under the rmap chain lock */ if (kvm->arch.using_mmu_notifiers && mmu_notifier_retry(kvm, mmu_seq)) { - /* inval in progress, write a non-present HPTE */ - pteh |= HPTE_V_ABSENT; - pteh &= ~HPTE_V_VALID; + /* + * inval in progress in host, write host unmapped pte. + */ + host_unmapped_hpte = 1; This isn't right. We already have HPTE_V_VALID set here, and you now don't clear it here, and it doesn't get cleared by the __kvmppc_unmap_host_hpte() call below either. Ok, missed that. Will fix that in the next update. In the earlier version I had kvmppc_unmap_host_hpte always clearing V_VALID. -aneesh
Re: [PATCH 5/6] KVM: PPC: BOOK3S: Use hpte_update_in_progress to track invalid hpte during an hpte update
Paul Mackerras pau...@samba.org writes: On Sun, Jun 29, 2014 at 04:47:34PM +0530, Aneesh Kumar K.V wrote: As per the ISA, we first need to mark the hpte invalid (V=0) before we update the hpte lower half bits. With the virtual page class key protection mechanism we want to send any fault other than a key fault to the guest directly without searching the hash page table. But then we can get a NO_HPTE fault while we are updating the hpte. To track that, add a VM-specific atomic variable that we check in the fault path to always send the fault to the host. [...] @@ -750,13 +751,15 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu, r &= rcbits | ~(HPTE_R_R | HPTE_R_C); if (be64_to_cpu(hptep[0]) & HPTE_V_VALID) { - /* HPTE was previously valid, so we need to invalidate it */ + /* + * If we had mapped this hpte before, we now need to + * invalidate that. + */ unlock_rmap(rmap); - /* Always mark HPTE_V_ABSENT before invalidating */ - kvmppc_unmap_host_hpte(kvm, hptep); kvmppc_invalidate_hpte(kvm, hptep, index); /* don't lose previous R and C bits */ r |= be64_to_cpu(hptep[1]) & (HPTE_R_R | HPTE_R_C); + hpte_invalidated = true; So now we're not setting the ABSENT bit before invalidating the HPTE. That means that another guest vcpu could do an H_ENTER which could think that this HPTE is free and use it for another unrelated guest HPTE, which would be bad... But h_enter looks at HPTE_V_HVLOCK, and we keep that set throughout. But I will double-check the code again to make sure it is safe in the above scenario. @@ -1144,8 +1149,8 @@ static int kvm_test_clear_dirty_npages(struct kvm *kvm, unsigned long *rmapp) npages_dirty = n; eieio(); } - kvmppc_map_host_hpte(kvm, v, r); - hptep[0] = cpu_to_be64(v & ~HPTE_V_HVLOCK); + hptep[0] = cpu_to_be64(v & ~HPTE_V_LOCK); + atomic_dec(&kvm->arch.hpte_update_in_progress); Why are we using LOCK rather than HVLOCK now? (And why didn't you mention this change and its rationale in the patch description?) Sorry, that is a typo. I intend to use HPTE_V_HVLOCK. -aneesh
Re: [PATCH 2/6] KVM: PPC: BOOK3S: HV: Deny virtual page class key update via h_protect
Paul Mackerras pau...@samba.org writes: On Sun, Jun 29, 2014 at 04:47:31PM +0530, Aneesh Kumar K.V wrote: This makes it consistent with h_enter where we clear the key bits. We also want to use the virtual page class key protection mechanism for indicating a host page fault. For that we will be using key class index 30 and 31. So prevent the guest from updating key bits until we add proper support for the virtual page class protection mechanism for the guest. This will not have any impact for a PAPR Linux guest because the Linux guest currently doesn't use the virtual page class key protection model. As things stand, without this patch series, we do actually have everything we need in the kernel for guests to use virtual page class keys. Arguably we should have a capability to tell userspace how many storage keys the guest can use, but that's the only missing piece as far as I can see. Yes. If we add such a capability, I can't see any reason why we should need to disable guest use of storage keys in this patchset. With this patchset, we would need additional changes to find out whether the key fault happened because of the guest's usage of the key. I was planning to do that as an add-on series to keep the changes in this one minimal. Also, since Linux didn't use keys, I was not sure whether guest support for keys is an important item. -aneesh
Re: [PATCH 4/6] KVM: PPC: BOOK3S: HV: Use new functions for mapping/unmapping hpte in host
Paul Mackerras pau...@samba.org writes: On Sun, Jun 29, 2014 at 04:47:33PM +0530, Aneesh Kumar K.V wrote: We want to use virtual page class key protection mechanism for indicating a MMIO mapped hpte entry or a guest hpte entry that is swapped out in the host. Those hptes will be marked valid, but have virtual page class key set to 30 or 31. These virtual page class numbers are configured in AMR to deny read/write. To accomodate such a change, add new functions that map, unmap and check whether a hpte is mapped in the host. This patch still use HPTE_V_VALID and HPTE_V_ABSENT and don't use virtual page class keys. But we want to differentiate in the code where we explicitly check for HPTE_V_VALID with places where we want to check whether the hpte is host mapped. This patch enables a closer review for such a change. [...] /* Check for pending invalidations under the rmap chain lock */ if (kvm-arch.using_mmu_notifiers mmu_notifier_retry(kvm, mmu_seq)) { -/* inval in progress, write a non-present HPTE */ -pteh |= HPTE_V_ABSENT; -pteh = ~HPTE_V_VALID; +/* + * inval in progress in host, write host unmapped pte. + */ +host_unmapped_hpte = 1; This isn't right. We already have HPTE_V_VALID set here, and you now don't clear it here, and it doesn't get cleared by the __kvmppc_unmap_host_hpte() call below either. Ok missed that. Will fix that in the next update. In the earlier version I had kvmppc_unmap_host_hpte always clearing V_VALID. -aneesh -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 5/6] KVM: PPC: BOOK3S: Use hpte_update_in_progress to track invalid hpte during an hpte update
Paul Mackerras pau...@samba.org writes: On Sun, Jun 29, 2014 at 04:47:34PM +0530, Aneesh Kumar K.V wrote: As per ISA, we first need to mark hpte invalid (V=0) before we update the hpte lower half bits. With virtual page class key protection mechanism we want to send any fault other than key fault to guest directly without searching the hash page table. But then we can get NO_HPTE fault while we are updating the hpte. To track that add a vm specific atomic variable that we check in the fault path to always send the fault to host. [...] @@ -750,13 +751,15 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu, r = rcbits | ~(HPTE_R_R | HPTE_R_C); if (be64_to_cpu(hptep[0]) HPTE_V_VALID) { -/* HPTE was previously valid, so we need to invalidate it */ +/* + * If we had mapped this hpte before, we now need to + * invalidate that. + */ unlock_rmap(rmap); -/* Always mark HPTE_V_ABSENT before invalidating */ -kvmppc_unmap_host_hpte(kvm, hptep); kvmppc_invalidate_hpte(kvm, hptep, index); /* don't lose previous R and C bits */ r |= be64_to_cpu(hptep[1]) (HPTE_R_R | HPTE_R_C); +hpte_invalidated = true; So now we're not setting the ABSENT bit before invalidating the HPTE. That means that another guest vcpu could do an H_ENTER which could think that this HPTE is free and use it for another unrelated guest HPTE, which would be bad... But henter looks at HPTE_V_HVLOCK, and we keep that set through out. But I will double the code again to make sure it is safe in the above scenario. @@ -1144,8 +1149,8 @@ static int kvm_test_clear_dirty_npages(struct kvm *kvm, unsigned long *rmapp) npages_dirty = n; eieio(); } -kvmppc_map_host_hpte(kvm, v, r); -hptep[0] = cpu_to_be64(v ~HPTE_V_HVLOCK); +hptep[0] = cpu_to_be64(v ~HPTE_V_LOCK); +atomic_dec(kvm-arch.hpte_update_in_progress); Why are we using LOCK rather than HVLOCK now? (And why didn't you mention this change and its rationale in the patch description?) Sorry, that is a typo. I intend to use HPTE_V_HVLOCK. 
-aneesh -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/6] KVM: PPC: BOOK3S: HV: Deny virtual page class key update via h_protect
Paul Mackerras pau...@samba.org writes: On Sun, Jun 29, 2014 at 04:47:31PM +0530, Aneesh Kumar K.V wrote:

This makes it consistent with h_enter, where we clear the key bits. We also want to use the virtual page class key protection mechanism for indicating a host page fault. For that we will be using key class index 30 and 31. So prevent the guest from updating key bits until we add proper support for the virtual page class protection mechanism for the guest. This will not have any impact for a PAPR Linux guest, because Linux guests currently don't use the virtual page class key protection model.

As things stand, without this patch series, we do actually have everything we need in the kernel for guests to use virtual page class keys. Arguably we should have a capability to tell userspace how many storage keys the guest can use, but that's the only missing piece as far as I can see.

yes.

If we add such a capability, I can't see any reason why we should need to disable guest use of storage keys in this patchset.

With this patchset, we would need additional changes to find out whether the key fault happened because of the guest's usage of the key. I was planning to do that as an add-on series to keep the changes in this one minimal. Also, since Linux doesn't use keys, I was not sure whether guest support for keys is an important item.

-aneesh
[PATCH 2/6] KVM: PPC: BOOK3S: HV: Deny virtual page class key update via h_protect
This makes it consistent with h_enter, where we clear the key bits. We also want to use the virtual page class key protection mechanism for indicating a host page fault. For that we will be using key class index 30 and 31. So prevent the guest from updating key bits until we add proper support for the virtual page class protection mechanism for the guest. This will not have any impact for a PAPR Linux guest, because Linux guests currently don't use the virtual page class key protection model.

Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
---
 arch/powerpc/kvm/book3s_hv_rm_mmu.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 157a5f35edfa..f908845f7379 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -658,13 +658,17 @@ long kvmppc_h_protect(struct kvm_vcpu *vcpu, unsigned long flags,
 	}

 	v = pte;
+	/*
+	 * We ignore key bits here. We use class 31 and 30 for
+	 * hypervisor purposes. We still don't track the page
+	 * class separately. Until then don't allow h_protect
+	 * to change key bits.
+	 */
 	bits = (flags << 55) & HPTE_R_PP0;
-	bits |= (flags << 48) & HPTE_R_KEY_HI;
-	bits |= flags & (HPTE_R_PP | HPTE_R_N | HPTE_R_KEY_LO);
+	bits |= flags & (HPTE_R_PP | HPTE_R_N);

 	/* Update guest view of 2nd HPTE dword */
-	mask = HPTE_R_PP0 | HPTE_R_PP | HPTE_R_N |
-		HPTE_R_KEY_HI | HPTE_R_KEY_LO;
+	mask = HPTE_R_PP0 | HPTE_R_PP | HPTE_R_N;
 	rev = real_vmalloc_addr(&kvm->arch.revmap[pte_index]);
 	if (rev) {
 		r = (rev->guest_rpte & ~mask) | bits;
--
1.9.1
[PATCH 1/6] KVM: PPC: BOOK3S: HV: Clear hash pte bits from do_h_enter callers
We will use this to set the HPTE_V_VRMA bit in a later patch. This also makes sure we clear the hpte bits only when called via hcall.

Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
---
 arch/powerpc/kvm/book3s_64_mmu_hv.c | 15 +--
 arch/powerpc/kvm/book3s_hv_rm_mmu.c | 8 ++--
 2 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 09a47aeb5b63..1c137f45dd55 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -371,8 +371,6 @@ long kvmppc_virtmode_do_h_enter(struct kvm *kvm, unsigned long flags,
 	if (!psize)
 		return H_PARAMETER;

-	pteh &= ~(HPTE_V_HVLOCK | HPTE_V_ABSENT | HPTE_V_VALID);
-
 	/* Find the memslot (if any) for this address */
 	gpa = (ptel & HPTE_R_RPN) & ~(psize - 1);
 	gfn = gpa >> PAGE_SHIFT;
@@ -408,6 +406,12 @@ long kvmppc_virtmode_h_enter(struct kvm_vcpu *vcpu, unsigned long flags,
 			    long pte_index, unsigned long pteh, unsigned long ptel)
 {
+	/*
+	 * Clear a few bits, when called via hcall
+	 */
+	pteh &= ~(HPTE_V_HVLOCK | HPTE_V_ABSENT | HPTE_V_VALID);
+	ptel &= ~(HPTE_R_KEY_HI | HPTE_R_KEY_LO | HPTE_GR_RESERVED);
+
 	return kvmppc_virtmode_do_h_enter(vcpu->kvm, flags, pte_index,
 					  pteh, ptel, &vcpu->arch.gpr[4]);
 }
@@ -1560,6 +1564,13 @@ static ssize_t kvm_htab_write(struct file *file, const char __user *buf,
 			if (be64_to_cpu(hptp[0]) & (HPTE_V_VALID | HPTE_V_ABSENT))
 				kvmppc_do_h_remove(kvm, 0, i, 0, tmp);
 			err = -EIO;
+			/*
+			 * Clear a few bits we got via read_htab which we
+			 * don't need to carry forward.
+			 */
+			v &= ~(HPTE_V_HVLOCK | HPTE_V_ABSENT | HPTE_V_VALID);
+			r &= ~(HPTE_R_KEY_HI | HPTE_R_KEY_LO | HPTE_GR_RESERVED);
+
 			ret = kvmppc_virtmode_do_h_enter(kvm, H_EXACT, i, v, r,
 							 tmp);
 			if (ret != H_SUCCESS) {
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 084ad54c73cd..157a5f35edfa 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -182,8 +182,6 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
 	if (!psize)
 		return H_PARAMETER;
 	writing = hpte_is_writable(ptel);
-	pteh &= ~(HPTE_V_HVLOCK | HPTE_V_ABSENT | HPTE_V_VALID);
-	ptel &= ~HPTE_GR_RESERVED;
 	g_ptel = ptel;	/* used later to detect if we might have been invalidated */
@@ -367,6 +365,12 @@ EXPORT_SYMBOL_GPL(kvmppc_do_h_enter);
 long kvmppc_h_enter(struct kvm_vcpu *vcpu, unsigned long flags,
 		    long pte_index, unsigned long pteh, unsigned long ptel)
 {
+	/*
+	 * Clear a few bits, when called via hcall.
+	 */
+	pteh &= ~(HPTE_V_HVLOCK | HPTE_V_ABSENT | HPTE_V_VALID);
+	ptel &= ~(HPTE_R_KEY_HI | HPTE_R_KEY_LO | HPTE_GR_RESERVED);
+
 	return kvmppc_do_h_enter(vcpu->kvm, flags, pte_index, pteh, ptel,
 				 vcpu->arch.pgdir, true, &vcpu->arch.gpr[4]);
 }
--
1.9.1
[PATCH 4/6] KVM: PPC: BOOK3S: HV: Use new functions for mapping/unmapping hpte in host
We want to use the virtual page class key protection mechanism for indicating an MMIO mapped hpte entry or a guest hpte entry that is swapped out in the host. Those hptes will be marked valid, but have the virtual page class key set to 30 or 31. These virtual page class numbers are configured in the AMR to deny read/write. To accommodate such a change, add new functions that map, unmap and check whether a hpte is mapped in the host. This patch still uses HPTE_V_VALID and HPTE_V_ABSENT and doesn't use virtual page class keys. But we want to differentiate in the code the places where we explicitly check for HPTE_V_VALID from the places where we want to check whether the hpte is host mapped. This patch enables a closer review of such a change.

Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/kvm_book3s_64.h | 36
 arch/powerpc/kvm/book3s_64_mmu_hv.c | 24 +++--
 arch/powerpc/kvm/book3s_hv_rm_mmu.c | 30 ++
 3 files changed, 66 insertions(+), 24 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index 0aa817933e6a..da00b1f05ea1 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -400,6 +400,42 @@ static inline int is_vrma_hpte(unsigned long hpte_v)
 		(HPTE_V_1TB_SEG | (VRMA_VSID << (40 - 16)));
 }

+static inline void __kvmppc_unmap_host_hpte(struct kvm *kvm,
+					    unsigned long *hpte_v,
+					    unsigned long *hpte_r,
+					    bool mmio)
+{
+	*hpte_v |= HPTE_V_ABSENT;
+	if (mmio)
+		*hpte_r |= HPTE_R_KEY_HI | HPTE_R_KEY_LO;
+}
+
+static inline void kvmppc_unmap_host_hpte(struct kvm *kvm, __be64 *hptep)
+{
+	/*
+	 * We will never call this for MMIO
+	 */
+	hptep[0] |= cpu_to_be64(HPTE_V_ABSENT);
+}
+
+static inline void kvmppc_map_host_hpte(struct kvm *kvm, unsigned long *hpte_v,
+					unsigned long *hpte_r)
+{
+	*hpte_v |= HPTE_V_VALID;
+	*hpte_v &= ~HPTE_V_ABSENT;
+}
+
+static inline bool kvmppc_is_host_mapped_hpte(struct kvm *kvm, __be64 *hpte)
+{
+	unsigned long v;
+
+	v = be64_to_cpu(hpte[0]);
+	if (v & HPTE_V_VALID)
+		return true;
+	return false;
+}
+
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
 /*
  * Note modification of an HPTE; set the HPTE modified bit
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 590e07b1a43f..8ce5e95613f8 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -752,7 +752,8 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 	if (be64_to_cpu(hptep[0]) & HPTE_V_VALID) {
 		/* HPTE was previously valid, so we need to invalidate it */
 		unlock_rmap(rmap);
-		hptep[0] |= cpu_to_be64(HPTE_V_ABSENT);
+		/* Always mark HPTE_V_ABSENT before invalidating */
+		kvmppc_unmap_host_hpte(kvm, hptep);
 		kvmppc_invalidate_hpte(kvm, hptep, index);
 		/* don't lose previous R and C bits */
 		r |= be64_to_cpu(hptep[1]) & (HPTE_R_R | HPTE_R_C);
@@ -897,11 +898,12 @@ static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp,
 		/* Now check and modify the HPTE */
 		ptel = rev[i].guest_rpte;
 		psize = hpte_page_size(be64_to_cpu(hptep[0]), ptel);
-		if ((be64_to_cpu(hptep[0]) & HPTE_V_VALID) &&
+		if (kvmppc_is_host_mapped_hpte(kvm, hptep) &&
 		    hpte_rpn(ptel, psize) == gfn) {
 			if (kvm->arch.using_mmu_notifiers)
-				hptep[0] |= cpu_to_be64(HPTE_V_ABSENT);
+				kvmppc_unmap_host_hpte(kvm, hptep);
 			kvmppc_invalidate_hpte(kvm, hptep, i);
+			/* Harvest R and C */
 			rcbits = be64_to_cpu(hptep[1]) & (HPTE_R_R | HPTE_R_C);
 			*rmapp |= rcbits << KVMPPC_RMAP_RC_SHIFT;
@@ -990,7 +992,7 @@ static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
 		}

 		/* Now check and modify the HPTE */
-		if ((be64_to_cpu(hptep[0]) & HPTE_V_VALID) &&
+		if (kvmppc_is_host_mapped_hpte(kvm, hptep) &&
 		    (be64_to_cpu(hptep[1]) & HPTE_R_R)) {
 			kvmppc_clear_ref_hpte(kvm, hptep, i);
 			if (!(rev[i].guest_rpte & HPTE_R_R)) {
@@ -1121,11 +1123,12 @@ static int kvm_test_clear_dirty_npages(struct kvm *kvm, unsigned long *rmapp)
 		}

 		/* Now check and modify the HPTE */
-		if (!(hptep[0] & cpu_to_be64(HPTE_V_VALID)))
+		if (!kvmppc_is_host_mapped_hpte
[PATCH 0/6] Use virtual page class key protection mechanism for speeding up guest page fault
Hi,

With the current code we do an expensive hash page table lookup on every page fault resulting from a missing hash page table entry. A NO_HPTE page fault can happen due to the below reasons:

1) Missing hash pte as per the guest. This should be forwarded to the guest.
2) MMIO hash pte. The address against which the load/store is performed should be emulated as an MMIO operation.
3) Missing hash pte because the host swapped out the guest page.

We want to differentiate (1) from (2) and (3) so that we can speed up page faults due to (1). Optimizing (1) will help in improving the overall performance, because that covers a large percentage of the page faults.

To achieve the above we use the virtual page class protection mechanism for covering (2) and (3). For both of the above cases we mark the hpte valid, but associate the page with virtual page class index 30 and 31. The authority mask register is configured such that class index 30 and 31 will have read/write denied. The above change results in a key fault for (2) and (3). This allows us to forward a NO_HPTE fault directly to the guest without doing the expensive hash pagetable lookup.

For the test below:

#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

#define PAGES (40*1024)

int main()
{
	unsigned long size = getpagesize();
	unsigned long length = size * PAGES;
	unsigned long i, j, k = 0;

	for (j = 0; j < 10; j++) {
		char *c = mmap(NULL, length, PROT_READ|PROT_WRITE,
			       MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
		if (c == MAP_FAILED) {
			perror("mmap");
			exit(1);
		}
		for (i = 0; i < length; i += size)
			c[i] = 0;
		/* flush hptes */
		mprotect(c, length, PROT_WRITE);
		for (i = 0; i < length; i += size)
			c[i] = 10;
		mprotect(c, length, PROT_READ);
		for (i = 0; i < length; i += size)
			k += c[i];
		munmap(c, length);
	}
}

Without Fix:
------------
[root@qemu-pr-host ~]# time ./pfault

real	0m8.438s
user	0m0.855s
sys	0m7.540s
[root@qemu-pr-host ~]#

With Fix:
---------
[root@qemu-pr-host ~]# time ./pfault

real	0m7.833s
user	0m0.782s
sys	0m7.038s
[root@qemu-pr-host ~]#

Aneesh Kumar K.V (6):
  KVM: PPC: BOOK3S: HV: Clear hash pte bits from do_h_enter callers
  KVM: PPC: BOOK3S: HV: Deny virtual page class key update via h_protect
  KVM: PPC: BOOK3S: HV: Remove dead code
  KVM: PPC: BOOK3S: HV: Use new functions for mapping/unmapping hpte in host
  KVM: PPC: BOOK3S: Use hpte_update_in_progress to track invalid hpte during an hpte update
  KVM: PPC: BOOK3S: HV: Use virtual page class protection mechanism for host fault and mmio

 arch/powerpc/include/asm/kvm_book3s_64.h | 97 +-
 arch/powerpc/include/asm/kvm_host.h | 1 +
 arch/powerpc/include/asm/reg.h | 1 +
 arch/powerpc/kernel/asm-offsets.c | 1 +
 arch/powerpc/kvm/book3s_64_mmu_hv.c | 99 --
 arch/powerpc/kvm/book3s_hv.c | 1 +
 arch/powerpc/kvm/book3s_hv_rm_mmu.c | 166 +--
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 100 +--
 8 files changed, 371 insertions(+), 95 deletions(-)

--
1.9.1
[PATCH 5/6] KVM: PPC: BOOK3S: Use hpte_update_in_progress to track invalid hpte during an hpte update
As per the ISA, we first need to mark the hpte invalid (V=0) before we update the hpte lower half bits. With the virtual page class key protection mechanism we want to send any fault other than a key fault to the guest directly, without searching the hash page table. But then we can get a NO_HPTE fault while we are updating the hpte. To track that, add a vm-specific atomic variable that we check in the fault path so that we always send the fault to the host.

Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/kvm_book3s_64.h | 1 +
 arch/powerpc/include/asm/kvm_host.h | 1 +
 arch/powerpc/kernel/asm-offsets.c | 1 +
 arch/powerpc/kvm/book3s_64_mmu_hv.c | 19 ++
 arch/powerpc/kvm/book3s_hv.c | 1 +
 arch/powerpc/kvm/book3s_hv_rm_mmu.c | 40 +++--
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 60 +---
 7 files changed, 109 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index da00b1f05ea1..a6bf41865a66 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -416,6 +416,7 @@ static inline void kvmppc_unmap_host_hpte(struct kvm *kvm, __be64 *hptep)
 	 * We will never call this for MMIO
 	 */
 	hptep[0] |= cpu_to_be64(HPTE_V_ABSENT);
+	atomic_dec(&kvm->arch.hpte_update_in_progress);
 }

 static inline void kvmppc_map_host_hpte(struct kvm *kvm, unsigned long *hpte_v,
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index f9ae69682ce1..0a9ff60fae4c 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -254,6 +254,7 @@ struct kvm_arch {
 	atomic_t hpte_mod_interest;
 	spinlock_t slot_phys_lock;
 	cpumask_t need_tlb_flush;
+	atomic_t hpte_update_in_progress;
 	struct kvmppc_vcore *vcores[KVM_MAX_VCORES];
 	int hpt_cma_alloc;
 #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index f5995a912213..54a36110f8f2 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -496,6 +496,7 @@ int main(void)
 	DEFINE(KVM_LPCR, offsetof(struct kvm, arch.lpcr));
 	DEFINE(KVM_RMOR, offsetof(struct kvm, arch.rmor));
 	DEFINE(KVM_VRMA_SLB_V, offsetof(struct kvm, arch.vrma_slb_v));
+	DEFINE(KVM_HPTE_UPDATE, offsetof(struct kvm, arch.hpte_update_in_progress));
 	DEFINE(VCPU_DSISR, offsetof(struct kvm_vcpu, arch.shregs.dsisr));
 	DEFINE(VCPU_DAR, offsetof(struct kvm_vcpu, arch.shregs.dar));
 	DEFINE(VCPU_VPA, offsetof(struct kvm_vcpu, arch.vpa.pinned_addr));
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 8ce5e95613f8..cb7a616aacb1 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -592,6 +592,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 	unsigned int writing, write_ok;
 	struct vm_area_struct *vma;
 	unsigned long rcbits;
+	bool hpte_invalidated = false;

 	/*
 	 * Real-mode code has already searched the HPT and found the
@@ -750,13 +751,15 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 	r &= rcbits | ~(HPTE_R_R | HPTE_R_C);

 	if (be64_to_cpu(hptep[0]) & HPTE_V_VALID) {
-		/* HPTE was previously valid, so we need to invalidate it */
+		/*
+		 * If we had mapped this hpte before, we now need to
+		 * invalidate that.
+		 */
 		unlock_rmap(rmap);
-		/* Always mark HPTE_V_ABSENT before invalidating */
-		kvmppc_unmap_host_hpte(kvm, hptep);
 		kvmppc_invalidate_hpte(kvm, hptep, index);
 		/* don't lose previous R and C bits */
 		r |= be64_to_cpu(hptep[1]) & (HPTE_R_R | HPTE_R_C);
+		hpte_invalidated = true;
 	} else {
 		kvmppc_add_revmap_chain(kvm, rev, rmap, index, 0);
 	}
@@ -765,6 +768,9 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 	eieio();
 	hptep[0] = cpu_to_be64(hpte[0]);
 	asm volatile("ptesync" : : : "memory");
+	if (hpte_invalidated)
+		atomic_dec(&kvm->arch.hpte_update_in_progress);
+
 	preempt_enable();
 	if (page && hpte_is_writable(r))
 		SetPageDirty(page);
@@ -1128,10 +1134,9 @@ static int kvm_test_clear_dirty_npages(struct kvm *kvm, unsigned long *rmapp)
 		/*
 		 * need to make it temporarily absent so C is stable
 		 */
-		kvmppc_unmap_host_hpte(kvm, hptep);
-		kvmppc_invalidate_hpte(kvm, hptep, i);
 		v = be64_to_cpu(hptep[0]);
 		r = be64_to_cpu(hptep
[PATCH 6/6] KVM: PPC: BOOK3S: HV: Use virtual page class protection mechanism for host fault and mmio
With this patch we use AMR class 30 and 31 for indicating a page fault that should be handled by the host. This includes MMIO accesses and page faults resulting from guest RAM swapout in the host. This enables us to forward the fault to the guest without doing the expensive hash page table search for finding the hpte entry. With this patch, we mark the hash pte always valid and use class index 30 and 31 for key-based faults. These virtual class indexes are configured in the AMR to deny read/write. Since the access class protection mechanism doesn't work with the VRMA region, we need to handle that separately. We mark those HPTEs invalid and use a software-defined bit, HPTE_V_VRMA, to differentiate them.

NOTE: We still need to handle protection faults in the host so that a write to a KSM shared page is handled in the host.

Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/kvm_book3s_64.h | 80 +++-
 arch/powerpc/include/asm/reg.h | 1 +
 arch/powerpc/kvm/book3s_64_mmu_hv.c | 48 ++-
 arch/powerpc/kvm/book3s_hv_rm_mmu.c | 69 ++-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 52 -
 5 files changed, 194 insertions(+), 56 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index a6bf41865a66..4aa9c3601fe8 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -48,7 +48,18 @@ extern unsigned long kvm_rma_pages;
  * HPTEs.
  */
 #define HPTE_V_HVLOCK	0x40UL
-#define HPTE_V_ABSENT	0x20UL
+/*
+ * VRMA mapping
+ */
+#define HPTE_V_VRMA	0x20UL
+
+#define HPTE_R_HOST_UNMAP_KEY	0x3e00UL
+/*
+ * We use this to differentiate between an MMIO key fault and
+ * a key fault resulting from host swapping out the page.
+ */
+#define HPTE_R_MMIO_UNMAP_KEY	0x3c00UL

 /*
  * We use this bit in the guest_rpte field of the revmap entry
@@ -405,35 +416,82 @@ static inline void __kvmppc_unmap_host_hpte(struct kvm *kvm,
 					    unsigned long *hpte_r,
 					    bool mmio)
 {
-	*hpte_v |= HPTE_V_ABSENT;
-	if (mmio)
-		*hpte_r |= HPTE_R_KEY_HI | HPTE_R_KEY_LO;
+	/*
+	 * We unmap on the host by adding the page to AMR class 31,
+	 * which has both read and write access denied.
+	 *
+	 * For the VRMA area we mark them invalid.
+	 *
+	 * If we are not using mmu_notifiers we don't use Access
+	 * class protection.
+	 *
+	 * Since we are not changing the hpt directly we don't
+	 * worry about update ordering.
+	 */
+	if ((*hpte_v & HPTE_V_VRMA) || !kvm->arch.using_mmu_notifiers)
+		*hpte_v &= ~HPTE_V_VALID;
+	else if (!mmio) {
+		*hpte_r |= HPTE_R_HOST_UNMAP_KEY;
+		*hpte_v |= HPTE_V_VALID;
+	} else {
+		*hpte_r |= HPTE_R_MMIO_UNMAP_KEY;
+		*hpte_v |= HPTE_V_VALID;
+	}
 }

 static inline void kvmppc_unmap_host_hpte(struct kvm *kvm, __be64 *hptep)
 {
+	unsigned long pte_v, pte_r;
+
+	pte_v = be64_to_cpu(hptep[0]);
+	pte_r = be64_to_cpu(hptep[1]);
 	/*
 	 * We will never call this for MMIO
 	 */
-	hptep[0] |= cpu_to_be64(HPTE_V_ABSENT);
+	__kvmppc_unmap_host_hpte(kvm, &pte_v, &pte_r, 0);
+	hptep[1] = cpu_to_be64(pte_r);
+	eieio();
+	hptep[0] = cpu_to_be64(pte_v);
+	asm volatile("ptesync" : : : "memory");
+	/*
+	 * we have now successfully marked the hpte using key bits
+	 */
 	atomic_dec(&kvm->arch.hpte_update_in_progress);
 }

 static inline void kvmppc_map_host_hpte(struct kvm *kvm, unsigned long *hpte_v,
 					unsigned long *hpte_r)
 {
-	*hpte_v |= HPTE_V_VALID;
-	*hpte_v &= ~HPTE_V_ABSENT;
+	/*
+	 * We will never try to map an MMIO region
+	 */
+	if ((*hpte_v & HPTE_V_VRMA) || !kvm->arch.using_mmu_notifiers)
+		*hpte_v |= HPTE_V_VALID;
+	else {
+		/*
+		 * When we allow guest keys we should set this with the key
+		 * for this page.
+		 */
+		*hpte_r &= ~(HPTE_R_KEY_HI | HPTE_R_KEY_LO);
+	}
 }

 static inline bool kvmppc_is_host_mapped_hpte(struct kvm *kvm, __be64 *hpte)
 {
-	unsigned long v;
+	unsigned long v, r;

 	v = be64_to_cpu(hpte[0]);
-	if (v & HPTE_V_VALID)
-		return true;
-	return false;
+	if ((v & HPTE_V_VRMA) || !kvm->arch.using_mmu_notifiers)
+		return v & HPTE_V_VALID;
+
+	r = be64_to_cpu(hpte[1]);
+	if (!(v & HPTE_V_VALID))
+		return false;
+	if ((r & (HPTE_R_KEY_HI | HPTE_R_KEY_LO)) == HPTE_R_HOST_UNMAP_KEY)
+		return false;
+	if ((r & (HPTE_R_KEY_HI | HPTE_R_KEY_LO
[PATCH] KVM: PPC: BOOK3S: HV: Update compute_tlbie_rb to handle 16MB base page
When calculating the lower bits of the AVA field, use a shift count based on the base page size. Also add the missing segment size (B field) and remove a stale comment.

Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/kvm_book3s_64.h | 6 --
 arch/powerpc/kvm/book3s_hv.c | 6 --
 2 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index 66a0a44b62a8..ca7c1688a7b6 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -158,6 +158,8 @@ static inline unsigned long compute_tlbie_rb(unsigned long v, unsigned long r,
 	 */
 	/* This covers 14..54 bits of va */
 	rb = (v & ~0x7fUL) << 16;		/* AVA field */
+
+	rb |= v >> (62 - 8);			/* B field */
 	/*
 	 * AVA in v had cleared lower 23 bits. We need to derive
 	 * that from pteg index
@@ -188,10 +190,10 @@ static inline unsigned long compute_tlbie_rb(unsigned long v, unsigned long r,
 		{
 			int aval_shift;
 			/*
 			 * remaining bits of AVA/LP fields
 			 * Also contain the rr bits of LP
 			 */
-			rb |= (va_low & 0x7f) << 16;
+			rb |= (va_low << mmu_psize_defs[b_psize].shift) & 0x7ff000;
 			/*
 			 * Now clear not needed LP bits based on actual psize
 			 */
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index cbf46eb3f59c..328416f28a55 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1917,12 +1917,6 @@ static void kvmppc_add_seg_page_size(struct kvm_ppc_one_seg_page_size **sps,
 	(*sps)->page_shift = def->shift;
 	(*sps)->slb_enc = def->sllp;
 	(*sps)->enc[0].page_shift = def->shift;
-	/*
-	 * Only return base page encoding. We don't want to return
-	 * all the supporting pte_enc, because our H_ENTER doesn't
-	 * support MPSS yet. Once they do, we can start passing all
-	 * support pte_enc here
-	 */
 	(*sps)->enc[0].pte_enc = def->penc[linux_psize];
 	/*
 	 * Add 16MB MPSS support if host supports it
--
1.9.1
[PATCH 3/6] KVM: PPC: BOOK3S: HV: Remove dead code
Since we don't support the virtual page class key protection mechanism in the guest, we should not find a key fault that needs to be forwarded to the guest. So remove the dead code.

Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
---
 arch/powerpc/kvm/book3s_64_mmu_hv.c | 9 -
 arch/powerpc/kvm/book3s_hv_rm_mmu.c | 9 -
 2 files changed, 18 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 1c137f45dd55..590e07b1a43f 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -499,15 +499,6 @@ static int kvmppc_mmu_book3s_64_hv_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
 	gpte->may_write = hpte_write_permission(pp, key);
 	gpte->may_execute = gpte->may_read && !(gr & (HPTE_R_N | HPTE_R_G));

-	/* Storage key permission check for POWER7 */
-	if (data && virtmode && cpu_has_feature(CPU_FTR_ARCH_206)) {
-		int amrfield = hpte_get_skey_perm(gr, vcpu->arch.amr);
-		if (amrfield & 1)
-			gpte->may_read = 0;
-		if (amrfield & 2)
-			gpte->may_write = 0;
-	}
-
 	/* Get the guest physical address */
 	gpte->raddr = kvmppc_mmu_get_real_addr(v, gr, eaddr);
 	return 0;
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index f908845f7379..1884bff3122c 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -925,15 +925,6 @@ long kvmppc_hpte_hv_fault(struct kvm_vcpu *vcpu, unsigned long addr,
 			return status | DSISR_PROTFAULT;
 	}

-	/* Check storage key, if applicable */
-	if (data && (vcpu->arch.shregs.msr & MSR_DR)) {
-		unsigned int perm = hpte_get_skey_perm(gr, vcpu->arch.amr);
-		if (status & DSISR_ISSTORE)
-			perm >>= 1;
-		if (perm & 1)
-			return status | DSISR_KEYFAULT;
-	}
-
 	/* Save HPTE info for virtual-mode handler */
 	vcpu->arch.pgfault_addr = addr;
 	vcpu->arch.pgfault_index = index;
--
1.9.1
Re: [PATCH 0/6] Use virtual page class key protection mechanism for speeding up guest page fault
Benjamin Herrenschmidt b...@kernel.crashing.org writes: On Sun, 2014-06-29 at 16:47 +0530, Aneesh Kumar K.V wrote:

To achieve the above we use the virtual page class protection mechanism for covering (2) and (3). For both of the above cases we mark the hpte valid, but associate the page with virtual page class index 30 and 31. The authority mask register is configured such that class index 30 and 31 will have read/write denied. The above change results in a key fault for (2) and (3). This allows us to forward a NO_HPTE fault directly to the guest without doing the expensive hash pagetable lookup.

So we have a measurable performance benefit (about half a second out of 8).

I was able to measure a performance benefit of 2 seconds earlier. But once I got the below patch applied, that got reduced. I am trying to find out how that patch is helping the performance. Maybe it is avoiding some unnecessary invalidation?

http://mid.gmane.org/1403876103-32459-1-git-send-email-aneesh.ku...@linux.vnet.ibm.com

I also believe the benefit depends on how much impact a hash table lookup has. I did try to access the addresses linearly so that I can make sure we do take a cache miss for the hash page table access.

but you didn't explain the drawback here which is to essentially make it impossible for guests to exploit virtual page class keys, or did you find a way to still make that possible ?

I am now making PROTFAULT go to the host. That means ksm sharing is represented as a read-only page, and an attempt to write to it will get to the host via PROTFAULT. Now with that we can implement keys for the guest if we want to. So irrespective of what restrictions the guest has put in, if the host swaps out the page, we will deny read/write. Now if the key fault needs to go to the guest, we will find that out by looking at the key index.

As it is, it's not a huge issue for Linux but we might have to care with other OSes that do care... Do we have a way in PAPR to signify to the guest that the keys are not available ?

Right now Qemu doesn't provide the device tree node ibm,processor-storage-keys. That means the guest cannot use keys. So we are good there. If we want to support guest keys, we need to fill that node with the values that indicate how many keys can be used for data and instruction.

-aneesh
[PATCH 0/6] Use virtual page class key protection mechanism for speeding up guest page fault
Hi,

With the current code we do an expensive hash page table lookup on every
page fault resulting from a missing hash page table entry. A NO_HPTE
page fault can happen for the below reasons:

1) Missing hash pte as per the guest. This should be forwarded to the guest.
2) MMIO hash pte. The address against which the load/store is performed
   should be emulated as an MMIO operation.
3) Missing hash pte because the host swapped out the guest page.

We want to differentiate (1) from (2) and (3) so that we can speed up
page faults due to (1). Optimizing (1) will help in improving the
overall performance because it covers a large percentage of the page
faults.

To achieve the above we use the virtual page class key protection
mechanism for covering (2) and (3). For both of those cases we mark the
hpte valid, but associate the page with virtual page class index 30 or
31. The authority mask register is configured such that class indices 30
and 31 have read/write denied. The above change results in a key fault
for (2) and (3). This allows us to forward a NO_HPTE fault directly to
the guest without doing the expensive hash pagetable lookup.
For the test below:

#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

#define PAGES (40*1024)

int main()
{
	unsigned long size = getpagesize();
	unsigned long length = size * PAGES;
	unsigned long i, j, k = 0;

	for (j = 0; j < 10; j++) {
		char *c = mmap(NULL, length, PROT_READ|PROT_WRITE,
			       MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
		if (c == MAP_FAILED) {
			perror("mmap");
			exit(1);
		}
		for (i = 0; i < length; i += size)
			c[i] = 0;
		/* flush hptes */
		mprotect(c, length, PROT_WRITE);
		for (i = 0; i < length; i += size)
			c[i] = 10;
		mprotect(c, length, PROT_READ);
		for (i = 0; i < length; i += size)
			k += c[i];
		munmap(c, length);
	}
}

Without Fix:
--
[root@qemu-pr-host ~]# time ./pfault

real	0m8.438s
user	0m0.855s
sys	0m7.540s
[root@qemu-pr-host ~]#

With Fix:
--
[root@qemu-pr-host ~]# time ./pfault

real	0m7.833s
user	0m0.782s
sys	0m7.038s
[root@qemu-pr-host ~]#

Aneesh Kumar K.V (6):
  KVM: PPC: BOOK3S: HV: Clear hash pte bits from do_h_enter callers
  KVM: PPC: BOOK3S: HV: Deny virtual page class key update via h_protect
  KVM: PPC: BOOK3S: HV: Remove dead code
  KVM: PPC: BOOK3S: HV: Use new functions for mapping/unmapping hpte in host
  KVM: PPC: BOOK3S: Use hpte_update_in_progress to track invalid hpte during an hpte update
  KVM: PPC: BOOK3S: HV: Use virtual page class protection mechanism for host fault and mmio

 arch/powerpc/include/asm/kvm_book3s_64.h |  97 ++++++++++++-
 arch/powerpc/include/asm/kvm_host.h      |   1 +
 arch/powerpc/include/asm/reg.h           |   1 +
 arch/powerpc/kernel/asm-offsets.c        |   1 +
 arch/powerpc/kvm/book3s_64_mmu_hv.c      |  99 ++++++++++---
 arch/powerpc/kvm/book3s_hv.c             |   1 +
 arch/powerpc/kvm/book3s_hv_rm_mmu.c      | 166 ++++++++++++++++++---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S  | 100 ++++++++++---
 8 files changed, 371 insertions(+), 95 deletions(-)

--
1.9.1
[PATCH 4/6] KVM: PPC: BOOK3S: HV: Use new functions for mapping/unmapping hpte in host
We want to use the virtual page class key protection mechanism for
indicating an MMIO mapped hpte entry or a guest hpte entry that is
swapped out in the host. Those hptes will be marked valid, but have
virtual page class key set to 30 or 31. These virtual page class
numbers are configured in AMR to deny read/write. To accommodate such a
change, add new functions that map, unmap and check whether a hpte is
mapped in the host.

This patch still uses HPTE_V_VALID and HPTE_V_ABSENT and doesn't use
virtual page class keys. But we want to differentiate the places in the
code where we explicitly check for HPTE_V_VALID from the places where we
want to check whether the hpte is host mapped. This patch enables a
closer review of such a change.

Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/kvm_book3s_64.h | 36 ++++++++++++++++
 arch/powerpc/kvm/book3s_64_mmu_hv.c      | 24 +++++++----
 arch/powerpc/kvm/book3s_hv_rm_mmu.c      | 30 ++++++++------
 3 files changed, 66 insertions(+), 24 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index 0aa817933e6a..da00b1f05ea1 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -400,6 +400,42 @@ static inline int is_vrma_hpte(unsigned long hpte_v)
 		(HPTE_V_1TB_SEG | (VRMA_VSID << (40 - 16)));
 }
 
+static inline void __kvmppc_unmap_host_hpte(struct kvm *kvm,
+					    unsigned long *hpte_v,
+					    unsigned long *hpte_r,
+					    bool mmio)
+{
+	*hpte_v |= HPTE_V_ABSENT;
+	if (mmio)
+		*hpte_r |= HPTE_R_KEY_HI | HPTE_R_KEY_LO;
+}
+
+static inline void kvmppc_unmap_host_hpte(struct kvm *kvm, __be64 *hptep)
+{
+	/*
+	 * We will never call this for MMIO
+	 */
+	hptep[0] |= cpu_to_be64(HPTE_V_ABSENT);
+}
+
+static inline void kvmppc_map_host_hpte(struct kvm *kvm, unsigned long *hpte_v,
+					unsigned long *hpte_r)
+{
+	*hpte_v |= HPTE_V_VALID;
+	*hpte_v &= ~HPTE_V_ABSENT;
+}
+
+static inline bool kvmppc_is_host_mapped_hpte(struct kvm *kvm, __be64 *hpte)
+{
+	unsigned long v;
+
+	v = be64_to_cpu(hpte[0]);
+	if (v & HPTE_V_VALID)
+		return true;
+	return false;
+}
+
+
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
 /*
  * Note modification of an HPTE; set the HPTE modified bit
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 590e07b1a43f..8ce5e95613f8 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -752,7 +752,8 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 	if (be64_to_cpu(hptep[0]) & HPTE_V_VALID) {
 		/* HPTE was previously valid, so we need to invalidate it */
 		unlock_rmap(rmap);
-		hptep[0] |= cpu_to_be64(HPTE_V_ABSENT);
+		/* Always mark HPTE_V_ABSENT before invalidating */
+		kvmppc_unmap_host_hpte(kvm, hptep);
 		kvmppc_invalidate_hpte(kvm, hptep, index);
 		/* don't lose previous R and C bits */
 		r |= be64_to_cpu(hptep[1]) & (HPTE_R_R | HPTE_R_C);
@@ -897,11 +898,12 @@ static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp,
 		/* Now check and modify the HPTE */
 		ptel = rev[i].guest_rpte;
 		psize = hpte_page_size(be64_to_cpu(hptep[0]), ptel);
-		if ((be64_to_cpu(hptep[0]) & HPTE_V_VALID) &&
+		if (kvmppc_is_host_mapped_hpte(kvm, hptep) &&
 		    hpte_rpn(ptel, psize) == gfn) {
 			if (kvm->arch.using_mmu_notifiers)
-				hptep[0] |= cpu_to_be64(HPTE_V_ABSENT);
+				kvmppc_unmap_host_hpte(kvm, hptep);
 			kvmppc_invalidate_hpte(kvm, hptep, i);
+			/* Harvest R and C */
 			rcbits = be64_to_cpu(hptep[1]) & (HPTE_R_R | HPTE_R_C);
 			*rmapp |= rcbits << KVMPPC_RMAP_RC_SHIFT;
@@ -990,7 +992,7 @@ static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
 		}
 
 		/* Now check and modify the HPTE */
-		if ((be64_to_cpu(hptep[0]) & HPTE_V_VALID) &&
+		if (kvmppc_is_host_mapped_hpte(kvm, hptep) &&
 		    (be64_to_cpu(hptep[1]) & HPTE_R_R)) {
 			kvmppc_clear_ref_hpte(kvm, hptep, i);
 			if (!(rev[i].guest_rpte & HPTE_R_R)) {
@@ -1121,11 +1123,12 @@ static int kvm_test_clear_dirty_npages(struct kvm *kvm, unsigned long *rmapp)
 		}
 
 		/* Now check and modify the HPTE */
-		if (!(hptep[0] & cpu_to_be64(HPTE_V_VALID)))
+		if (!kvmppc_is_host_mapped_hpte
[PATCH 6/6] KVM: PPC: BOOK3S: HV: Use virtual page class protection mechanism for host fault and mmio
With this patch we use AMR classes 30 and 31 for indicating a page fault
that should be handled by the host. This includes MMIO access and page
faults resulting from guest RAM swapout in the host. This enables us to
forward the fault to the guest without doing the expensive hash page
table search for finding the hpte entry.

With this patch, we mark the hash pte always valid and use class indices
30 and 31 for key based faults. These virtual class indices are
configured in AMR to deny read/write. Since the access class protection
mechanism doesn't work with the VRMA region, we need to handle that
separately. We mark those HPTEs invalid and use the software defined
bit, HPTE_V_VRMA, to differentiate them.

NOTE: We still need to handle protection faults in the host so that a
write to a KSM shared page is handled in the host.

Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/kvm_book3s_64.h | 80 +++++++++++++++++++++-
 arch/powerpc/include/asm/reg.h           |  1 +
 arch/powerpc/kvm/book3s_64_mmu_hv.c      | 48 ++++++++++++--
 arch/powerpc/kvm/book3s_hv_rm_mmu.c      | 69 ++++++++++++++++++-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S  | 52 +++++++++++++--
 5 files changed, 194 insertions(+), 56 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index a6bf41865a66..4aa9c3601fe8 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -48,7 +48,18 @@ extern unsigned long kvm_rma_pages;
  * HPTEs.
  */
 #define HPTE_V_HVLOCK	0x40UL
-#define HPTE_V_ABSENT	0x20UL
+/*
+ * VRMA mapping
+ */
+#define HPTE_V_VRMA	0x20UL
+
+#define HPTE_R_HOST_UNMAP_KEY	0x3e00UL
+/*
+ * We use this to differentiate between an MMIO key fault and
+ * a key fault resulting from host swapping out the page.
+ */
+#define HPTE_R_MMIO_UNMAP_KEY	0x3c00UL
+
 
 /*
  * We use this bit in the guest_rpte field of the revmap entry
@@ -405,35 +416,82 @@ static inline void __kvmppc_unmap_host_hpte(struct kvm *kvm,
 					    unsigned long *hpte_r,
 					    bool mmio)
 {
-	*hpte_v |= HPTE_V_ABSENT;
-	if (mmio)
-		*hpte_r |= HPTE_R_KEY_HI | HPTE_R_KEY_LO;
+	/*
+	 * We unmap on the host by adding the page to AMR class 31,
+	 * which has both read and write access denied.
+	 *
+	 * For the VRMA area we mark them invalid.
+	 *
+	 * If we are not using mmu_notifiers we don't use access
+	 * class protection.
+	 *
+	 * Since we are not changing the hpt directly we don't
+	 * worry about update ordering.
+	 */
+	if ((*hpte_v & HPTE_V_VRMA) || !kvm->arch.using_mmu_notifiers)
+		*hpte_v &= ~HPTE_V_VALID;
+	else if (!mmio) {
+		*hpte_r |= HPTE_R_HOST_UNMAP_KEY;
+		*hpte_v |= HPTE_V_VALID;
+	} else {
+		*hpte_r |= HPTE_R_MMIO_UNMAP_KEY;
+		*hpte_v |= HPTE_V_VALID;
+	}
 }
 
 static inline void kvmppc_unmap_host_hpte(struct kvm *kvm, __be64 *hptep)
 {
+	unsigned long pte_v, pte_r;
+
+	pte_v = be64_to_cpu(hptep[0]);
+	pte_r = be64_to_cpu(hptep[1]);
 	/*
 	 * We will never call this for MMIO
 	 */
-	hptep[0] |= cpu_to_be64(HPTE_V_ABSENT);
+	__kvmppc_unmap_host_hpte(kvm, &pte_v, &pte_r, 0);
+	hptep[1] = cpu_to_be64(pte_r);
+	eieio();
+	hptep[0] = cpu_to_be64(pte_v);
+	asm volatile("ptesync" : : : "memory");
+	/*
+	 * we have now successfully marked the hpte using key bits
+	 */
 	atomic_dec(&kvm->arch.hpte_update_in_progress);
 }
 
 static inline void kvmppc_map_host_hpte(struct kvm *kvm, unsigned long *hpte_v,
 					unsigned long *hpte_r)
 {
-	*hpte_v |= HPTE_V_VALID;
-	*hpte_v &= ~HPTE_V_ABSENT;
+	/*
+	 * We will never try to map an MMIO region
+	 */
+	if ((*hpte_v & HPTE_V_VRMA) || !kvm->arch.using_mmu_notifiers)
+		*hpte_v |= HPTE_V_VALID;
+	else {
+		/*
+		 * When we allow guest keys we should set this with the key
+		 * for this page.
+		 */
+		*hpte_r &= ~(HPTE_R_KEY_HI | HPTE_R_KEY_LO);
+	}
 }
 
 static inline bool kvmppc_is_host_mapped_hpte(struct kvm *kvm, __be64 *hpte)
 {
-	unsigned long v;
+	unsigned long v, r;
 
 	v = be64_to_cpu(hpte[0]);
-	if (v & HPTE_V_VALID)
-		return true;
-	return false;
+	if ((v & HPTE_V_VRMA) || !kvm->arch.using_mmu_notifiers)
+		return v & HPTE_V_VALID;
+
+	r = be64_to_cpu(hpte[1]);
+	if (!(v & HPTE_V_VALID))
+		return false;
+	if ((r & (HPTE_R_KEY_HI | HPTE_R_KEY_LO)) == HPTE_R_HOST_UNMAP_KEY)
+		return false;
+	if ((r (HPTE_R_KEY_HI | HPTE_R_KEY_LO
[PATCH 3/6] KVM: PPC: BOOK3S: HV: Remove dead code
Since we don't support the virtual page class key protection mechanism
in the guest, we should not find a key fault that needs to be forwarded
to the guest. So remove the dead code.

Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
---
 arch/powerpc/kvm/book3s_64_mmu_hv.c | 9 ---------
 arch/powerpc/kvm/book3s_hv_rm_mmu.c | 9 ---------
 2 files changed, 18 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 1c137f45dd55..590e07b1a43f 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -499,15 +499,6 @@ static int kvmppc_mmu_book3s_64_hv_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
 	gpte->may_write = hpte_write_permission(pp, key);
 	gpte->may_execute = gpte->may_read && !(gr & (HPTE_R_N | HPTE_R_G));
 
-	/* Storage key permission check for POWER7 */
-	if (data && virtmode && cpu_has_feature(CPU_FTR_ARCH_206)) {
-		int amrfield = hpte_get_skey_perm(gr, vcpu->arch.amr);
-		if (amrfield & 1)
-			gpte->may_read = 0;
-		if (amrfield & 2)
-			gpte->may_write = 0;
-	}
-
 	/* Get the guest physical address */
 	gpte->raddr = kvmppc_mmu_get_real_addr(v, gr, eaddr);
 	return 0;
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index f908845f7379..1884bff3122c 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -925,15 +925,6 @@ long kvmppc_hpte_hv_fault(struct kvm_vcpu *vcpu, unsigned long addr,
 			return status | DSISR_PROTFAULT;
 	}
 
-	/* Check storage key, if applicable */
-	if (data && (vcpu->arch.shregs.msr & MSR_DR)) {
-		unsigned int perm = hpte_get_skey_perm(gr, vcpu->arch.amr);
-		if (status & DSISR_ISSTORE)
-			perm >>= 1;
-		if (perm & 1)
-			return status | DSISR_KEYFAULT;
-	}
-
 	/* Save HPTE info for virtual-mode handler */
 	vcpu->arch.pgfault_addr = addr;
 	vcpu->arch.pgfault_index = index;
--
1.9.1
[PATCH 2/6] KVM: PPC: BOOK3S: HV: Deny virtual page class key update via h_protect
This makes it consistent with h_enter, where we clear the key bits. We
also want to use the virtual page class key protection mechanism for
indicating a host page fault. For that we will be using key class
indices 30 and 31. So prevent the guest from updating key bits until we
add proper support for the virtual page class protection mechanism in
the guest.

This will not have any impact on PAPR Linux guests because Linux guests
currently don't use the virtual page class key protection model.

Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
---
 arch/powerpc/kvm/book3s_hv_rm_mmu.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 157a5f35edfa..f908845f7379 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -658,13 +658,17 @@ long kvmppc_h_protect(struct kvm_vcpu *vcpu, unsigned long flags,
 	}
 
 	v = pte;
+	/*
+	 * We ignore key bits here. We use classes 31 and 30 for
+	 * hypervisor purposes. We still don't track the page
+	 * class separately. Until then don't allow h_protect
+	 * to change key bits.
+	 */
 	bits = (flags << 55) & HPTE_R_PP0;
-	bits |= (flags << 48) & HPTE_R_KEY_HI;
-	bits |= flags & (HPTE_R_PP | HPTE_R_N | HPTE_R_KEY_LO);
+	bits |= flags & (HPTE_R_PP | HPTE_R_N);
 
 	/* Update guest view of 2nd HPTE dword */
-	mask = HPTE_R_PP0 | HPTE_R_PP | HPTE_R_N |
-		HPTE_R_KEY_HI | HPTE_R_KEY_LO;
+	mask = HPTE_R_PP0 | HPTE_R_PP | HPTE_R_N;
 	rev = real_vmalloc_addr(&kvm->arch.revmap[pte_index]);
 	if (rev) {
 		r = (rev->guest_rpte & ~mask) | bits;
--
1.9.1
[PATCH] KVM: PPC: BOOK3S: HV: Update compute_tlbie_rb to handle 16MB base page
When calculating the lower bits of the AVA field, use the shift count
based on the base page size. Also add the missing segment size and
remove a stale comment.

Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/kvm_book3s_64.h | 6 ++++--
 arch/powerpc/kvm/book3s_hv.c             | 6 ------
 2 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index 66a0a44b62a8..ca7c1688a7b6 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -158,6 +158,8 @@ static inline unsigned long compute_tlbie_rb(unsigned long v, unsigned long r,
 	 */
 	/* This covers 14..54 bits of va*/
 	rb = (v & ~0x7fUL) << 16;		/* AVA field */
+
+	rb |= v >> (62 - 8);			/*  B field */
 	/*
 	 * AVA in v had cleared lower 23 bits. We need to derive
 	 * that from pteg index
@@ -188,10 +190,10 @@ static inline unsigned long compute_tlbie_rb(unsigned long v, unsigned long r,
 	{
 		int aval_shift;
 		/*
-		 * remaining 7bits of AVA/LP fields
+		 * remaining bits of AVA/LP fields
 		 * Also contain the rr bits of LP
 		 */
-		rb |= (va_low & 0x7f) << 16;
+		rb |= (va_low << mmu_psize_defs[b_psize].shift) & 0x7ff000;
 		/*
 		 * Now clear not needed LP bits based on actual psize
 		 */
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index cbf46eb3f59c..328416f28a55 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1917,12 +1917,6 @@ static void kvmppc_add_seg_page_size(struct kvm_ppc_one_seg_page_size **sps,
 	(*sps)->page_shift = def->shift;
 	(*sps)->slb_enc = def->sllp;
 	(*sps)->enc[0].page_shift = def->shift;
-	/*
-	 * Only return base page encoding. We don't want to return
-	 * all the supporting pte_enc, because our H_ENTER doesn't
-	 * support MPSS yet. Once they do, we can start passing all
-	 * support pte_enc here
-	 */
 	(*sps)->enc[0].pte_enc = def->penc[linux_psize];
 	/*
 	 * Add 16MB MPSS support if host supports it
--
1.9.1
[PATCH 5/6] KVM: PPC: BOOK3S: Use hpte_update_in_progress to track invalid hpte during an hpte update
As per the ISA, we first need to mark the hpte invalid (V=0) before we
update the hpte lower half bits. With the virtual page class key
protection mechanism we want to send any fault other than a key fault
to the guest directly, without searching the hash page table. But then
we can get a NO_HPTE fault while we are updating the hpte. To track
that, add a vm specific atomic variable that we check in the fault path
to always send the fault to the host.

Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/kvm_book3s_64.h |  1 +
 arch/powerpc/include/asm/kvm_host.h      |  1 +
 arch/powerpc/kernel/asm-offsets.c        |  1 +
 arch/powerpc/kvm/book3s_64_mmu_hv.c      | 19 ++++++++--
 arch/powerpc/kvm/book3s_hv.c             |  1 +
 arch/powerpc/kvm/book3s_hv_rm_mmu.c      | 40 +++++++++++++++++---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S  | 60 +++++++++++++++++++-------
 7 files changed, 109 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index da00b1f05ea1..a6bf41865a66 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -416,6 +416,7 @@ static inline void kvmppc_unmap_host_hpte(struct kvm *kvm, __be64 *hptep)
 	 * We will never call this for MMIO
 	 */
 	hptep[0] |= cpu_to_be64(HPTE_V_ABSENT);
+	atomic_dec(&kvm->arch.hpte_update_in_progress);
 }
 
 static inline void kvmppc_map_host_hpte(struct kvm *kvm, unsigned long *hpte_v,
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index f9ae69682ce1..0a9ff60fae4c 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -254,6 +254,7 @@ struct kvm_arch {
 	atomic_t hpte_mod_interest;
 	spinlock_t slot_phys_lock;
 	cpumask_t need_tlb_flush;
+	atomic_t hpte_update_in_progress;
 	struct kvmppc_vcore *vcores[KVM_MAX_VCORES];
 	int hpt_cma_alloc;
 #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index f5995a912213..54a36110f8f2 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -496,6 +496,7 @@ int main(void)
 	DEFINE(KVM_LPCR, offsetof(struct kvm, arch.lpcr));
 	DEFINE(KVM_RMOR, offsetof(struct kvm, arch.rmor));
 	DEFINE(KVM_VRMA_SLB_V, offsetof(struct kvm, arch.vrma_slb_v));
+	DEFINE(KVM_HPTE_UPDATE, offsetof(struct kvm, arch.hpte_update_in_progress));
 	DEFINE(VCPU_DSISR, offsetof(struct kvm_vcpu, arch.shregs.dsisr));
 	DEFINE(VCPU_DAR, offsetof(struct kvm_vcpu, arch.shregs.dar));
 	DEFINE(VCPU_VPA, offsetof(struct kvm_vcpu, arch.vpa.pinned_addr));
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 8ce5e95613f8..cb7a616aacb1 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -592,6 +592,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 	unsigned int writing, write_ok;
 	struct vm_area_struct *vma;
 	unsigned long rcbits;
+	bool hpte_invalidated = false;
 
 	/*
 	 * Real-mode code has already searched the HPT and found the
@@ -750,13 +751,15 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 	r = rcbits | ~(HPTE_R_R | HPTE_R_C);
 
 	if (be64_to_cpu(hptep[0]) & HPTE_V_VALID) {
-		/* HPTE was previously valid, so we need to invalidate it */
+		/*
+		 * If we had mapped this hpte before, we now need to
+		 * invalidate that.
+		 */
 		unlock_rmap(rmap);
-		/* Always mark HPTE_V_ABSENT before invalidating */
-		kvmppc_unmap_host_hpte(kvm, hptep);
 		kvmppc_invalidate_hpte(kvm, hptep, index);
 		/* don't lose previous R and C bits */
 		r |= be64_to_cpu(hptep[1]) & (HPTE_R_R | HPTE_R_C);
+		hpte_invalidated = true;
 	} else {
 		kvmppc_add_revmap_chain(kvm, rev, rmap, index, 0);
 	}
@@ -765,6 +768,9 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 	eieio();
 	hptep[0] = cpu_to_be64(hpte[0]);
 	asm volatile("ptesync" : : : "memory");
+	if (hpte_invalidated)
+		atomic_dec(&kvm->arch.hpte_update_in_progress);
+
 	preempt_enable();
 	if (page && hpte_is_writable(r))
 		SetPageDirty(page);
@@ -1128,10 +1134,9 @@ static int kvm_test_clear_dirty_npages(struct kvm *kvm, unsigned long *rmapp)
 			/*
 			 * need to make it temporarily absent so C is stable
 			 */
-			kvmppc_unmap_host_hpte(kvm, hptep);
-			kvmppc_invalidate_hpte(kvm, hptep, i);
 			v = be64_to_cpu(hptep[0]);
 			r = be64_to_cpu(hptep
Re: [PATCH 6/7] KVM: PPC: Book3S HV: Fix ABIv2 on LE
Alexander Graf ag...@suse.de writes:

> We use ABIv2 on Little Endian systems which gets rid of the dotted
> function names. Branch to the actual functions when we see such a
> system.
>
> Signed-off-by: Alexander Graf ag...@suse.de

As per the patches sent by Anton we don't need this. We can branch to
the function rather than the dot symbol:

http://article.gmane.org/gmane.linux.ports.ppc.embedded/68925
http://article.gmane.org/gmane.linux.ports.ppc.embedded/71005

-aneesh

> ---
>  arch/powerpc/kvm/book3s_hv_rmhandlers.S | 22 ++++++++++++--------
>  1 file changed, 14 insertions(+), 8 deletions(-)
>
> diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> index 1a71f60..1ff3ebd 100644
> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> @@ -36,6 +36,12 @@
>  #define NAPPING_CEDE	1
>  #define NAPPING_NOVCPU	2
>
> +#if defined(_CALL_ELF) && _CALL_ELF == 2
> +#define FUNC(name)	name
> +#else
> +#define FUNC(name)	GLUE(.,name)
> +#endif
> +
>  /*
>   * Call kvmppc_hv_entry in real mode.
>   * Must be called with interrupts hard-disabled.
> @@ -668,9 +674,9 @@ END_FTR_SECTION_IFCLR(CPU_FTR_TM)
>  	mr	r31, r4
>  	addi	r3, r31, VCPU_FPRS_TM
> -	bl	.load_fp_state
> +	bl	FUNC(load_fp_state)
>  	addi	r3, r31, VCPU_VRS_TM
> -	bl	.load_vr_state
> +	bl	FUNC(load_vr_state)
>  	mr	r4, r31
>  	lwz	r7, VCPU_VRSAVE_TM(r4)
>  	mtspr	SPRN_VRSAVE, r7
> @@ -1414,9 +1420,9 @@ END_FTR_SECTION_IFCLR(CPU_FTR_TM)
>  	/* Save FP/VSX. */
>  	addi	r3, r9, VCPU_FPRS_TM
> -	bl	.store_fp_state
> +	bl	FUNC(store_fp_state)
>  	addi	r3, r9, VCPU_VRS_TM
> -	bl	.store_vr_state
> +	bl	FUNC(store_vr_state)
>  	mfspr	r6, SPRN_VRSAVE
>  	stw	r6, VCPU_VRSAVE_TM(r9)
> 1:
> @@ -2405,11 +2411,11 @@ END_FTR_SECTION_IFSET(CPU_FTR_VSX)
>  	mtmsrd	r8
>  	isync
>  	addi	r3,r3,VCPU_FPRS
> -	bl	.store_fp_state
> +	bl	FUNC(store_fp_state)
>  #ifdef CONFIG_ALTIVEC
>  BEGIN_FTR_SECTION
>  	addi	r3,r31,VCPU_VRS
> -	bl	.store_vr_state
> +	bl	FUNC(store_vr_state)
>  END_FTR_SECTION_IFSET(CPU_FTR_ALTIVEC)
>  #endif
>  	mfspr	r6,SPRN_VRSAVE
> @@ -2441,11 +2447,11 @@ END_FTR_SECTION_IFSET(CPU_FTR_VSX)
>  	mtmsrd	r8
>  	isync
>  	addi	r3,r4,VCPU_FPRS
> -	bl	.load_fp_state
> +	bl	FUNC(load_fp_state)
>  #ifdef CONFIG_ALTIVEC
> BEGIN_FTR_SECTION
>  	addi	r3,r31,VCPU_VRS
> -	bl	.load_vr_state
> +	bl	FUNC(load_vr_state)
> END_FTR_SECTION_IFSET(CPU_FTR_ALTIVEC)
> #endif
>  	lwz	r7,VCPU_VRSAVE(r31)
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 07/10] PPC, KVM, CMA: use general CMA reserved area management framework
Joonsoo Kim iamjoonsoo@lge.com writes:

> On Sat, Jun 14, 2014 at 02:23:59PM +0530, Aneesh Kumar K.V wrote:
>> Joonsoo Kim iamjoonsoo@lge.com writes:
>>> Now, we have a general CMA reserved area management framework, so use
>>> it for future maintainability. There is no functional change.
>>>
>>> Acked-by: Michal Nazarewicz min...@mina86.com
>>> Acked-by: Paolo Bonzini pbonz...@redhat.com
>>> Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com
>>
>> Need this. We may want to keep the VM_BUG_ON by moving
>> KVM_CMA_CHUNK_ORDER around.
>>
>> diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
>> index 8056107..1932e0e 100644
>> --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
>> +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
>> @@ -37,8 +37,6 @@
>>  #include <asm/ppc-opcode.h>
>>  #include <asm/cputable.h>
>>
>> -#include "book3s_hv_cma.h"
>> -
>>  /* POWER7 has 10-bit LPIDs, PPC970 has 6-bit LPIDs */
>>  #define MAX_LPID_970	63
>>
>> @@ -64,7 +62,6 @@ long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp)
>>  	}
>>
>>  	kvm->arch.hpt_cma_alloc = 0;
>> -	VM_BUG_ON(order < KVM_CMA_CHUNK_ORDER);
>>  	page = kvm_alloc_hpt(1 << (order - PAGE_SHIFT));
>>  	if (page) {
>>  		hpt = (unsigned long)pfn_to_kaddr(page_to_pfn(page));
>>
>> -aneesh
>
> Okay. So do you also want this?
>
> @@ -131,16 +135,18 @@ struct page *kvm_alloc_hpt(unsigned long nr_pages)
>  {
>  	unsigned long align_pages = HPT_ALIGN_PAGES;
>
> +	VM_BUG_ON(get_order(nr_pages) > KVM_CMA_CHUNK_ORDER - PAGE_SHIFT);
> +
>  	/* Old CPUs require HPT aligned on a multiple of its size */
>  	if (!cpu_has_feature(CPU_FTR_ARCH_206))
>  		align_pages = nr_pages;
> -	return kvm_alloc_cma(nr_pages, align_pages);
> +	return cma_alloc(kvm_cma, nr_pages, get_order(align_pages));
>  }

That would also work. Thanks.

-aneesh
Re: [PATCH v2 00/10] CMA: generalize CMA reserved area management code
Joonsoo Kim iamjoonsoo@lge.com writes:

> On Sat, Jun 14, 2014 at 12:55:39PM +0530, Aneesh Kumar K.V wrote:
>> Joonsoo Kim iamjoonsoo@lge.com writes:
>>> Currently, there are two users of CMA functionality: one is the DMA
>>> subsystem and the other is kvm on powerpc. They have their own code
>>> to manage CMA reserved areas even though they look really similar.
>>> From my guess, it is caused by some needs on bitmap management. The
>>> kvm side wants to maintain a bitmap not for 1 page, but for a larger
>>> size. Eventually it uses a bitmap where one bit represents 64 pages.
>>>
>>> When I implement CMA related patches, I have to change those two
>>> places to apply my change, and it is painful to me. I want to change
>>> this situation and reduce future code management overhead through
>>> this patch. This change could also help developers who want to use
>>> CMA in their new feature development, since they can use CMA easily
>>> without copy-and-pasting this reserved area management code.
>>>
>>> v2: Although this patchset looks very different from v1, the end
>>> result, that is, mm/cma.c, is the same as v1's. So I carry the Acks
>>> on patches 6-7.
>>>
>>> Patches 1-5 prepare some features to cover ppc kvm's requirements.
>>> Patches 6-7 generalize the CMA reserved area management code and
>>> change users to use it.
>>> Patches 8-10 clean up minor things.
>>
>> I wanted to test the ppc changes and found that the patch series
>> doesn't apply against v3.15. Do you have a kernel tree which I can
>> clone to test this series?
>
> This is based on linux-next, next-20140610. And my tree is at the
> following link.
>
> https://github.com/JoonsooKim/linux/tree/cma-general-v2.0-next-20140610
>
> But, I think I'm late, because you have already added a Tested-by tag.

linux-next kexec is broken on ppc64, hence I hand-picked a few of the
dependent patches for DMA CMA on top of 3.15 and used that for testing.

-aneesh
Re: [PATCH v2 07/10] PPC, KVM, CMA: use general CMA reserved area management framework
Joonsoo Kim iamjoonsoo@lge.com writes: On Sat, Jun 14, 2014 at 02:23:59PM +0530, Aneesh Kumar K.V wrote: Joonsoo Kim iamjoonsoo@lge.com writes: Now, we have general CMA reserved area management framework, so use it for future maintainability. There is no functional change. Acked-by: Michal Nazarewicz min...@mina86.com Acked-by: Paolo Bonzini pbonz...@redhat.com Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

Need this. We may want to keep the VM_BUG_ON by moving KVM_CMA_CHUNK_ORDER around.

diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 8056107..1932e0e 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -37,8 +37,6 @@
 #include <asm/ppc-opcode.h>
 #include <asm/cputable.h>
 
-#include "book3s_hv_cma.h"
-
 /* POWER7 has 10-bit LPIDs, PPC970 has 6-bit LPIDs */
 #define MAX_LPID_970	63
 
@@ -64,7 +62,6 @@ long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp)
 	}
 
 	kvm->arch.hpt_cma_alloc = 0;
-	VM_BUG_ON(order < KVM_CMA_CHUNK_ORDER);
 	page = kvm_alloc_hpt(1 << (order - PAGE_SHIFT));
 	if (page) {
 		hpt = (unsigned long)pfn_to_kaddr(page_to_pfn(page));

-aneesh

Okay. So do you also want this?

@@ -131,16 +135,18 @@ struct page *kvm_alloc_hpt(unsigned long nr_pages)
 {
 	unsigned long align_pages = HPT_ALIGN_PAGES;
 
+	VM_BUG_ON(get_order(nr_pages) < KVM_CMA_CHUNK_ORDER - PAGE_SHIFT);
+
 	/* Old CPUs require HPT aligned on a multiple of its size */
 	if (!cpu_has_feature(CPU_FTR_ARCH_206))
 		align_pages = nr_pages;
-	return kvm_alloc_cma(nr_pages, align_pages);
+	return cma_alloc(kvm_cma, nr_pages, get_order(align_pages));
 }

That would also work. Thanks -aneesh
[PATCH V2] KVM: PPC: BOOK3S: HV: Use base page size when comparing against slb value
With guests supporting Multiple page size per segment (MPSS), hpte_page_size returns the actual page size used. Add a new function to return the base page size and use that to compare against the page size calculated from the SLB. Without this patch a hpte lookup can fail since we are comparing the wrong page size in kvmppc_hv_find_lock_hpte. Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com --- Changes from V1: * Remove obsolete comment from the code * Update commit message

 arch/powerpc/include/asm/kvm_book3s_64.h | 19 +-
 arch/powerpc/kvm/book3s_64_mmu_hv.c      |  2 +-
 arch/powerpc/kvm/book3s_hv_rm_mmu.c      |  7 ++-
 3 files changed, 20 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index 34422be566ce..3d0f3fb9c6b6 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -202,8 +202,10 @@ static inline unsigned long compute_tlbie_rb(unsigned long v, unsigned long r,
 	return rb;
 }
 
-static inline unsigned long hpte_page_size(unsigned long h, unsigned long l)
+static inline unsigned long __hpte_page_size(unsigned long h, unsigned long l,
+					     bool is_base_size)
 {
+	int size, a_psize;
 	/* Look at the 8 bit LP value */
 	unsigned int lp = (l >> LP_SHIFT) & ((1 << LP_BITS) - 1);
 
@@ -218,14 +220,27 @@ static inline unsigned long hpte_page_size(unsigned long h, unsigned long l)
 				continue;
 
 			a_psize = __hpte_actual_psize(lp, size);
-			if (a_psize != -1)
+			if (a_psize != -1) {
+				if (is_base_size)
+					return 1ul << mmu_psize_defs[size].shift;
 				return 1ul << mmu_psize_defs[a_psize].shift;
+			}
 		}
 	}
 	return 0;
 }
 
+static inline unsigned long hpte_page_size(unsigned long h, unsigned long l)
+{
+	return __hpte_page_size(h, l, 0);
+}
+
+static inline unsigned long hpte_base_page_size(unsigned long h, unsigned long l)
+{
+	return __hpte_page_size(h, l, 1);
+}
+
 static inline unsigned long hpte_rpn(unsigned long ptel, unsigned long psize)
 {
 	return ((ptel & HPTE_R_RPN) & ~(psize - 1)) >> PAGE_SHIFT;
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index f53cf2eae36a..7ff45ed27c65 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -1567,7 +1567,7 @@ static ssize_t kvm_htab_write(struct file *file, const char __user *buf,
 			goto out;
 		}
 		if (!rma_setup && is_vrma_hpte(v)) {
-			unsigned long psize = hpte_page_size(v, r);
+			unsigned long psize = hpte_base_page_size(v, r);
 			unsigned long senc = slb_pgsize_encoding(psize);
 			unsigned long lpcr;
 
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 87624ab5ba82..d86356bfc970 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -833,13 +833,10 @@ long kvmppc_hv_find_lock_hpte(struct kvm *kvm, gva_t eaddr, unsigned long slb_v,
 		r = be64_to_cpu(hpte[i+1]);
 
 		/*
-		 * Check the HPTE again, including large page size
-		 * Since we don't currently allow any MPSS (mixed
-		 * page-size segment) page sizes, it is sufficient
-		 * to check against the actual page size.
+		 * Check the HPTE again, including base page size
 		 */
 		if ((v & valid) && (v & mask) == val &&
-		    hpte_page_size(v, r) == (1ul << pshift))
+		    hpte_base_page_size(v, r) == (1ul << pshift))
 			/* Return with the HPTE still locked */
 			return (hash << 3) + (i >> 1);
-- 
1.9.1
Re: [PATCH 0/6] KVM: PPC: Book3S HV: Enable on little endian hosts
Alexander Graf ag...@suse.de writes: So far we've been able to successfully run HV KVM on big endian hosts, but once you dive into little endian land things start to fall apart. This patch set enables HV KVM for little endian hosts. This should be the final piece missing to get little endian systems fully on par with big endian ones in the KVM world. Can we also use types like __be64 where we need them? That will also make sure that tools like sparse can catch errors. -aneesh
Re: [PATCH v2 08/10] mm, cma: clean-up cma allocation error path
Joonsoo Kim iamjoonsoo@lge.com writes: We can remove one call site for clear_cma_bitmap() if we first call it before checking the error number. Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com Reviewed-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com

diff --git a/mm/cma.c b/mm/cma.c
index 1e1b017..01a0713 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -282,11 +282,12 @@ struct page *cma_alloc(struct cma *cma, int count, unsigned int align)
 		if (ret == 0) {
 			page = pfn_to_page(pfn);
 			break;
-		} else if (ret != -EBUSY) {
-			clear_cma_bitmap(cma, pfn, count);
-			break;
 		}
 
+		clear_cma_bitmap(cma, pfn, count);
+		if (ret != -EBUSY)
+			break;
+
 		pr_debug("%s(): memory range at %p is busy, retrying\n",
 			 __func__, pfn_to_page(pfn));
 		/* try again with a bit different memory target */
-- 
1.7.9.5
Re: [PATCH v2 09/10] mm, cma: move output param to the end of param list
Joonsoo Kim iamjoonsoo@lge.com writes: Conventionally, we put output param to the end of param list. cma_declare_contiguous() doesn't look like that, so change it. Additionally, move down cma_areas reference code to the position where it is really needed. Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com Reviewed-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c b/arch/powerpc/kvm/book3s_hv_builtin.c index 28ec226..97613ea 100644 --- a/arch/powerpc/kvm/book3s_hv_builtin.c +++ b/arch/powerpc/kvm/book3s_hv_builtin.c @@ -184,7 +184,7 @@ void __init kvm_cma_reserve(void) align_size = max(kvm_rma_pages PAGE_SHIFT, align_size); cma_declare_contiguous(selected_size, 0, 0, align_size, - KVM_CMA_CHUNK_ORDER - PAGE_SHIFT, kvm_cma, false); + KVM_CMA_CHUNK_ORDER - PAGE_SHIFT, false, kvm_cma); } } diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c index f177f73..bfd4553 100644 --- a/drivers/base/dma-contiguous.c +++ b/drivers/base/dma-contiguous.c @@ -149,7 +149,7 @@ int __init dma_contiguous_reserve_area(phys_addr_t size, phys_addr_t base, { int ret; - ret = cma_declare_contiguous(size, base, limit, 0, 0, res_cma, fixed); + ret = cma_declare_contiguous(size, base, limit, 0, 0, fixed, res_cma); if (ret) return ret; diff --git a/include/linux/cma.h b/include/linux/cma.h index e38efe9..e53eead 100644 --- a/include/linux/cma.h +++ b/include/linux/cma.h @@ -6,7 +6,7 @@ struct cma; extern int __init cma_declare_contiguous(phys_addr_t size, phys_addr_t base, phys_addr_t limit, phys_addr_t alignment, int order_per_bit, - struct cma **res_cma, bool fixed); + bool fixed, struct cma **res_cma); extern struct page *cma_alloc(struct cma *cma, int count, unsigned int align); extern bool cma_release(struct cma *cma, struct page *pages, int count); #endif diff --git a/mm/cma.c b/mm/cma.c index 01a0713..22a5b23 100644 --- a/mm/cma.c +++ b/mm/cma.c @@ -142,8 +142,8 @@ core_initcall(cma_init_reserved_areas); * @limit: End 
address of the reserved memory (optional, 0 for any).
  * @alignment: Alignment for the contiguous memory area, should be power of 2
  * @order_per_bit: Order of pages represented by one bit on bitmap.
- * @res_cma: Pointer to store the created cma region.
  * @fixed: hint about where to place the reserved area
+ * @res_cma: Pointer to store the created cma region.
  *
  * This function reserves memory from early allocator. It should be
  * called by arch specific code once the early allocator (memblock or bootmem)
@@ -156,9 +156,9 @@ core_initcall(cma_init_reserved_areas);
 int __init cma_declare_contiguous(phys_addr_t size,
 			phys_addr_t base, phys_addr_t limit,
 			phys_addr_t alignment, int order_per_bit,
-			struct cma **res_cma, bool fixed)
+			bool fixed, struct cma **res_cma)
 {
-	struct cma *cma = &cma_areas[cma_area_count];
+	struct cma *cma;
 	int ret = 0;
 
 	pr_debug("%s(size %lx, base %08lx, limit %08lx alignment %08lx)\n",
@@ -214,6 +214,7 @@ int __init cma_declare_contiguous(phys_addr_t size,
 	 * Each reserved area must be initialised later, when more kernel
 	 * subsystems (like slab allocator) are available.
 	 */
+	cma = &cma_areas[cma_area_count];
 	cma->base_pfn = PFN_DOWN(base);
 	cma->count = size >> PAGE_SHIFT;
 	cma->order_per_bit = order_per_bit;
-- 
1.7.9.5
Re: [PATCH v2 00/10] CMA: generalize CMA reserved area management code
Joonsoo Kim iamjoonsoo@lge.com writes: Currently, there are two users on CMA functionality, one is the DMA subsystem and the other is the kvm on powerpc. They have their own code to manage CMA reserved area even if they looks really similar. From my guess, it is caused by some needs on bitmap management. Kvm side wants to maintain bitmap not for 1 page, but for more size. Eventually it use bitmap where one bit represents 64 pages. When I implement CMA related patches, I should change those two places to apply my change and it seem to be painful to me. I want to change this situation and reduce future code management overhead through this patch. This change could also help developer who want to use CMA in their new feature development, since they can use CMA easily without copying pasting this reserved area management code. v2: Although this patchset looks very different with v1, the end result, that is, mm/cma.c is same with v1's one. So I carry Ack to patch 6-7. Patch 1-5 prepare some features to cover ppc kvm's requirements. Patch 6-7 generalize CMA reserved area management code and change users to use it. Patch 8-10 clean-up minor things. I wanted to test the ppc changes and found that the patch series doesn't apply against v3.15 . Do you have a kernel tree which I can clone to test this series ? -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 07/10] PPC, KVM, CMA: use general CMA reserved area management framework
Joonsoo Kim iamjoonsoo@lge.com writes: Now, we have general CMA reserved area management framework, so use it for future maintainabilty. There is no functional change. Acked-by: Michal Nazarewicz min...@mina86.com Acked-by: Paolo Bonzini pbonz...@redhat.com Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com Need this. We may want to keep the VM_BUG_ON by moving KVM_CMA_CHUNK_ORDER around. diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c index 8056107..1932e0e 100644 --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c @@ -37,8 +37,6 @@ #include asm/ppc-opcode.h #include asm/cputable.h -#include book3s_hv_cma.h - /* POWER7 has 10-bit LPIDs, PPC970 has 6-bit LPIDs */ #define MAX_LPID_970 63 @@ -64,7 +62,6 @@ long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp) } kvm-arch.hpt_cma_alloc = 0; - VM_BUG_ON(order KVM_CMA_CHUNK_ORDER); page = kvm_alloc_hpt(1 (order - PAGE_SHIFT)); if (page) { hpt = (unsigned long)pfn_to_kaddr(page_to_pfn(page)); -aneesh -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 07/10] PPC, KVM, CMA: use general CMA reserved area management framework
Joonsoo Kim iamjoonsoo@lge.com writes: Now, we have general CMA reserved area management framework, so use it for future maintainabilty. There is no functional change. Acked-by: Michal Nazarewicz min...@mina86.com Acked-by: Paolo Bonzini pbonz...@redhat.com Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c b/arch/powerpc/kvm/book3s_hv_builtin.c index 7cde8a6..28ec226 100644 --- a/arch/powerpc/kvm/book3s_hv_builtin.c +++ b/arch/powerpc/kvm/book3s_hv_builtin.c @@ -16,12 +16,14 @@ #include linux/init.h #include linux/memblock.h #include linux/sizes.h +#include linux/cma.h #include asm/cputable.h #include asm/kvm_ppc.h #include asm/kvm_book3s.h -#include book3s_hv_cma.h +#define KVM_CMA_CHUNK_ORDER 18 + /* * Hash page table alignment on newer cpus(CPU_FTR_ARCH_206) * should be power of 2. @@ -43,6 +45,8 @@ static unsigned long kvm_cma_resv_ratio = 5; unsigned long kvm_rma_pages = (1 27) PAGE_SHIFT; /* 128MB */ EXPORT_SYMBOL_GPL(kvm_rma_pages); +static struct cma *kvm_cma; + /* Work out RMLS (real mode limit selector) field value for a given RMA size. Assumes POWER7 or PPC970. 
*/ static inline int lpcr_rmls(unsigned long rma_size) @@ -97,7 +101,7 @@ struct kvm_rma_info *kvm_alloc_rma() ri = kmalloc(sizeof(struct kvm_rma_info), GFP_KERNEL); if (!ri) return NULL; - page = kvm_alloc_cma(kvm_rma_pages, kvm_rma_pages); + page = cma_alloc(kvm_cma, kvm_rma_pages, get_order(kvm_rma_pages)); if (!page) goto err_out; atomic_set(ri-use_count, 1); @@ -112,7 +116,7 @@ EXPORT_SYMBOL_GPL(kvm_alloc_rma); void kvm_release_rma(struct kvm_rma_info *ri) { if (atomic_dec_and_test(ri-use_count)) { - kvm_release_cma(pfn_to_page(ri-base_pfn), kvm_rma_pages); + cma_release(kvm_cma, pfn_to_page(ri-base_pfn), kvm_rma_pages); kfree(ri); } } @@ -134,13 +138,13 @@ struct page *kvm_alloc_hpt(unsigned long nr_pages) /* Old CPUs require HPT aligned on a multiple of its size */ if (!cpu_has_feature(CPU_FTR_ARCH_206)) align_pages = nr_pages; - return kvm_alloc_cma(nr_pages, align_pages); + return cma_alloc(kvm_cma, nr_pages, get_order(align_pages)); } EXPORT_SYMBOL_GPL(kvm_alloc_hpt); void kvm_release_hpt(struct page *page, unsigned long nr_pages) { - kvm_release_cma(page, nr_pages); + cma_release(kvm_cma, page, nr_pages); } EXPORT_SYMBOL_GPL(kvm_release_hpt); @@ -179,7 +183,8 @@ void __init kvm_cma_reserve(void) align_size = HPT_ALIGN_PAGES PAGE_SHIFT; align_size = max(kvm_rma_pages PAGE_SHIFT, align_size); - kvm_cma_declare_contiguous(selected_size, align_size); + cma_declare_contiguous(selected_size, 0, 0, align_size, + KVM_CMA_CHUNK_ORDER - PAGE_SHIFT, kvm_cma, false); } } diff --git a/arch/powerpc/kvm/book3s_hv_cma.c b/arch/powerpc/kvm/book3s_hv_cma.c deleted file mode 100644 index d9d3d85..000 --- a/arch/powerpc/kvm/book3s_hv_cma.c +++ /dev/null @@ -1,240 +0,0 @@ -/* - * Contiguous Memory Allocator for ppc KVM hash pagetable based on CMA - * for DMA mapping framework - * - * Copyright IBM Corporation, 2013 - * Author Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com - * - * This program is free software; you can redistribute it and/or - * modify it under the terms of 
the GNU General Public License as - * published by the Free Software Foundation; either version 2 of the - * License or (at your optional) any later version of the license. - * - */ -#define pr_fmt(fmt) kvm_cma: fmt - -#ifdef CONFIG_CMA_DEBUG -#ifndef DEBUG -# define DEBUG -#endif -#endif - -#include linux/memblock.h -#include linux/mutex.h -#include linux/sizes.h -#include linux/slab.h - -#include book3s_hv_cma.h - -struct kvm_cma { - unsigned long base_pfn; - unsigned long count; - unsigned long *bitmap; -}; - -static DEFINE_MUTEX(kvm_cma_mutex); -static struct kvm_cma kvm_cma_area; - -/** - * kvm_cma_declare_contiguous() - reserve area for contiguous memory handling - * for kvm hash pagetable - * @size: Size of the reserved memory. - * @alignment: Alignment for the contiguous memory area - * - * This function reserves memory for kvm cma area. It should be - * called by arch code when early allocator (memblock or bootmem) - * is still activate. - */ -long __init kvm_cma_declare_contiguous(phys_addr_t size, phys_addr_t alignment) -{ - long base_pfn; - phys_addr_t addr; - struct kvm_cma *cma = kvm_cma_area; - - pr_debug(%s(size %lx)\n, __func__, (unsigned long)size); - - if (!size) - return -EINVAL; - /* - * Sanitise input arguments. - * We should be pageblock aligned for CMA
Re: [PATCH v2 06/10] CMA: generalize CMA reserved area management functionality
Joonsoo Kim iamjoonsoo@lge.com writes: Currently, there are two users on CMA functionality, one is the DMA subsystem and the other is the kvm on powerpc. They have their own code to manage CMA reserved area even if they looks really similar. From my guess, it is caused by some needs on bitmap management. Kvm side wants to maintain bitmap not for 1 page, but for more size. Eventually it use bitmap where one bit represents 64 pages. When I implement CMA related patches, I should change those two places to apply my change and it seem to be painful to me. I want to change this situation and reduce future code management overhead through this patch. This change could also help developer who want to use CMA in their new feature development, since they can use CMA easily without copying pasting this reserved area management code. In previous patches, we have prepared some features to generalize CMA reserved area management and now it's time to do it. This patch moves core functions to mm/cma.c and change DMA APIs to use these functions. There is no functional change in DMA APIs. v2: There is no big change from v1 in mm/cma.c. Mostly renaming. Acked-by: Michal Nazarewicz min...@mina86.com Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com Reviewed-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig index 00e13ce..4eac559 100644 --- a/drivers/base/Kconfig +++ b/drivers/base/Kconfig @@ -283,16 +283,6 @@ config CMA_ALIGNMENT If unsure, leave the default value 8. -config CMA_AREAS - int Maximum count of the CMA device-private areas - default 7 - help - CMA allows to create CMA areas for particular devices. This parameter - sets the maximum number of such device private CMA areas in the - system. - - If unsure, leave the default value 7. 
- endif endmenu diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c index 9bc9340..f177f73 100644 --- a/drivers/base/dma-contiguous.c +++ b/drivers/base/dma-contiguous.c @@ -24,25 +24,10 @@ #include linux/memblock.h #include linux/err.h -#include linux/mm.h -#include linux/mutex.h -#include linux/page-isolation.h #include linux/sizes.h -#include linux/slab.h -#include linux/swap.h -#include linux/mm_types.h #include linux/dma-contiguous.h #include linux/log2.h - -struct cma { - unsigned long base_pfn; - unsigned long count; - unsigned long *bitmap; - int order_per_bit; /* Order of pages represented by one bit */ - struct mutexlock; -}; - -struct cma *dma_contiguous_default_area; +#include linux/cma.h #ifdef CONFIG_CMA_SIZE_MBYTES #define CMA_SIZE_MBYTES CONFIG_CMA_SIZE_MBYTES @@ -50,6 +35,8 @@ struct cma *dma_contiguous_default_area; #define CMA_SIZE_MBYTES 0 #endif +struct cma *dma_contiguous_default_area; + /* * Default global CMA area size can be defined in kernel's .config. 
* This is useful mainly for distro maintainers to create a kernel @@ -156,199 +143,13 @@ void __init dma_contiguous_reserve(phys_addr_t limit) } } -static DEFINE_MUTEX(cma_mutex); - -static unsigned long cma_bitmap_aligned_mask(struct cma *cma, int align_order) -{ - return (1 (align_order cma-order_per_bit)) - 1; -} - -static unsigned long cma_bitmap_maxno(struct cma *cma) -{ - return cma-count cma-order_per_bit; -} - -static unsigned long cma_bitmap_pages_to_bits(struct cma *cma, - unsigned long pages) -{ - return ALIGN(pages, 1 cma-order_per_bit) cma-order_per_bit; -} - -static void clear_cma_bitmap(struct cma *cma, unsigned long pfn, int count) -{ - unsigned long bitmapno, nr_bits; - - bitmapno = (pfn - cma-base_pfn) cma-order_per_bit; - nr_bits = cma_bitmap_pages_to_bits(cma, count); - - mutex_lock(cma-lock); - bitmap_clear(cma-bitmap, bitmapno, nr_bits); - mutex_unlock(cma-lock); -} - -static int __init cma_activate_area(struct cma *cma) -{ - int bitmap_maxno = cma_bitmap_maxno(cma); - int bitmap_size = BITS_TO_LONGS(bitmap_maxno) * sizeof(long); - unsigned long base_pfn = cma-base_pfn, pfn = base_pfn; - unsigned i = cma-count pageblock_order; - struct zone *zone; - - pr_debug(%s()\n, __func__); - - cma-bitmap = kzalloc(bitmap_size, GFP_KERNEL); - if (!cma-bitmap) - return -ENOMEM; - - WARN_ON_ONCE(!pfn_valid(pfn)); - zone = page_zone(pfn_to_page(pfn)); - - do { - unsigned j; - base_pfn = pfn; - for (j = pageblock_nr_pages; j; --j, pfn++) { - WARN_ON_ONCE(!pfn_valid(pfn)); - /* - * alloc_contig_range requires the pfn range - * specified to be in the same zone. Make this - * simple by forcing
Re: [PATCH v2 05/10] DMA, CMA: support arbitrary bitmap granularity
Joonsoo Kim iamjoonsoo@lge.com writes:

ppc kvm's cma region management requires arbitrary bitmap granularity, since it wants to reserve very large memory and manage this region with a bitmap in which one bit covers several pages, to reduce management overhead. So support arbitrary bitmap granularity for the following generalization.

Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

Reviewed-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com

diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c
index bc4c171..9bc9340 100644
--- a/drivers/base/dma-contiguous.c
+++ b/drivers/base/dma-contiguous.c
@@ -38,6 +38,7 @@ struct cma {
 	unsigned long	base_pfn;
 	unsigned long	count;
 	unsigned long	*bitmap;
+	int order_per_bit; /* Order of pages represented by one bit */
 	struct mutex	lock;
 };

@@ -157,9 +158,38 @@ void __init dma_contiguous_reserve(phys_addr_t limit)

 static DEFINE_MUTEX(cma_mutex);

+static unsigned long cma_bitmap_aligned_mask(struct cma *cma, int align_order)
+{
+	return (1 << (align_order - cma->order_per_bit)) - 1;
+}
+
+static unsigned long cma_bitmap_maxno(struct cma *cma)
+{
+	return cma->count >> cma->order_per_bit;
+}
+
+static unsigned long cma_bitmap_pages_to_bits(struct cma *cma,
+					      unsigned long pages)
+{
+	return ALIGN(pages, 1 << cma->order_per_bit) >> cma->order_per_bit;
+}
+
+static void clear_cma_bitmap(struct cma *cma, unsigned long pfn, int count)
+{
+	unsigned long bitmapno, nr_bits;
+
+	bitmapno = (pfn - cma->base_pfn) >> cma->order_per_bit;
+	nr_bits = cma_bitmap_pages_to_bits(cma, count);
+
+	mutex_lock(&cma->lock);
+	bitmap_clear(cma->bitmap, bitmapno, nr_bits);
+	mutex_unlock(&cma->lock);
+}
+
 static int __init cma_activate_area(struct cma *cma)
 {
-	int bitmap_size = BITS_TO_LONGS(cma->count) * sizeof(long);
+	int bitmap_maxno = cma_bitmap_maxno(cma);
+	int bitmap_size = BITS_TO_LONGS(bitmap_maxno) * sizeof(long);
 	unsigned long base_pfn = cma->base_pfn, pfn = base_pfn;
 	unsigned i = cma->count >> pageblock_order;
 	struct zone *zone;
@@ -221,6 +251,7 @@ core_initcall(cma_init_reserved_areas);
  * @base: Base address of the reserved area optional, use 0 for any
  * @limit: End address of the reserved memory (optional, 0 for any).
  * @alignment: Alignment for the contiguous memory area, should be power of 2
+ * @order_per_bit: Order of pages represented by one bit on bitmap.
  * @res_cma: Pointer to store the created cma region.
  * @fixed: hint about where to place the reserved area
 @@ -235,7 +266,7 @@ core_initcall(cma_init_reserved_areas);
  */
 static int __init __dma_contiguous_reserve_area(phys_addr_t size,
 				phys_addr_t base, phys_addr_t limit,
-				phys_addr_t alignment,
+				phys_addr_t alignment, int order_per_bit,
 				struct cma **res_cma, bool fixed)
 {
 	struct cma *cma = &cma_areas[cma_area_count];
@@ -269,6 +300,8 @@ static int __init __dma_contiguous_reserve_area(phys_addr_t size,
 	base = ALIGN(base, alignment);
 	size = ALIGN(size, alignment);
 	limit &= ~(alignment - 1);
+	/* size should be aligned with order_per_bit */
+	BUG_ON(!IS_ALIGNED(size >> PAGE_SHIFT, 1 << order_per_bit));

 	/* Reserve memory */
 	if (base && fixed) {
@@ -294,6 +327,7 @@ static int __init __dma_contiguous_reserve_area(phys_addr_t size,
 	 */
 	cma->base_pfn = PFN_DOWN(base);
 	cma->count = size >> PAGE_SHIFT;
+	cma->order_per_bit = order_per_bit;
 	*res_cma = cma;
 	cma_area_count++;

@@ -313,7 +347,7 @@ int __init dma_contiguous_reserve_area(phys_addr_t size, phys_addr_t base,
 {
 	int ret;

-	ret = __dma_contiguous_reserve_area(size, base, limit, 0,
+	ret = __dma_contiguous_reserve_area(size, base, limit, 0, 0,
 					res_cma, fixed);
 	if (ret)
 		return ret;
@@ -324,13 +358,6 @@ int __init dma_contiguous_reserve_area(phys_addr_t size, phys_addr_t base,
 	return 0;
 }

-static void clear_cma_bitmap(struct cma *cma, unsigned long pfn, int count)
-{
-	mutex_lock(&cma->lock);
-	bitmap_clear(cma->bitmap, pfn - cma->base_pfn, count);
-	mutex_unlock(&cma->lock);
-}
-
 /**
  * dma_alloc_from_contiguous() - allocate pages from contiguous area
  * @dev: Pointer to device for which the allocation is performed.
@@ -345,7 +372,8 @@ static void clear_cma_bitmap(struct cma *cma, unsigned long pfn, int count)
 static struct page *__dma_alloc_from_contiguous(struct cma *cma, int count,
 				       unsigned int align)
 {
-	unsigned long mask, pfn, pageno, start = 0;
+	unsigned long mask, pfn, start = 0;
+	unsigned long bitmap_maxno, bitmapno, nr_bits;
 	struct page *page
Re: [PATCH v2 06/10] CMA: generalize CMA reserved area management functionality
Joonsoo Kim iamjoonsoo@lge.com writes:

Currently, there are two users of the CMA functionality: one is the DMA subsystem and the other is KVM on powerpc. They each have their own code to manage the CMA reserved area even though the code looks really similar. My guess is that this is caused by differing needs in bitmap management: the KVM side wants to maintain the bitmap not per page but for a larger unit; eventually it uses a bitmap where one bit represents 64 pages. When I implement CMA-related patches, I have to change both places, which is painful. I want to change this situation and reduce future code management overhead through this patch. This change could also help developers who want to use CMA in new feature development, since they can use CMA easily without copy-pasting this reserved area management code. In previous patches, we prepared some features to generalize CMA reserved area management, and now it's time to do it. This patch moves the core functions to mm/cma.c and changes the DMA APIs to use these functions. There is no functional change in the DMA APIs.

v2: There is no big change from v1 in mm/cma.c. Mostly renaming.

Acked-by: Michal Nazarewicz min...@mina86.com
Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

.
+
+	mask = cma_bitmap_aligned_mask(cma, align);
+	bitmap_maxno = cma_bitmap_maxno(cma);
+	nr_bits = cma_bitmap_pages_to_bits(cma, count);
+
+	for (;;) {
+		mutex_lock(&cma->lock);
+		bitmapno = bitmap_find_next_zero_area(cma->bitmap,
+				bitmap_maxno, start, nr_bits, mask);
+		if (bitmapno >= bitmap_maxno) {
+			mutex_unlock(&cma->lock);
+			break;
+		}
+		bitmap_set(cma->bitmap, bitmapno, nr_bits);
+		/*
+		 * It's safe to drop the lock here. We've marked this region for
+		 * our exclusive use. If the migration fails we will take the
+		 * lock again and unmark it.
+		 */
+		mutex_unlock(&cma->lock);
+
+		pfn = cma->base_pfn + (bitmapno << cma->order_per_bit);
+		mutex_lock(&cma_mutex);
+		ret = alloc_contig_range(pfn, pfn + count, MIGRATE_CMA);
+		mutex_unlock(&cma_mutex);
+		if (ret == 0) {
+			page = pfn_to_page(pfn);
+			break;
+		} else if (ret != -EBUSY) {
+			clear_cma_bitmap(cma, pfn, count);
+			break;
+		}
+

For setting the bitmap we do

	bitmap_set(cma->bitmap, bitmapno, nr_bits);
	alloc_contig()..
	if (error)
		clear_cma_bitmap(cma, pfn, count);

Why? Why not bitmap_clear()?

-aneesh
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 08/10] mm, cma: clean-up cma allocation error path
Joonsoo Kim iamjoonsoo@lge.com writes:

We can remove one call site for clear_cma_bitmap() if we call it first, before checking the error number.

Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

Reviewed-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com

diff --git a/mm/cma.c b/mm/cma.c
index 1e1b017..01a0713 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -282,11 +282,12 @@ struct page *cma_alloc(struct cma *cma, int count, unsigned int align)
 		if (ret == 0) {
 			page = pfn_to_page(pfn);
 			break;
-		} else if (ret != -EBUSY) {
-			clear_cma_bitmap(cma, pfn, count);
-			break;
 		}
+
+		clear_cma_bitmap(cma, pfn, count);
+		if (ret != -EBUSY)
+			break;
+
 		pr_debug("%s(): memory range at %p is busy, retrying\n",
			 __func__, pfn_to_page(pfn));
 		/* try again with a bit different memory target */
-- 
1.7.9.5
Re: [PATCH v2 09/10] mm, cma: move output param to the end of param list
Joonsoo Kim iamjoonsoo@lge.com writes:

Conventionally, we put the output param at the end of the param list; cma_declare_contiguous() doesn't follow that, so change it. Additionally, move the cma_areas reference code down to the position where it is really needed.

Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

Reviewed-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com

diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c b/arch/powerpc/kvm/book3s_hv_builtin.c
index 28ec226..97613ea 100644
--- a/arch/powerpc/kvm/book3s_hv_builtin.c
+++ b/arch/powerpc/kvm/book3s_hv_builtin.c
@@ -184,7 +184,7 @@ void __init kvm_cma_reserve(void)
 		align_size = max(kvm_rma_pages << PAGE_SHIFT, align_size);

 		cma_declare_contiguous(selected_size, 0, 0, align_size,
-			KVM_CMA_CHUNK_ORDER - PAGE_SHIFT, &kvm_cma, false);
+			KVM_CMA_CHUNK_ORDER - PAGE_SHIFT, false, &kvm_cma);
 	}
 }

diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c
index f177f73..bfd4553 100644
--- a/drivers/base/dma-contiguous.c
+++ b/drivers/base/dma-contiguous.c
@@ -149,7 +149,7 @@ int __init dma_contiguous_reserve_area(phys_addr_t size, phys_addr_t base,
 {
 	int ret;

-	ret = cma_declare_contiguous(size, base, limit, 0, 0, res_cma, fixed);
+	ret = cma_declare_contiguous(size, base, limit, 0, 0, fixed, res_cma);
 	if (ret)
 		return ret;

diff --git a/include/linux/cma.h b/include/linux/cma.h
index e38efe9..e53eead 100644
--- a/include/linux/cma.h
+++ b/include/linux/cma.h
@@ -6,7 +6,7 @@ struct cma;
 extern int __init cma_declare_contiguous(phys_addr_t size,
 			phys_addr_t base, phys_addr_t limit,
 			phys_addr_t alignment, int order_per_bit,
-			struct cma **res_cma, bool fixed);
+			bool fixed, struct cma **res_cma);
 extern struct page *cma_alloc(struct cma *cma, int count, unsigned int align);
 extern bool cma_release(struct cma *cma, struct page *pages, int count);
 #endif

diff --git a/mm/cma.c b/mm/cma.c
index 01a0713..22a5b23 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -142,8 +142,8 @@ core_initcall(cma_init_reserved_areas);
  * @limit: End address of the reserved memory (optional, 0 for any).
  * @alignment: Alignment for the contiguous memory area, should be power of 2
  * @order_per_bit: Order of pages represented by one bit on bitmap.
- * @res_cma: Pointer to store the created cma region.
  * @fixed: hint about where to place the reserved area
+ * @res_cma: Pointer to store the created cma region.
  *
  * This function reserves memory from early allocator. It should be
  * called by arch specific code once the early allocator (memblock or bootmem)
@@ -156,9 +156,9 @@ core_initcall(cma_init_reserved_areas);
 int __init cma_declare_contiguous(phys_addr_t size,
 			phys_addr_t base, phys_addr_t limit,
 			phys_addr_t alignment, int order_per_bit,
-			struct cma **res_cma, bool fixed)
+			bool fixed, struct cma **res_cma)
 {
-	struct cma *cma = &cma_areas[cma_area_count];
+	struct cma *cma;
 	int ret = 0;

 	pr_debug("%s(size %lx, base %08lx, limit %08lx alignment %08lx)\n",
@@ -214,6 +214,7 @@ int __init cma_declare_contiguous(phys_addr_t size,
 	 * Each reserved area must be initialised later, when more kernel
 	 * subsystems (like slab allocator) are available.
 	 */
+	cma = &cma_areas[cma_area_count];
 	cma->base_pfn = PFN_DOWN(base);
 	cma->count = size >> PAGE_SHIFT;
 	cma->order_per_bit = order_per_bit;
-- 
1.7.9.5
Re: [PATCH v2 00/10] CMA: generalize CMA reserved area management code
Joonsoo Kim iamjoonsoo@lge.com writes:

Currently, there are two users of the CMA functionality: one is the DMA subsystem and the other is KVM on powerpc. They each have their own code to manage the CMA reserved area even though the code looks really similar. My guess is that this is caused by differing needs in bitmap management: the KVM side wants to maintain the bitmap not per page but for a larger unit; eventually it uses a bitmap where one bit represents 64 pages. When I implement CMA-related patches, I have to change both places, which is painful. I want to change this situation and reduce future code management overhead through this patch. This change could also help developers who want to use CMA in new feature development, since they can use CMA easily without copy-pasting this reserved area management code.

v2: Although this patchset looks very different from v1, the end result, that is, mm/cma.c, is the same as v1's. So I carry the Acks to patches 6-7. Patches 1-5 prepare some features to cover ppc kvm's requirements. Patches 6-7 generalize the CMA reserved area management code and change users to use it. Patches 8-10 clean up minor things.

I wanted to test the ppc changes and found that the patch series doesn't apply against v3.15. Do you have a kernel tree which I can clone to test this series?
Re: [PATCH v2 07/10] PPC, KVM, CMA: use general CMA reserved area management framework
Joonsoo Kim iamjoonsoo@lge.com writes:

Now that we have a general CMA reserved area management framework, use it for future maintainability. There is no functional change.

Acked-by: Michal Nazarewicz min...@mina86.com
Acked-by: Paolo Bonzini pbonz...@redhat.com
Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

Need this. We may want to keep the VM_BUG_ON by moving KVM_CMA_CHUNK_ORDER around.

diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 8056107..1932e0e 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -37,8 +37,6 @@
 #include <asm/ppc-opcode.h>
 #include <asm/cputable.h>

-#include "book3s_hv_cma.h"
-
 /* POWER7 has 10-bit LPIDs, PPC970 has 6-bit LPIDs */
 #define MAX_LPID_970	63

@@ -64,7 +62,6 @@ long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp)
 	}

 	kvm->arch.hpt_cma_alloc = 0;
-	VM_BUG_ON(order < KVM_CMA_CHUNK_ORDER);
 	page = kvm_alloc_hpt(1 << (order - PAGE_SHIFT));
 	if (page) {
 		hpt = (unsigned long)pfn_to_kaddr(page_to_pfn(page));

-aneesh
Re: [PATCH v2 07/10] PPC, KVM, CMA: use general CMA reserved area management framework
Joonsoo Kim iamjoonsoo@lge.com writes:

Now that we have a general CMA reserved area management framework, use it for future maintainability. There is no functional change.

Acked-by: Michal Nazarewicz min...@mina86.com
Acked-by: Paolo Bonzini pbonz...@redhat.com
Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c b/arch/powerpc/kvm/book3s_hv_builtin.c
index 7cde8a6..28ec226 100644
--- a/arch/powerpc/kvm/book3s_hv_builtin.c
+++ b/arch/powerpc/kvm/book3s_hv_builtin.c
@@ -16,12 +16,14 @@
 #include <linux/init.h>
 #include <linux/memblock.h>
 #include <linux/sizes.h>
+#include <linux/cma.h>

 #include <asm/cputable.h>
 #include <asm/kvm_ppc.h>
 #include <asm/kvm_book3s.h>

-#include "book3s_hv_cma.h"
+#define KVM_CMA_CHUNK_ORDER	18
+
 /*
  * Hash page table alignment on newer cpus(CPU_FTR_ARCH_206)
  * should be power of 2.
@@ -43,6 +45,8 @@ static unsigned long kvm_cma_resv_ratio = 5;
 unsigned long kvm_rma_pages = (1 << 27) >> PAGE_SHIFT;	/* 128MB */
 EXPORT_SYMBOL_GPL(kvm_rma_pages);

+static struct cma *kvm_cma;
+
 /* Work out RMLS (real mode limit selector) field value for a given RMA size.
    Assumes POWER7 or PPC970. */
 static inline int lpcr_rmls(unsigned long rma_size)
@@ -97,7 +101,7 @@ struct kvm_rma_info *kvm_alloc_rma()
 	ri = kmalloc(sizeof(struct kvm_rma_info), GFP_KERNEL);
 	if (!ri)
 		return NULL;
-	page = kvm_alloc_cma(kvm_rma_pages, kvm_rma_pages);
+	page = cma_alloc(kvm_cma, kvm_rma_pages, get_order(kvm_rma_pages));
 	if (!page)
 		goto err_out;
 	atomic_set(&ri->use_count, 1);
@@ -112,7 +116,7 @@ EXPORT_SYMBOL_GPL(kvm_alloc_rma);
 void kvm_release_rma(struct kvm_rma_info *ri)
 {
 	if (atomic_dec_and_test(&ri->use_count)) {
-		kvm_release_cma(pfn_to_page(ri->base_pfn), kvm_rma_pages);
+		cma_release(kvm_cma, pfn_to_page(ri->base_pfn), kvm_rma_pages);
 		kfree(ri);
 	}
 }
@@ -134,13 +138,13 @@ struct page *kvm_alloc_hpt(unsigned long nr_pages)
 	/* Old CPUs require HPT aligned on a multiple of its size */
 	if (!cpu_has_feature(CPU_FTR_ARCH_206))
 		align_pages = nr_pages;
-	return kvm_alloc_cma(nr_pages, align_pages);
+	return cma_alloc(kvm_cma, nr_pages, get_order(align_pages));
 }
 EXPORT_SYMBOL_GPL(kvm_alloc_hpt);

 void kvm_release_hpt(struct page *page, unsigned long nr_pages)
 {
-	kvm_release_cma(page, nr_pages);
+	cma_release(kvm_cma, page, nr_pages);
 }
 EXPORT_SYMBOL_GPL(kvm_release_hpt);

@@ -179,7 +183,8 @@ void __init kvm_cma_reserve(void)
 		align_size = HPT_ALIGN_PAGES << PAGE_SHIFT;

 		align_size = max(kvm_rma_pages << PAGE_SHIFT, align_size);
-		kvm_cma_declare_contiguous(selected_size, align_size);
+		cma_declare_contiguous(selected_size, 0, 0, align_size,
+			KVM_CMA_CHUNK_ORDER - PAGE_SHIFT, &kvm_cma, false);
 	}
 }

diff --git a/arch/powerpc/kvm/book3s_hv_cma.c b/arch/powerpc/kvm/book3s_hv_cma.c
deleted file mode 100644
index d9d3d85..000
--- a/arch/powerpc/kvm/book3s_hv_cma.c
+++ /dev/null
@@ -1,240 +0,0 @@
-/*
- * Contiguous Memory Allocator for ppc KVM hash pagetable based on CMA
- * for DMA mapping framework
- *
- * Copyright IBM Corporation, 2013
- * Author Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public License as
- * published by the Free Software Foundation; either version 2 of the
- * License or (at your optional) any later version of the license.
- *
- */
-#define pr_fmt(fmt) "kvm_cma: " fmt
-
-#ifdef CONFIG_CMA_DEBUG
-#ifndef DEBUG
-#  define DEBUG
-#endif
-#endif
-
-#include <linux/memblock.h>
-#include <linux/mutex.h>
-#include <linux/sizes.h>
-#include <linux/slab.h>
-
-#include "book3s_hv_cma.h"
-
-struct kvm_cma {
-	unsigned long	base_pfn;
-	unsigned long	count;
-	unsigned long	*bitmap;
-};
-
-static DEFINE_MUTEX(kvm_cma_mutex);
-static struct kvm_cma kvm_cma_area;
-
-/**
- * kvm_cma_declare_contiguous() - reserve area for contiguous memory handling
- *				  for kvm hash pagetable
- * @size:  Size of the reserved memory.
- * @alignment:  Alignment for the contiguous memory area
- *
- * This function reserves memory for kvm cma area. It should be
- * called by arch code when early allocator (memblock or bootmem)
- * is still activate.
- */
-long __init kvm_cma_declare_contiguous(phys_addr_t size, phys_addr_t alignment)
-{
-	long base_pfn;
-	phys_addr_t addr;
-	struct kvm_cma *cma = &kvm_cma_area;
-
-	pr_debug("%s(size %lx)\n", __func__, (unsigned long)size);
-
-	if (!size)
-		return -EINVAL;
-	/*
-	 * Sanitise input arguments.
-	 * We should be pageblock aligned for CMA
Re: [PATCH v2 06/10] CMA: generalize CMA reserved area management functionality
Joonsoo Kim iamjoonsoo@lge.com writes:

Currently, there are two users of the CMA functionality: one is the DMA subsystem and the other is KVM on powerpc. They each have their own code to manage the CMA reserved area even though the code looks really similar. My guess is that this is caused by differing needs in bitmap management: the KVM side wants to maintain the bitmap not per page but for a larger unit; eventually it uses a bitmap where one bit represents 64 pages. When I implement CMA-related patches, I have to change both places, which is painful. I want to change this situation and reduce future code management overhead through this patch. This change could also help developers who want to use CMA in new feature development, since they can use CMA easily without copy-pasting this reserved area management code. In previous patches, we prepared some features to generalize CMA reserved area management, and now it's time to do it. This patch moves the core functions to mm/cma.c and changes the DMA APIs to use these functions. There is no functional change in the DMA APIs.

v2: There is no big change from v1 in mm/cma.c. Mostly renaming.

Acked-by: Michal Nazarewicz min...@mina86.com
Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

Reviewed-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com

diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
index 00e13ce..4eac559 100644
--- a/drivers/base/Kconfig
+++ b/drivers/base/Kconfig
@@ -283,16 +283,6 @@ config CMA_ALIGNMENT

 	  If unsure, leave the default value "8".

-config CMA_AREAS
-	int "Maximum count of the CMA device-private areas"
-	default 7
-	help
-	  CMA allows to create CMA areas for particular devices. This parameter
-	  sets the maximum number of such device private CMA areas in the
-	  system.
-
-	  If unsure, leave the default value "7".
-
 endif

 endmenu

diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c
index 9bc9340..f177f73 100644
--- a/drivers/base/dma-contiguous.c
+++ b/drivers/base/dma-contiguous.c
@@ -24,25 +24,10 @@

 #include <linux/memblock.h>
 #include <linux/err.h>
-#include <linux/mm.h>
-#include <linux/mutex.h>
-#include <linux/page-isolation.h>
 #include <linux/sizes.h>
-#include <linux/slab.h>
-#include <linux/swap.h>
-#include <linux/mm_types.h>
 #include <linux/dma-contiguous.h>
 #include <linux/log2.h>
-
-struct cma {
-	unsigned long	base_pfn;
-	unsigned long	count;
-	unsigned long	*bitmap;
-	int order_per_bit; /* Order of pages represented by one bit */
-	struct mutex	lock;
-};
-
-struct cma *dma_contiguous_default_area;
+#include <linux/cma.h>

 #ifdef CONFIG_CMA_SIZE_MBYTES
 #define CMA_SIZE_MBYTES CONFIG_CMA_SIZE_MBYTES
@@ -50,6 +35,8 @@ struct cma *dma_contiguous_default_area;
 #define CMA_SIZE_MBYTES 0
 #endif

+struct cma *dma_contiguous_default_area;
+
 /*
  * Default global CMA area size can be defined in kernel's .config.
  * This is useful mainly for distro maintainers to create a kernel
@@ -156,199 +143,13 @@ void __init dma_contiguous_reserve(phys_addr_t limit)
 	}
 }

-static DEFINE_MUTEX(cma_mutex);
-
-static unsigned long cma_bitmap_aligned_mask(struct cma *cma, int align_order)
-{
-	return (1 << (align_order - cma->order_per_bit)) - 1;
-}
-
-static unsigned long cma_bitmap_maxno(struct cma *cma)
-{
-	return cma->count >> cma->order_per_bit;
-}
-
-static unsigned long cma_bitmap_pages_to_bits(struct cma *cma,
-					      unsigned long pages)
-{
-	return ALIGN(pages, 1 << cma->order_per_bit) >> cma->order_per_bit;
-}
-
-static void clear_cma_bitmap(struct cma *cma, unsigned long pfn, int count)
-{
-	unsigned long bitmapno, nr_bits;
-
-	bitmapno = (pfn - cma->base_pfn) >> cma->order_per_bit;
-	nr_bits = cma_bitmap_pages_to_bits(cma, count);
-
-	mutex_lock(&cma->lock);
-	bitmap_clear(cma->bitmap, bitmapno, nr_bits);
-	mutex_unlock(&cma->lock);
-}
-
-static int __init cma_activate_area(struct cma *cma)
-{
-	int bitmap_maxno = cma_bitmap_maxno(cma);
-	int bitmap_size = BITS_TO_LONGS(bitmap_maxno) * sizeof(long);
-	unsigned long base_pfn = cma->base_pfn, pfn = base_pfn;
-	unsigned i = cma->count >> pageblock_order;
-	struct zone *zone;
-
-	pr_debug("%s()\n", __func__);
-
-	cma->bitmap = kzalloc(bitmap_size, GFP_KERNEL);
-	if (!cma->bitmap)
-		return -ENOMEM;
-
-	WARN_ON_ONCE(!pfn_valid(pfn));
-	zone = page_zone(pfn_to_page(pfn));
-
-	do {
-		unsigned j;
-		base_pfn = pfn;
-		for (j = pageblock_nr_pages; j; --j, pfn++) {
-			WARN_ON_ONCE(!pfn_valid(pfn));
-			/*
-			 * alloc_contig_range requires the pfn range
-			 * specified to be in the same zone. Make this
-			 * simple by forcing
Re: [PATCH v2 06/10] CMA: generalize CMA reserved area management functionality
Joonsoo Kim iamjoonsoo@lge.com writes: Currently, there are two users on CMA functionality, one is the DMA subsystem and the other is the kvm on powerpc. They have their own code to manage CMA reserved area even if they looks really similar. From my guess, it is caused by some needs on bitmap management. Kvm side wants to maintain bitmap not for 1 page, but for more size. Eventually it use bitmap where one bit represents 64 pages. When I implement CMA related patches, I should change those two places to apply my change and it seem to be painful to me. I want to change this situation and reduce future code management overhead through this patch. This change could also help developer who want to use CMA in their new feature development, since they can use CMA easily without copying pasting this reserved area management code. In previous patches, we have prepared some features to generalize CMA reserved area management and now it's time to do it. This patch moves core functions to mm/cma.c and change DMA APIs to use these functions. There is no functional change in DMA APIs. v2: There is no big change from v1 in mm/cma.c. Mostly renaming. Acked-by: Michal Nazarewicz min...@mina86.com Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com Reviewed-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig index 00e13ce..4eac559 100644 --- a/drivers/base/Kconfig +++ b/drivers/base/Kconfig @@ -283,16 +283,6 @@ config CMA_ALIGNMENT If unsure, leave the default value 8. -config CMA_AREAS - int Maximum count of the CMA device-private areas - default 7 - help - CMA allows to create CMA areas for particular devices. This parameter - sets the maximum number of such device private CMA areas in the - system. - - If unsure, leave the default value 7. 
- endif endmenu diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c index 9bc9340..f177f73 100644 --- a/drivers/base/dma-contiguous.c +++ b/drivers/base/dma-contiguous.c @@ -24,25 +24,10 @@ #include linux/memblock.h #include linux/err.h -#include linux/mm.h -#include linux/mutex.h -#include linux/page-isolation.h #include linux/sizes.h -#include linux/slab.h -#include linux/swap.h -#include linux/mm_types.h #include linux/dma-contiguous.h #include linux/log2.h - -struct cma { - unsigned long base_pfn; - unsigned long count; - unsigned long *bitmap; - int order_per_bit; /* Order of pages represented by one bit */ - struct mutexlock; -}; - -struct cma *dma_contiguous_default_area; +#include linux/cma.h #ifdef CONFIG_CMA_SIZE_MBYTES #define CMA_SIZE_MBYTES CONFIG_CMA_SIZE_MBYTES @@ -50,6 +35,8 @@ struct cma *dma_contiguous_default_area; #define CMA_SIZE_MBYTES 0 #endif +struct cma *dma_contiguous_default_area; + /* * Default global CMA area size can be defined in kernel's .config. 
* This is useful mainly for distro maintainers to create a kernel @@ -156,199 +143,13 @@ void __init dma_contiguous_reserve(phys_addr_t limit) } } -static DEFINE_MUTEX(cma_mutex); - -static unsigned long cma_bitmap_aligned_mask(struct cma *cma, int align_order) -{ - return (1 (align_order cma-order_per_bit)) - 1; -} - -static unsigned long cma_bitmap_maxno(struct cma *cma) -{ - return cma-count cma-order_per_bit; -} - -static unsigned long cma_bitmap_pages_to_bits(struct cma *cma, - unsigned long pages) -{ - return ALIGN(pages, 1 cma-order_per_bit) cma-order_per_bit; -} - -static void clear_cma_bitmap(struct cma *cma, unsigned long pfn, int count) -{ - unsigned long bitmapno, nr_bits; - - bitmapno = (pfn - cma-base_pfn) cma-order_per_bit; - nr_bits = cma_bitmap_pages_to_bits(cma, count); - - mutex_lock(cma-lock); - bitmap_clear(cma-bitmap, bitmapno, nr_bits); - mutex_unlock(cma-lock); -} - -static int __init cma_activate_area(struct cma *cma) -{ - int bitmap_maxno = cma_bitmap_maxno(cma); - int bitmap_size = BITS_TO_LONGS(bitmap_maxno) * sizeof(long); - unsigned long base_pfn = cma-base_pfn, pfn = base_pfn; - unsigned i = cma-count pageblock_order; - struct zone *zone; - - pr_debug(%s()\n, __func__); - - cma-bitmap = kzalloc(bitmap_size, GFP_KERNEL); - if (!cma-bitmap) - return -ENOMEM; - - WARN_ON_ONCE(!pfn_valid(pfn)); - zone = page_zone(pfn_to_page(pfn)); - - do { - unsigned j; - base_pfn = pfn; - for (j = pageblock_nr_pages; j; --j, pfn++) { - WARN_ON_ONCE(!pfn_valid(pfn)); - /* - * alloc_contig_range requires the pfn range - * specified to be in the same zone. Make this - * simple by forcing
Re: [PATCH v2 05/10] DMA, CMA: support arbitrary bitmap granularity
Joonsoo Kim iamjoonsoo@lge.com writes: ppc kvm's cma region management requires arbitrary bitmap granularity, since they want to reserve very large memory and manage this region with bitmap that one bit for several pages to reduce management overheads. So support arbitrary bitmap granularity for following generalization. Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com Reviewed-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c index bc4c171..9bc9340 100644 --- a/drivers/base/dma-contiguous.c +++ b/drivers/base/dma-contiguous.c @@ -38,6 +38,7 @@ struct cma { unsigned long base_pfn; unsigned long count; unsigned long *bitmap; + int order_per_bit; /* Order of pages represented by one bit */ struct mutexlock; }; @@ -157,9 +158,38 @@ void __init dma_contiguous_reserve(phys_addr_t limit) static DEFINE_MUTEX(cma_mutex); +static unsigned long cma_bitmap_aligned_mask(struct cma *cma, int align_order) +{ + return (1 (align_order cma-order_per_bit)) - 1; +} + +static unsigned long cma_bitmap_maxno(struct cma *cma) +{ + return cma-count cma-order_per_bit; +} + +static unsigned long cma_bitmap_pages_to_bits(struct cma *cma, + unsigned long pages) +{ + return ALIGN(pages, 1 cma-order_per_bit) cma-order_per_bit; +} + +static void clear_cma_bitmap(struct cma *cma, unsigned long pfn, int count) +{ + unsigned long bitmapno, nr_bits; + + bitmapno = (pfn - cma-base_pfn) cma-order_per_bit; + nr_bits = cma_bitmap_pages_to_bits(cma, count); + + mutex_lock(cma-lock); + bitmap_clear(cma-bitmap, bitmapno, nr_bits); + mutex_unlock(cma-lock); +} + static int __init cma_activate_area(struct cma *cma) { - int bitmap_size = BITS_TO_LONGS(cma-count) * sizeof(long); + int bitmap_maxno = cma_bitmap_maxno(cma); + int bitmap_size = BITS_TO_LONGS(bitmap_maxno) * sizeof(long); unsigned long base_pfn = cma-base_pfn, pfn = base_pfn; unsigned i = cma-count pageblock_order; struct zone *zone; @@ -221,6 +251,7 @@ 
core_initcall(cma_init_reserved_areas); * @base: Base address of the reserved area optional, use 0 for any * @limit: End address of the reserved memory (optional, 0 for any). * @alignment: Alignment for the contiguous memory area, should be power of 2 + * @order_per_bit: Order of pages represented by one bit on bitmap. * @res_cma: Pointer to store the created cma region. * @fixed: hint about where to place the reserved area * @@ -235,7 +266,7 @@ core_initcall(cma_init_reserved_areas); */ static int __init __dma_contiguous_reserve_area(phys_addr_t size, phys_addr_t base, phys_addr_t limit, - phys_addr_t alignment, + phys_addr_t alignment, int order_per_bit, struct cma **res_cma, bool fixed) { struct cma *cma = cma_areas[cma_area_count]; @@ -269,6 +300,8 @@ static int __init __dma_contiguous_reserve_area(phys_addr_t size, base = ALIGN(base, alignment); size = ALIGN(size, alignment); limit = ~(alignment - 1); + /* size should be aligned with order_per_bit */ + BUG_ON(!IS_ALIGNED(size PAGE_SHIFT, 1 order_per_bit)); /* Reserve memory */ if (base fixed) { @@ -294,6 +327,7 @@ static int __init __dma_contiguous_reserve_area(phys_addr_t size, */ cma-base_pfn = PFN_DOWN(base); cma-count = size PAGE_SHIFT; + cma-order_per_bit = order_per_bit; *res_cma = cma; cma_area_count++; @@ -313,7 +347,7 @@ int __init dma_contiguous_reserve_area(phys_addr_t size, phys_addr_t base, { int ret; - ret = __dma_contiguous_reserve_area(size, base, limit, 0, + ret = __dma_contiguous_reserve_area(size, base, limit, 0, 0, res_cma, fixed); if (ret) return ret; @@ -324,13 +358,6 @@ int __init dma_contiguous_reserve_area(phys_addr_t size, phys_addr_t base, return 0; } -static void clear_cma_bitmap(struct cma *cma, unsigned long pfn, int count) -{ - mutex_lock(cma-lock); - bitmap_clear(cma-bitmap, pfn - cma-base_pfn, count); - mutex_unlock(cma-lock); -} - /** * dma_alloc_from_contiguous() - allocate pages from contiguous area * @dev: Pointer to device for which the allocation is performed. 
@@ -345,7 +372,8 @@ static void clear_cma_bitmap(struct cma *cma, unsigned long pfn, int count) static struct page *__dma_alloc_from_contiguous(struct cma *cma, int count, unsigned int align) { - unsigned long mask, pfn, pageno, start = 0; + unsigned long mask, pfn, start = 0; + unsigned long bitmap_maxno, bitmapno, nr_bits; struct page *page
Re: [PATCH v2 06/10] CMA: generalize CMA reserved area management functionality
Joonsoo Kim iamjoonsoo@lge.com writes: Currently, there are two users of the CMA functionality: one is the DMA subsystem and the other is kvm on powerpc. They have their own code to manage the CMA reserved area even though it looks really similar. From my guess, it is caused by some needs on bitmap management. The kvm side wants to maintain the bitmap not for 1 page, but for a larger size; eventually it uses a bitmap where one bit represents 64 pages. When I implement CMA-related patches, I have to change both of those places to apply my change, and that seems painful to me. I want to change this situation and reduce future code management overhead through this patch. This change could also help developers who want to use CMA in their new feature development, since they can use CMA easily without copy-and-pasting this reserved area management code. In previous patches, we have prepared some features to generalize CMA reserved area management and now it's time to do it. This patch moves core functions to mm/cma.c and changes the DMA APIs to use these functions. There is no functional change in the DMA APIs. v2: There is no big change from v1 in mm/cma.c. Mostly renaming. Acked-by: Michal Nazarewicz min...@mina86.com Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com .

+	mask = cma_bitmap_aligned_mask(cma, align);
+	bitmap_maxno = cma_bitmap_maxno(cma);
+	nr_bits = cma_bitmap_pages_to_bits(cma, count);
+
+	for (;;) {
+		mutex_lock(&cma->lock);
+		bitmapno = bitmap_find_next_zero_area(cma->bitmap,
+					bitmap_maxno, start, nr_bits, mask);
+		if (bitmapno >= bitmap_maxno) {
+			mutex_unlock(&cma->lock);
+			break;
+		}
+		bitmap_set(cma->bitmap, bitmapno, nr_bits);
+		/*
+		 * It's safe to drop the lock here. We've marked this region for
+		 * our exclusive use. If the migration fails we will take the
+		 * lock again and unmark it.
+		 */
+		mutex_unlock(&cma->lock);
+
+		pfn = cma->base_pfn + (bitmapno << cma->order_per_bit);
+		mutex_lock(&cma_mutex);
+		ret = alloc_contig_range(pfn, pfn + count, MIGRATE_CMA);
+		mutex_unlock(&cma_mutex);
+		if (ret == 0) {
+			page = pfn_to_page(pfn);
+			break;
+		} else if (ret != -EBUSY) {
+			clear_cma_bitmap(cma, pfn, count);
+			break;
+		}
+

For setting the bitmap we do:

	bitmap_set(cma->bitmap, bitmapno, nr_bits);
	alloc_contig()..
	if (error)
		clear_cma_bitmap(cma, pfn, count);

Why? Why not bitmap_clear()?

-aneesh -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] KVM: PPC: BOOK3S: HV: Use base page size when comparing against slb value
With guests supporting Multiple page size per segment (MPSS), hpte_page_size returns the actual page size used. Add a new function to return the base page size and use that to compare against the page size calculated from the SLB. Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com --- arch/powerpc/include/asm/kvm_book3s_64.h | 19 +-- arch/powerpc/kvm/book3s_64_mmu_hv.c | 2 +- arch/powerpc/kvm/book3s_hv_rm_mmu.c | 2 +- 3 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index 34422be566ce..3d0f3fb9c6b6 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -202,8 +202,10 @@ static inline unsigned long compute_tlbie_rb(unsigned long v, unsigned long r,
 	return rb;
 }
 
-static inline unsigned long hpte_page_size(unsigned long h, unsigned long l)
+static inline unsigned long __hpte_page_size(unsigned long h, unsigned long l,
+					     bool is_base_size)
 {
+	int size, a_psize;
 	/* Look at the 8 bit LP value */
 	unsigned int lp = (l >> LP_SHIFT) & ((1 << LP_BITS) - 1);
 
@@ -218,14 +220,27 @@ static inline unsigned long hpte_page_size(unsigned long h, unsigned long l)
 				continue;
 
 			a_psize = __hpte_actual_psize(lp, size);
-			if (a_psize != -1)
+			if (a_psize != -1) {
+				if (is_base_size)
+					return 1ul << mmu_psize_defs[size].shift;
 				return 1ul << mmu_psize_defs[a_psize].shift;
+			}
 		}
 	}
 	return 0;
 }
 
+static inline unsigned long hpte_page_size(unsigned long h, unsigned long l)
+{
+	return __hpte_page_size(h, l, 0);
+}
+
+static inline unsigned long hpte_base_page_size(unsigned long h, unsigned long l)
+{
+	return __hpte_page_size(h, l, 1);
+}
+
 static inline unsigned long hpte_rpn(unsigned long ptel, unsigned long psize)
 {
 	return ((ptel & HPTE_R_RPN) & ~(psize - 1)) >> PAGE_SHIFT;

diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index f53cf2eae36a..7ff45ed27c65 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -1567,7 +1567,7 @@ static ssize_t kvm_htab_write(struct file *file, const char __user *buf,
 				goto out;
 			}
 			if (!rma_setup && is_vrma_hpte(v)) {
-				unsigned long psize = hpte_page_size(v, r);
+				unsigned long psize = hpte_base_page_size(v, r);
 				unsigned long senc = slb_pgsize_encoding(psize);
 				unsigned long lpcr;

diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 87624ab5ba82..c6aca75b8376 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -839,7 +839,7 @@ long kvmppc_hv_find_lock_hpte(struct kvm *kvm, gva_t eaddr, unsigned long slb_v,
 		 * to check against the actual page size.
 		 */
 		if ((v & valid) && (v & mask) == val &&
-		    hpte_page_size(v, r) == (1ul << pshift))
+		    hpte_base_page_size(v, r) == (1ul << pshift))
 			/* Return with the HPTE still locked */
 			return (hash << 3) + (i >> 1);
-- 
1.9.1
Re: [PATCH] KVM: PPC: BOOK3S: HV: Use base page size when comparing against slb value
Alexander Graf ag...@suse.de writes: On 13.06.14 09:23, Aneesh Kumar K.V wrote: With guests supporting Multiple page size per segment (MPSS), hpte_page_size returns the actual page size used. Add a new function to return the base page size and use that to compare against the page size calculated from the SLB. Why? What does this fix? Is this a bug fix, an enhancement? Don't describe only what you do, but also why you do it. This could result in page fault failures (unhandled page faults) because even though we have a valid hpte entry mapping a 16MB page, since we were comparing the actual page size against the page size calculated from the SLB bits, kvmppc_hv_find_lock_hpte will fail and return -1. I did not observe a failure in practice; the bug was found during a code audit. That could be because with THP we have guest RAM backed by hugetlbfs and we always find the page in the host Linux page table. This will result in do_h_enter always inserting an HPTE_V_VALID entry and hence we might not really end up calling kvmppc_hv_find_lock_hpte. -aneesh
Re: [PATCH] KVM: PPC: BOOK3S: HV: Use base page size when comparing against slb value
Alexander Graf ag...@suse.de writes: On 13.06.14 16:28, Aneesh Kumar K.V wrote: Alexander Graf ag...@suse.de writes: On 13.06.14 09:23, Aneesh Kumar K.V wrote: With guests supporting Multiple page size per segment (MPSS), hpte_page_size returns the actual page size used. Add a new function to return the base page size and use that to compare against the page size calculated from the SLB. Why? What does this fix? Is this a bug fix, an enhancement? Don't describe only what you do, but also why you do it. This could result in page fault failures (unhandled page faults) because even though we have a valid hpte entry mapping a 16MB page, since we were comparing the actual page size against the page size calculated from the SLB bits, kvmppc_hv_find_lock_hpte will fail and return -1. I did not observe a failure in practice; the bug was found during a code audit. That could be because with THP we have guest RAM backed by hugetlbfs and we always find the page in the host Linux page table. This will result in do_h_enter always inserting an HPTE_V_VALID entry and hence we might not really end up calling kvmppc_hv_find_lock_hpte. So why do we need to override to the base page size for the VRMA region? The slb encoding should be derived based on the base page size. Also I think you want to change the comment above the line in find_lock_hpte you're changing. Will do that. -aneesh
Re: [PATCH v2 01/10] DMA, CMA: clean-up log message
Joonsoo Kim iamjoonsoo@lge.com writes: We don't need the explicit 'CMA:' prefix, since we already define the prefix 'cma:' in pr_fmt. So remove it. Also, some logs print the function name and others don't. This looks bad to me, so I unify the log format to print the function name consistently. Lastly, I add one more debug log in cma_activate_area(). Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c
index 83969f8..bd0bb81 100644
--- a/drivers/base/dma-contiguous.c
+++ b/drivers/base/dma-contiguous.c
@@ -144,7 +144,7 @@ void __init dma_contiguous_reserve(phys_addr_t limit)
 	}
 
 	if (selected_size && !dma_contiguous_default_area) {
-		pr_debug("%s: reserving %ld MiB for global area\n", __func__,
+		pr_debug("%s(): reserving %ld MiB for global area\n", __func__,
 			 (unsigned long)selected_size / SZ_1M);

Do we need to do function(), or just function:? I have seen the latter usage in other parts of the kernel.

 		dma_contiguous_reserve_area(selected_size, selected_base,
@@ -163,8 +163,9 @@ static int __init cma_activate_area(struct cma *cma)
 	unsigned i = cma->count >> pageblock_order;
 	struct zone *zone;
 
-	cma->bitmap = kzalloc(bitmap_size, GFP_KERNEL);
+	pr_debug("%s()\n", __func__);

why?

+	cma->bitmap = kzalloc(bitmap_size, GFP_KERNEL);
 	if (!cma->bitmap)
 		return -ENOMEM;
@@ -234,7 +235,8 @@ int __init dma_contiguous_reserve_area(phys_addr_t size, phys_addr_t base,
 	/* Sanity checks */
 	if (cma_area_count == ARRAY_SIZE(cma_areas)) {
-		pr_err("Not enough slots for CMA reserved regions!\n");
+		pr_err("%s(): Not enough slots for CMA reserved regions!\n",
+			__func__);
 		return -ENOSPC;
 	}
@@ -274,14 +276,15 @@ int __init dma_contiguous_reserve_area(phys_addr_t size, phys_addr_t base,
 	*res_cma = cma;
 	cma_area_count++;
 
-	pr_info("CMA: reserved %ld MiB at %08lx\n", (unsigned long)size / SZ_1M,
-		(unsigned long)base);
+	pr_info("%s(): reserved %ld MiB at %08lx\n",
+		__func__, (unsigned long)size / SZ_1M, (unsigned long)base);
 
 	/* Architecture specific contiguous memory fixup. */
 	dma_contiguous_early_fixup(base, size);
 	return 0;
 err:
-	pr_err("CMA: failed to reserve %ld MiB\n", (unsigned long)size / SZ_1M);
+	pr_err("%s(): failed to reserve %ld MiB\n",
+		__func__, (unsigned long)size / SZ_1M);
 	return ret;
 }
-- 
1.7.9.5
Re: [PATCH v2 02/10] DMA, CMA: fix possible memory leak
Joonsoo Kim iamjoonsoo@lge.com writes: We should free the memory for the bitmap when we find a zone mismatch, otherwise this memory will leak. Additionally, I copy the code comment from ppc kvm's cma code to note why we need to check for a zone mismatch. Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com Reviewed-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com

diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c
index bd0bb81..fb0cdce 100644
--- a/drivers/base/dma-contiguous.c
+++ b/drivers/base/dma-contiguous.c
@@ -177,14 +177,24 @@ static int __init cma_activate_area(struct cma *cma)
 		base_pfn = pfn;
 		for (j = pageblock_nr_pages; j; --j, pfn++) {
 			WARN_ON_ONCE(!pfn_valid(pfn));
+			/*
+			 * alloc_contig_range requires the pfn range
+			 * specified to be in the same zone. Make this
+			 * simple by forcing the entire CMA resv range
+			 * to be in the same zone.
+			 */
 			if (page_zone(pfn_to_page(pfn)) != zone)
-				return -EINVAL;
+				goto err;
 		}
 		init_cma_reserved_pageblock(pfn_to_page(base_pfn));
 	} while (--i);
 
 	mutex_init(&cma->lock);
 	return 0;
+
+err:
+	kfree(cma->bitmap);
+	return -EINVAL;
 }
 
 static struct cma cma_areas[MAX_CMA_AREAS];
-- 
1.7.9.5
Re: [PATCH v2 03/10] DMA, CMA: separate core cma management codes from DMA APIs
Joonsoo Kim iamjoonsoo@lge.com writes: To prepare for future generalization work on the cma area management code, we need to separate the core cma management code from the DMA APIs. We will extend these core functions to cover the requirements of ppc kvm's cma area management functionality in following patches. This separation helps us avoid touching the DMA APIs while extending the core functions. Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com Reviewed-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com

diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c
index fb0cdce..8a44c82 100644
--- a/drivers/base/dma-contiguous.c
+++ b/drivers/base/dma-contiguous.c
@@ -231,9 +231,9 @@ core_initcall(cma_init_reserved_areas);
  * If @fixed is true, reserve contiguous area at exactly @base.  If false,
  * reserve in range from @base to @limit.
  */
-int __init dma_contiguous_reserve_area(phys_addr_t size, phys_addr_t base,
-				       phys_addr_t limit, struct cma **res_cma,
-				       bool fixed)
+static int __init __dma_contiguous_reserve_area(phys_addr_t size,
+				phys_addr_t base, phys_addr_t limit,
+				struct cma **res_cma, bool fixed)
 {
 	struct cma *cma = &cma_areas[cma_area_count];
 	phys_addr_t alignment;
@@ -288,16 +288,30 @@ int __init dma_contiguous_reserve_area(phys_addr_t size, phys_addr_t base,
 	pr_info("%s(): reserved %ld MiB at %08lx\n",
 		__func__, (unsigned long)size / SZ_1M, (unsigned long)base);
 
-	/* Architecture specific contiguous memory fixup. */
-	dma_contiguous_early_fixup(base, size);
 	return 0;
+
 err:
 	pr_err("%s(): failed to reserve %ld MiB\n",
 		__func__, (unsigned long)size / SZ_1M);
 	return ret;
 }
 
+int __init dma_contiguous_reserve_area(phys_addr_t size, phys_addr_t base,
+				       phys_addr_t limit, struct cma **res_cma,
+				       bool fixed)
+{
+	int ret;
+
+	ret = __dma_contiguous_reserve_area(size, base, limit, res_cma, fixed);
+	if (ret)
+		return ret;
+
+	/* Architecture specific contiguous memory fixup. */
+	dma_contiguous_early_fixup(base, size);
+
+	return 0;
+}
+
 static void clear_cma_bitmap(struct cma *cma, unsigned long pfn, int count)
 {
 	mutex_lock(&cma->lock);
@@ -316,20 +330,16 @@ static void clear_cma_bitmap(struct cma *cma, unsigned long pfn, int count)
 * global one. Requires architecture specific dev_get_cma_area() helper
 * function.
 */
-struct page *dma_alloc_from_contiguous(struct device *dev, int count,
+static struct page *__dma_alloc_from_contiguous(struct cma *cma, int count,
 				       unsigned int align)
 {
 	unsigned long mask, pfn, pageno, start = 0;
-	struct cma *cma = dev_get_cma_area(dev);
 	struct page *page = NULL;
 	int ret;
 
 	if (!cma || !cma->count)
 		return NULL;
 
-	if (align > CONFIG_CMA_ALIGNMENT)
-		align = CONFIG_CMA_ALIGNMENT;
-
 	pr_debug("%s(cma %p, count %d, align %d)\n", __func__, (void *)cma,
 		 count, align);
@@ -377,6 +387,17 @@ struct page *dma_alloc_from_contiguous(struct device *dev, int count,
 	return page;
 }
 
+struct page *dma_alloc_from_contiguous(struct device *dev, int count,
+				       unsigned int align)
+{
+	struct cma *cma = dev_get_cma_area(dev);
+
+	if (align > CONFIG_CMA_ALIGNMENT)
+		align = CONFIG_CMA_ALIGNMENT;
+
+	return __dma_alloc_from_contiguous(cma, count, align);
+}
+
 /**
  * dma_release_from_contiguous() - release allocated pages
  * @dev:   Pointer to device for which the pages were allocated.
@@ -387,10 +408,9 @@ struct page *dma_alloc_from_contiguous(struct device *dev, int count,
 * It returns false when provided pages do not belong to contiguous area and
 * true otherwise.
 */
-bool dma_release_from_contiguous(struct device *dev, struct page *pages,
+static bool __dma_release_from_contiguous(struct cma *cma, struct page *pages,
 				 int count)
 {
-	struct cma *cma = dev_get_cma_area(dev);
 	unsigned long pfn;
 
 	if (!cma || !pages)
@@ -410,3 +430,11 @@ bool dma_release_from_contiguous(struct device *dev, struct page *pages,
 
 	return true;
 }
+
+bool dma_release_from_contiguous(struct device *dev, struct page *pages,
+				 int count)
+{
+	struct cma *cma = dev_get_cma_area(dev);
+
+	return __dma_release_from_contiguous(cma, pages, count);
+}
-- 
1.7.9.5
Re: [PATCH v2 04/10] DMA, CMA: support alignment constraint on cma region
Joonsoo Kim iamjoonsoo@lge.com writes: ppc kvm's cma area management needs an alignment constraint on the cma region, so support it to prepare for generalization of the cma area management functionality. Additionally, add some comments explaining why the alignment constraint is needed on the cma region. Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com Reviewed-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com

diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c
index 8a44c82..bc4c171 100644
--- a/drivers/base/dma-contiguous.c
+++ b/drivers/base/dma-contiguous.c
@@ -32,6 +32,7 @@
 #include <linux/swap.h>
 #include <linux/mm_types.h>
 #include <linux/dma-contiguous.h>
+#include <linux/log2.h>
 
 struct cma {
 	unsigned long	base_pfn;
@@ -219,6 +220,7 @@ core_initcall(cma_init_reserved_areas);
 * @size: Size of the reserved area (in bytes),
 * @base: Base address of the reserved area optional, use 0 for any
 * @limit: End address of the reserved memory (optional, 0 for any).
+ * @alignment: Alignment for the contiguous memory area, should be power of 2
 * @res_cma: Pointer to store the created cma region.
 * @fixed: hint about where to place the reserved area
 *
@@ -233,15 +235,15 @@ core_initcall(cma_init_reserved_areas);
 */
 static int __init __dma_contiguous_reserve_area(phys_addr_t size,
 				phys_addr_t base, phys_addr_t limit,
+				phys_addr_t alignment,
 				struct cma **res_cma, bool fixed)
 {
 	struct cma *cma = &cma_areas[cma_area_count];
-	phys_addr_t alignment;
 	int ret = 0;
 
-	pr_debug("%s(size %lx, base %08lx, limit %08lx)\n", __func__,
-		 (unsigned long)size, (unsigned long)base,
-		 (unsigned long)limit);
+	pr_debug("%s(size %lx, base %08lx, limit %08lx align_order %08lx)\n",
+		 __func__, (unsigned long)size, (unsigned long)base,
+		 (unsigned long)limit, (unsigned long)alignment);
 
 	/* Sanity checks */
 	if (cma_area_count == ARRAY_SIZE(cma_areas)) {
@@ -253,8 +255,17 @@ static int __init __dma_contiguous_reserve_area(phys_addr_t size,
 	if (!size)
 		return -EINVAL;
 
-	/* Sanitise input arguments */
-	alignment = PAGE_SIZE << max(MAX_ORDER - 1, pageblock_order);
+	if (alignment && !is_power_of_2(alignment))
+		return -EINVAL;
+
+	/*
+	 * Sanitise input arguments.
+	 * CMA area should be at least MAX_ORDER - 1 aligned. Otherwise,
+	 * CMA area could be merged into other MIGRATE_TYPE by buddy mechanism
+	 * and CMA property will be broken.
+	 */
+	alignment = max(alignment,
+		(phys_addr_t)PAGE_SIZE << max(MAX_ORDER - 1, pageblock_order));
 	base = ALIGN(base, alignment);
 	size = ALIGN(size, alignment);
 	limit &= ~(alignment - 1);
@@ -302,7 +313,8 @@ int __init dma_contiguous_reserve_area(phys_addr_t size, phys_addr_t base,
 {
 	int ret;
 
-	ret = __dma_contiguous_reserve_area(size, base, limit, res_cma, fixed);
+	ret = __dma_contiguous_reserve_area(size, base, limit, 0,
+					res_cma, fixed);
 	if (ret)
 		return ret;
-- 
1.7.9.5
Re: [PATCH v2 01/10] DMA, CMA: clean-up log message
Joonsoo Kim iamjoonsoo@lge.com writes: We don't need explicit 'CMA:' prefix, since we already define prefix 'cma:' in pr_fmt. So remove it. And, some logs print function name and others doesn't. This looks bad to me, so I unify log format to print function name consistently. Lastly, I add one more debug log on cma_activate_area(). Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c index 83969f8..bd0bb81 100644 --- a/drivers/base/dma-contiguous.c +++ b/drivers/base/dma-contiguous.c @@ -144,7 +144,7 @@ void __init dma_contiguous_reserve(phys_addr_t limit) } if (selected_size !dma_contiguous_default_area) { - pr_debug(%s: reserving %ld MiB for global area\n, __func__, + pr_debug(%s(): reserving %ld MiB for global area\n, __func__, (unsigned long)selected_size / SZ_1M); Do we need to do function(), or just function:. I have seen the later usage in other parts of the kernel. dma_contiguous_reserve_area(selected_size, selected_base, @@ -163,8 +163,9 @@ static int __init cma_activate_area(struct cma *cma) unsigned i = cma-count pageblock_order; struct zone *zone; - cma-bitmap = kzalloc(bitmap_size, GFP_KERNEL); + pr_debug(%s()\n, __func__); why ? 
+ cma-bitmap = kzalloc(bitmap_size, GFP_KERNEL); if (!cma-bitmap) return -ENOMEM; @@ -234,7 +235,8 @@ int __init dma_contiguous_reserve_area(phys_addr_t size, phys_addr_t base, /* Sanity checks */ if (cma_area_count == ARRAY_SIZE(cma_areas)) { - pr_err(Not enough slots for CMA reserved regions!\n); + pr_err(%s(): Not enough slots for CMA reserved regions!\n, + __func__); return -ENOSPC; } @@ -274,14 +276,15 @@ int __init dma_contiguous_reserve_area(phys_addr_t size, phys_addr_t base, *res_cma = cma; cma_area_count++; - pr_info(CMA: reserved %ld MiB at %08lx\n, (unsigned long)size / SZ_1M, - (unsigned long)base); + pr_info(%s(): reserved %ld MiB at %08lx\n, + __func__, (unsigned long)size / SZ_1M, (unsigned long)base); /* Architecture specific contiguous memory fixup. */ dma_contiguous_early_fixup(base, size); return 0; err: - pr_err(CMA: failed to reserve %ld MiB\n, (unsigned long)size / SZ_1M); + pr_err(%s(): failed to reserve %ld MiB\n, + __func__, (unsigned long)size / SZ_1M); return ret; } -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 02/10] DMA, CMA: fix possible memory leak
Joonsoo Kim iamjoonsoo@lge.com writes: We should free memory for bitmap when we find zone mis-match, otherwise this memory will leak. Additionally, I copy code comment from ppc kvm's cma code to notify why we need to check zone mis-match. Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com Reviewed-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c index bd0bb81..fb0cdce 100644 --- a/drivers/base/dma-contiguous.c +++ b/drivers/base/dma-contiguous.c @@ -177,14 +177,24 @@ static int __init cma_activate_area(struct cma *cma) base_pfn = pfn; for (j = pageblock_nr_pages; j; --j, pfn++) { WARN_ON_ONCE(!pfn_valid(pfn)); + /* + * alloc_contig_range requires the pfn range + * specified to be in the same zone. Make this + * simple by forcing the entire CMA resv range + * to be in the same zone. + */ if (page_zone(pfn_to_page(pfn)) != zone) - return -EINVAL; + goto err; } init_cma_reserved_pageblock(pfn_to_page(base_pfn)); } while (--i); mutex_init(cma-lock); return 0; + +err: + kfree(cma-bitmap); + return -EINVAL; } static struct cma cma_areas[MAX_CMA_AREAS]; -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 03/10] DMA, CMA: separate core cma management codes from DMA APIs
Joonsoo Kim iamjoonsoo@lge.com writes: To prepare future generalization work on cma area management code, we need to separate the core cma management code from the DMA APIs. We will extend these core functions to cover the requirements of ppc kvm's cma area management functionality in following patches. This separation lets us extend the core functions without touching the DMA APIs. Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com Reviewed-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c index fb0cdce..8a44c82 100644 --- a/drivers/base/dma-contiguous.c +++ b/drivers/base/dma-contiguous.c @@ -231,9 +231,9 @@ core_initcall(cma_init_reserved_areas); * If @fixed is true, reserve contiguous area at exactly @base. If false, * reserve in range from @base to @limit. */ -int __init dma_contiguous_reserve_area(phys_addr_t size, phys_addr_t base, -phys_addr_t limit, struct cma **res_cma, -bool fixed) +static int __init __dma_contiguous_reserve_area(phys_addr_t size, + phys_addr_t base, phys_addr_t limit, + struct cma **res_cma, bool fixed) { struct cma *cma = &cma_areas[cma_area_count]; phys_addr_t alignment; @@ -288,16 +288,30 @@ int __init dma_contiguous_reserve_area(phys_addr_t size, phys_addr_t base, pr_info("%s(): reserved %ld MiB at %08lx\n", __func__, (unsigned long)size / SZ_1M, (unsigned long)base); - - /* Architecture specific contiguous memory fixup. */ - dma_contiguous_early_fixup(base, size); return 0; + err: pr_err("%s(): failed to reserve %ld MiB\n", __func__, (unsigned long)size / SZ_1M); return ret; } +int __init dma_contiguous_reserve_area(phys_addr_t size, phys_addr_t base, +phys_addr_t limit, struct cma **res_cma, +bool fixed) +{ + int ret; + + ret = __dma_contiguous_reserve_area(size, base, limit, res_cma, fixed); + if (ret) + return ret; + + /* Architecture specific contiguous memory fixup. */ + dma_contiguous_early_fixup(base, size); + + return 0; +} + static void clear_cma_bitmap(struct cma *cma, unsigned long pfn, int count) { mutex_lock(&cma->lock); @@ -316,20 +330,16 @@ static void clear_cma_bitmap(struct cma *cma, unsigned long pfn, int count) * global one. Requires architecture specific dev_get_cma_area() helper * function. */ -struct page *dma_alloc_from_contiguous(struct device *dev, int count, +static struct page *__dma_alloc_from_contiguous(struct cma *cma, int count, unsigned int align) { unsigned long mask, pfn, pageno, start = 0; - struct cma *cma = dev_get_cma_area(dev); struct page *page = NULL; int ret; if (!cma || !cma->count) return NULL; - if (align > CONFIG_CMA_ALIGNMENT) - align = CONFIG_CMA_ALIGNMENT; - pr_debug("%s(cma %p, count %d, align %d)\n", __func__, (void *)cma, count, align); @@ -377,6 +387,17 @@ struct page *dma_alloc_from_contiguous(struct device *dev, int count, return page; } +struct page *dma_alloc_from_contiguous(struct device *dev, int count, +unsigned int align) +{ + struct cma *cma = dev_get_cma_area(dev); + + if (align > CONFIG_CMA_ALIGNMENT) + align = CONFIG_CMA_ALIGNMENT; + + return __dma_alloc_from_contiguous(cma, count, align); +} + /** * dma_release_from_contiguous() - release allocated pages * @dev: Pointer to device for which the pages were allocated. @@ -387,10 +408,9 @@ struct page *dma_alloc_from_contiguous(struct device *dev, int count, * It returns false when provided pages do not belong to contiguous area and * true otherwise. */ -bool dma_release_from_contiguous(struct device *dev, struct page *pages, +static bool __dma_release_from_contiguous(struct cma *cma, struct page *pages, int count) { - struct cma *cma = dev_get_cma_area(dev); unsigned long pfn; if (!cma || !pages) @@ -410,3 +430,11 @@ bool dma_release_from_contiguous(struct device *dev, struct page *pages, return true; } + +bool dma_release_from_contiguous(struct device *dev, struct page *pages, + int count) +{ + struct cma *cma = dev_get_cma_area(dev); + + return __dma_release_from_contiguous(cma, pages, count); +} -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
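The shape of this refactoring — public DMA-facing entry points reduced to thin wrappers that resolve per-device context and delegate to static core helpers operating only on a cma area — can be sketched outside the kernel. The types, names, and values below are hypothetical stand-ins, not the kernel's:

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct cma. */
struct cma {
	unsigned long base_pfn;
	unsigned long count;
};

static struct cma global_area = { .base_pfn = 0x1000, .count = 16 };

/* Stand-in for dev_get_cma_area(): always fall back to the global area. */
static struct cma *dev_get_cma_area(void *dev)
{
	(void)dev;
	return &global_area;
}

/* Core helper: knows nothing about devices, only about a cma area.
 * Plays the role of __dma_alloc_from_contiguous() in the patch. */
static long __alloc_from_area(struct cma *cma, int count)
{
	if (!cma || !cma->count || count <= 0 ||
	    (unsigned long)count > cma->count)
		return -1;
	return (long)cma->base_pfn;	/* pretend we allocate at the base */
}

/* Thin public wrapper, as dma_alloc_from_contiguous() becomes:
 * resolve the device's area, then delegate to the core helper. */
long alloc_from_contiguous(void *dev, int count)
{
	return __alloc_from_area(dev_get_cma_area(dev), count);
}
```

A later non-DMA caller (such as ppc kvm's cma user) can then call the core helper with its own area, without the wrapper or the DMA API changing.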
Re: [PATCH v2 04/10] DMA, CMA: support alignment constraint on cma region
Joonsoo Kim iamjoonsoo@lge.com writes: ppc kvm's cma area management needs an alignment constraint on the cma region. So support it, to prepare for the generalization of the cma area management functionality. Additionally, add some comments explaining why the alignment constraint is needed on a cma region. Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com Reviewed-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c index 8a44c82..bc4c171 100644 --- a/drivers/base/dma-contiguous.c +++ b/drivers/base/dma-contiguous.c @@ -32,6 +32,7 @@ #include <linux/swap.h> #include <linux/mm_types.h> #include <linux/dma-contiguous.h> +#include <linux/log2.h> struct cma { unsigned long base_pfn; @@ -219,6 +220,7 @@ core_initcall(cma_init_reserved_areas); * @size: Size of the reserved area (in bytes), * @base: Base address of the reserved area optional, use 0 for any * @limit: End address of the reserved memory (optional, 0 for any). + * @alignment: Alignment for the contiguous memory area, should be a power of 2 * @res_cma: Pointer to store the created cma region. * @fixed: hint about where to place the reserved area * @@ -233,15 +235,15 @@ core_initcall(cma_init_reserved_areas); */ static int __init __dma_contiguous_reserve_area(phys_addr_t size, phys_addr_t base, phys_addr_t limit, + phys_addr_t alignment, struct cma **res_cma, bool fixed) { struct cma *cma = &cma_areas[cma_area_count]; - phys_addr_t alignment; int ret = 0; - pr_debug("%s(size %lx, base %08lx, limit %08lx)\n", __func__, - (unsigned long)size, (unsigned long)base, - (unsigned long)limit); + pr_debug("%s(size %lx, base %08lx, limit %08lx align_order %08lx)\n", + __func__, (unsigned long)size, (unsigned long)base, + (unsigned long)limit, (unsigned long)alignment); /* Sanity checks */ if (cma_area_count == ARRAY_SIZE(cma_areas)) { @@ -253,8 +255,17 @@ static int __init __dma_contiguous_reserve_area(phys_addr_t size, if (!size) return -EINVAL; - /* Sanitise input arguments */ - alignment = PAGE_SIZE << max(MAX_ORDER - 1, pageblock_order); + if (alignment && !is_power_of_2(alignment)) + return -EINVAL; + + /* + * Sanitise input arguments. + * CMA area should be at least MAX_ORDER - 1 aligned. Otherwise, + * CMA area could be merged into other MIGRATE_TYPE by buddy mechanism + * and CMA property will be broken. + */ + alignment = max(alignment, + (phys_addr_t)PAGE_SIZE << max(MAX_ORDER - 1, pageblock_order)); base = ALIGN(base, alignment); size = ALIGN(size, alignment); limit &= ~(alignment - 1); @@ -302,7 +313,8 @@ int __init dma_contiguous_reserve_area(phys_addr_t size, phys_addr_t base, { int ret; - ret = __dma_contiguous_reserve_area(size, base, limit, res_cma, fixed); + ret = __dma_contiguous_reserve_area(size, base, limit, 0, + res_cma, fixed); if (ret) return ret; -- 1.7.9.5
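The arithmetic of the sanitisation step — reject non-power-of-two alignments, round base and size up to the effective alignment, and round the limit down — can be checked in isolation. This is a sketch: the 4 MiB default stands in for `PAGE_SIZE << max(MAX_ORDER - 1, pageblock_order)`, which is configuration-dependent in a real kernel, and the helper names are illustrative:

```c
typedef unsigned long long phys_addr_t;

/* Like is_power_of_2() in <linux/log2.h>: zero is not a power of two. */
static int is_pow2(phys_addr_t x)
{
	return x != 0 && (x & (x - 1)) == 0;
}

/* Like the kernel's ALIGN(): round x up to a multiple of a (a power of 2). */
static phys_addr_t align_up(phys_addr_t x, phys_addr_t a)
{
	return (x + a - 1) & ~(a - 1);
}

/* Mirrors the patch's sanitisation: alignment 0 means "use the default";
 * a non-zero, non-power-of-two alignment is rejected (-EINVAL there). */
static int sanitise(phys_addr_t *base, phys_addr_t *size,
		    phys_addr_t *limit, phys_addr_t alignment)
{
	const phys_addr_t min_align = 0x400000;	/* assumed 4 MiB default */

	if (alignment && !is_pow2(alignment))
		return -1;
	if (alignment < min_align)
		alignment = min_align;
	*base = align_up(*base, alignment);
	*size = align_up(*size, alignment);
	*limit &= ~(alignment - 1);		/* round the limit down */
	return 0;
}
```

With the default alignment, a base of 0x123456 rounds up to 0x400000 and a limit of 0x87654321 rounds down to 0x87400000, so the reservation always covers whole MAX_ORDER blocks.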
Re: [PATCH 2/4] KVM: PPC: BOOK3S: PR: Doorbell support
Alexander Graf ag...@suse.de writes: On 05.06.14 14:08, Aneesh Kumar K.V wrote: We don't have SMT support yet, hence we should not find a doorbell message generated Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com --- arch/powerpc/kvm/book3s_emulate.c | 18 ++ 1 file changed, 18 insertions(+) diff --git a/arch/powerpc/kvm/book3s_emulate.c b/arch/powerpc/kvm/book3s_emulate.c index 1bb16a59dcbc..d6c87d085182 100644 --- a/arch/powerpc/kvm/book3s_emulate.c +++ b/arch/powerpc/kvm/book3s_emulate.c @@ -28,7 +28,9 @@ #define OP_19_XOP_RFI 50 #define OP_31_XOP_MFMSR 83 +#define OP_31_XOP_MSGSNDP 142 #define OP_31_XOP_MTMSR 146 +#define OP_31_XOP_MSGCLRP 174 #define OP_31_XOP_MTMSRD 178 #define OP_31_XOP_MTSR 210 #define OP_31_XOP_MTSRIN 242 @@ -303,6 +305,22 @@ int kvmppc_core_emulate_op_pr(struct kvm_run *run, struct kvm_vcpu *vcpu, break; } +case OP_31_XOP_MSGSNDP: +{ +/* + * PR KVM still don't support SMT mode. So we should still? + * not see a MSGSNDP/MSGCLRP used with PR KVM + */ +pr_info("KVM: MSGSNDP used in non SMT case\n"); +emulated = EMULATE_FAIL; What would happen on an HV guest with only 1 thread that MSGSNDs to thread 0? Would the guest get an illegal instruction trap, a self-interrupt or would this be a simple nop? We do get a self-interrupt. I tried the below tag = mfspr(SPRN_TIR) & 0x7f; ppc_msgsnd(5, 0, tag); And that results in a doorbell exception. That implies we will have to have a full implementation of doorbell emulation. You can drop patches 2 and 3 from this series. I will rework them. NOTE: This is not an issue for a Linux guest, because we don't send an ipi to self. But to complete the emulation of msgsndp we will need to emulate it properly. -aneesh
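For context, the OP_31_XOP_* constants added by the patch are extended opcodes of 31-form instructions: the primary opcode (31) occupies the top six bits of the instruction word and the extended opcode sits in bits 1-10 (LSB numbering), which is how the emulator dispatches to these cases. A small sketch of that decoding — the helper names are illustrative, not the kernel's:

```c
#include <stdint.h>

/* Extended-opcode values from the patch. */
#define OP_31_XOP_MSGSNDP 142
#define OP_31_XOP_MSGCLRP 174

/* Primary opcode: top 6 bits of the 32-bit instruction word. */
static uint32_t get_op(uint32_t insn)
{
	return insn >> 26;
}

/* Extended opcode for 31-form instructions: bits 1-10, LSB numbering. */
static uint32_t get_xop(uint32_t insn)
{
	return (insn >> 1) & 0x3ff;
}

/* Assemble a 31-form instruction word for illustration;
 * rb is the RB register field. */
static uint32_t make_31form(uint32_t xop, uint32_t rb)
{
	return (31u << 26) | (rb << 11) | (xop << 1);
}
```

An emulator switching on get_xop() routes 142 to the MSGSNDP case above, which for now fails the emulation since PR KVM has no SMT (and thus no doorbell) support.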