RISC-V KVM has used the hugetlb VMA size directly as the G-stage
mapping size since stage-2 page table support was added. That is safe
only if the block covered by the fault is fully contained in the
memslot and the userspace address has the same offset as the GPA
within that block.

The THP path already checks those constraints before installing a PMD
block mapping. The hugetlb path did not, so an unaligned memslot could
make KVM install a PMD or PUD sized G-stage block that covers memory
outside the slot or maps the wrong host pages.

Pass the target mapping size into fault_supports_gstage_huge_mapping().
The same helper can be used for both THP PMD mappings and hugetlb
PMD/PUD mappings.

Select hugetlb mapping sizes through the same memslot-boundary check,
falling back from PUD to PMD to PAGE_SIZE. When a smaller hugetlb
mapping size is selected, fault the GFN aligned to that selected size
instead of the original VMA size.

Also keep hugetlb mappings out of transparent_hugepage_adjust(). Once
the hugetlb path has fallen back to PAGE_SIZE, letting the THP helper
promote the mapping again would override that fallback decision.

Fixes: 9d05c1fee837 ("RISC-V: KVM: Implement stage2 page table programming")
Signed-off-by: Jinyu Tang <[email protected]>
Reviewed-by: Nutty Liu <[email protected]>
---
v1 -> v2:
  - Squash the helper parameterization into this hugetlb fix.
  - Use the ALIGN()/ALIGN_DOWN() form suggested by Nutty Liu and Anup
    for the memslot boundary check.
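
For reviewers, the size selection described above can be modelled in
userspace. The following is a minimal sketch, not the kernel code: the
sk_* names, struct sk_slot, and the fixed 4 KiB / 2 MiB / 1 GiB sizes are
assumptions made for illustration. It mirrors the offset-equality check,
the ALIGN()/ALIGN_DOWN() containment check, the PUD -> PMD -> PAGE_SIZE
fallback, and the GFN alignment to the selected size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed sizes for 4 KiB base pages; not taken from kernel headers. */
#define SK_PAGE_SHIFT 12
#define SK_PAGE_SIZE  (UINT64_C(1) << 12)
#define SK_PMD_SIZE   (UINT64_C(1) << 21)
#define SK_PUD_SIZE   (UINT64_C(1) << 30)

/* Hypothetical stand-in for the memslot fields the check reads. */
struct sk_slot {
	uint64_t gpa_start;   /* guest physical base of the slot */
	uint64_t uaddr_start; /* userspace address backing the slot */
	uint64_t size;        /* slot length in bytes */
};

static uint64_t sk_align_down(uint64_t x, uint64_t a) { return x & ~(a - 1); }
static uint64_t sk_align_up(uint64_t x, uint64_t a) { return (x + a - 1) & ~(a - 1); }

/* True if a block of map_size around hva is safe to install. */
static bool sk_supports_block(const struct sk_slot *s, uint64_t hva,
			      uint64_t map_size)
{
	uint64_t uaddr_end = s->uaddr_start + s->size;

	/* The mask arithmetic below requires a power-of-two size. */
	assert(map_size != 0 && (map_size & (map_size - 1)) == 0);

	/* GPA and HVA must share the same offset within a block. */
	if ((s->gpa_start & (map_size - 1)) != (s->uaddr_start & (map_size - 1)))
		return false;

	/* The whole block must lie inside the slot. */
	return hva >= sk_align_up(s->uaddr_start, map_size) &&
	       hva < sk_align_down(uaddr_end, map_size);
}

/* Fall back from PUD to PMD to PAGE_SIZE, starting at the VMA size. */
static uint64_t sk_hugetlb_mapping_size(const struct sk_slot *s, uint64_t hva,
					uint64_t map_size)
{
	switch (map_size) {
	case SK_PUD_SIZE:
		if (sk_supports_block(s, hva, SK_PUD_SIZE))
			return SK_PUD_SIZE;
		/* fall through */
	case SK_PMD_SIZE:
		if (sk_supports_block(s, hva, SK_PMD_SIZE))
			return SK_PMD_SIZE;
		/* fall through */
	case SK_PAGE_SIZE:
		return SK_PAGE_SIZE;
	default:
		return map_size;
	}
}

/* GFN to fault, aligned to the selected size rather than the VMA size. */
static uint64_t sk_fault_gfn(uint64_t gpa, uint64_t map_size)
{
	return sk_align_down(gpa, map_size) >> SK_PAGE_SHIFT;
}
```

In this model, a slot whose GPA base is 1 GiB aligned but whose userspace
address is only 2 MiB aligned drops a 1 GiB hugetlb fault to a 2 MiB
mapping, and a slot whose offsets differ within 2 MiB drops to PAGE_SIZE.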

 arch/riscv/kvm/mmu.c | 54 ++++++++++++++++++++++++++++++++++++--------
 1 file changed, 44 insertions(+), 10 deletions(-)

diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
index 2d3def024..0adf017a2 100644
--- a/arch/riscv/kvm/mmu.c
+++ b/arch/riscv/kvm/mmu.c
@@ -286,7 +286,8 @@ bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
 }
 
 static bool fault_supports_gstage_huge_mapping(struct kvm_memory_slot *memslot,
-                                              unsigned long hva)
+                                              unsigned long hva,
+                                              unsigned long map_size)
 {
        hva_t uaddr_start, uaddr_end;
        gpa_t gpa_start;
@@ -300,8 +301,8 @@ static bool fault_supports_gstage_huge_mapping(struct kvm_memory_slot *memslot,
 
        /*
         * Pages belonging to memslots that don't have the same alignment
-        * within a PMD for userspace and GPA cannot be mapped with g-stage
-        * PMD entries, because we'll end up mapping the wrong pages.
+        * within a huge page for userspace and GPA cannot be mapped with
+        * g-stage block entries, because we'll end up mapping the wrong pages.
         *
         * Consider a layout like the following:
         *
@@ -321,7 +322,7 @@ static bool fault_supports_gstage_huge_mapping(struct kvm_memory_slot *memslot,
         *   e -> g
         *   f -> h
         */
-       if ((gpa_start & (PMD_SIZE - 1)) != (uaddr_start & (PMD_SIZE - 1)))
+       if ((gpa_start & (map_size - 1)) != (uaddr_start & (map_size - 1)))
                return false;
 
        /*
@@ -336,7 +337,8 @@ static bool fault_supports_gstage_huge_mapping(struct kvm_memory_slot *memslot,
         * userspace_addr or the base_gfn, as both are equally aligned (per
         * the check above) and equally sized.
         */
-       return (hva >= ALIGN(uaddr_start, PMD_SIZE)) && (hva < ALIGN_DOWN(uaddr_end, PMD_SIZE));
+       return (hva >= ALIGN(uaddr_start, map_size)) &&
+              (hva < ALIGN_DOWN(uaddr_end, map_size));
 }
 
 static int get_hva_mapping_size(struct kvm *kvm,
@@ -404,7 +406,7 @@ static unsigned long transparent_hugepage_adjust(struct kvm *kvm,
         * sure that the HVA and GPA are sufficiently aligned and that the
         * block map is contained within the memslot.
         */
-       if (fault_supports_gstage_huge_mapping(memslot, hva)) {
+       if (fault_supports_gstage_huge_mapping(memslot, hva, PMD_SIZE)) {
                int sz;
 
                sz = get_hva_mapping_size(kvm, hva);
@@ -421,12 +423,33 @@ static unsigned long transparent_hugepage_adjust(struct kvm *kvm,
        return PAGE_SIZE;
 }
 
+static unsigned long hugetlb_mapping_size(struct kvm_memory_slot *memslot,
+                                         unsigned long hva,
+                                         unsigned long map_size)
+{
+       switch (map_size) {
+       case PUD_SIZE:
+               if (fault_supports_gstage_huge_mapping(memslot, hva, PUD_SIZE))
+                       return PUD_SIZE;
+               fallthrough;
+       case PMD_SIZE:
+               if (fault_supports_gstage_huge_mapping(memslot, hva, PMD_SIZE))
+                       return PMD_SIZE;
+               fallthrough;
+       case PAGE_SIZE:
+               return PAGE_SIZE;
+       default:
+               return map_size;
+       }
+}
+
 int kvm_riscv_mmu_map(struct kvm_vcpu *vcpu, struct kvm_memory_slot *memslot,
                      gpa_t gpa, unsigned long hva, bool is_write,
                      struct kvm_gstage_mapping *out_map)
 {
        int ret;
        kvm_pfn_t hfn;
+       bool is_hugetlb;
        bool writable;
        short vma_pageshift;
        gfn_t gfn = gpa >> PAGE_SHIFT;
@@ -460,16 +483,23 @@ int kvm_riscv_mmu_map(struct kvm_vcpu *vcpu, struct kvm_memory_slot *memslot,
                return -EFAULT;
        }
 
-       if (is_vm_hugetlb_page(vma))
+       is_hugetlb = is_vm_hugetlb_page(vma);
+       if (is_hugetlb)
                vma_pageshift = huge_page_shift(hstate_vma(vma));
        else
                vma_pageshift = PAGE_SHIFT;
        vma_pagesize = 1ULL << vma_pageshift;
        if (logging || (vma->vm_flags & VM_PFNMAP))
                vma_pagesize = PAGE_SIZE;
+       else if (is_hugetlb)
+               vma_pagesize = hugetlb_mapping_size(memslot, hva, vma_pagesize);
 
+       /*
+        * For hugetlb mappings, vma_pagesize might have been reduced from the
+        * VMA size to a smaller safe mapping size.
+        */
        if (vma_pagesize == PMD_SIZE || vma_pagesize == PUD_SIZE)
-               gfn = (gpa & huge_page_mask(hstate_vma(vma))) >> PAGE_SHIFT;
+               gfn = ALIGN_DOWN(gpa, vma_pagesize) >> PAGE_SHIFT;
 
        /*
         * Read mmu_invalidate_seq so that KVM can detect if the results of
@@ -511,8 +541,12 @@ int kvm_riscv_mmu_map(struct kvm_vcpu *vcpu, struct kvm_memory_slot *memslot,
        if (mmu_invalidate_retry(kvm, mmu_seq))
                goto out_unlock;
 
-       /* Check if we are backed by a THP and thus use block mapping if possible */
-       if (!logging && (vma_pagesize == PAGE_SIZE))
+       /*
+        * Check if we are backed by a THP and thus use block mapping if
+        * possible. Hugetlb mappings already selected their target size above,
+        * so do not promote them through the THP helper.
+        */
+       if (!logging && !is_hugetlb && vma_pagesize == PAGE_SIZE)
                vma_pagesize = transparent_hugepage_adjust(kvm, memslot, hva, &hfn, &gpa);
 
        if (writable) {
-- 
2.43.0

