On 06/19/2014 02:12 AM, mtosa...@redhat.com wrote:
Allow vcpus to pin spte translations by:

1) Creating a per-vcpu list of pinned ranges.
2) On mmu reload request:
        - Fault ranges.
        - Mark sptes with a pinned bit.
        - Mark shadow pages as pinned.

3) Then modify the following actions:
        - Page age => skip spte flush.
        - MMU notifiers => force mmu reload request (which kicks cpu out of
                                guest mode).
        - GET_DIRTY_LOG => force mmu reload request.
        - SLAB shrinker => skip shadow page deletion.

TDP-only.
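For reference while reading the hunks below: the supporting definitions are
not part of this excerpt, but from the accesses in the code they presumably
look something like this (field names are taken from the uses below; the
exact types and comments are my guess):

struct kvm_pinned_page_range {
	gfn_t base_gfn;		/* first gfn of the pinned range */
	unsigned long npages;	/* length of the range in pages */
	struct list_head link;	/* entry in vcpu->arch.pinned_mmu_pages */
};

/* and in struct kvm_vcpu_arch, something like: */
	struct list_head pinned_mmu_pages;	/* registered ranges */
	struct mutex pinned_mmu_mutex;		/* protects the list */
	unsigned int nr_pinned_ranges;	/* capped at KVM_MAX_PER_VCPU_PINNED_RANGE */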

+int kvm_mmu_register_pinned_range(struct kvm_vcpu *vcpu,
+                                 gfn_t base_gfn, unsigned long npages)
+{
+       struct kvm_pinned_page_range *p;
+
+       mutex_lock(&vcpu->arch.pinned_mmu_mutex);
+       list_for_each_entry(p, &vcpu->arch.pinned_mmu_pages, link) {
+               if (p->base_gfn == base_gfn && p->npages == npages) {
+                       mutex_unlock(&vcpu->arch.pinned_mmu_mutex);
+                       return -EEXIST;
+               }
+       }
+       mutex_unlock(&vcpu->arch.pinned_mmu_mutex);
+
+       if (vcpu->arch.nr_pinned_ranges >=
+           KVM_MAX_PER_VCPU_PINNED_RANGE)
+               return -ENOSPC;
+
+       p = kzalloc(sizeof(struct kvm_pinned_page_range), GFP_KERNEL);
+       if (!p)
+               return -ENOMEM;
+
+       vcpu->arch.nr_pinned_ranges++;
+
+       trace_kvm_mmu_register_pinned_range(vcpu->vcpu_id, base_gfn, npages);
+
+       INIT_LIST_HEAD(&p->link);
+       p->base_gfn = base_gfn;
+       p->npages = npages;
+       mutex_lock(&vcpu->arch.pinned_mmu_mutex);
+       list_add(&p->link, &vcpu->arch.pinned_mmu_pages);
+       mutex_unlock(&vcpu->arch.pinned_mmu_mutex);
+       kvm_make_request(KVM_REQ_MMU_RELOAD, vcpu);
+
+       return 0;
+}
+

What happens if ranges overlap (within a vcpu, or across vcpus)? Or if base_gfn + npages overflows and wraps around zero? Or if a range does not refer to RAM?
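None of those cases appear to be rejected; the only check is for an exact
duplicate. I'd expect validation along these lines before a range is
accepted. This is a sketch only: the overlap test is per-vcpu (cross-vcpu
would need a kvm-wide walk), and kvm_is_visible_gfn() is the existing
helper I'd reach for to check the RAM case:

/* Caller holds vcpu->arch.pinned_mmu_mutex. */
static int pinned_range_valid(struct kvm_vcpu *vcpu,
			      gfn_t base_gfn, unsigned long npages)
{
	struct kvm_pinned_page_range *p;

	/* Reject empty ranges and gfn arithmetic that wraps around zero. */
	if (npages == 0 || base_gfn + npages < base_gfn)
		return -EINVAL;

	/*
	 * Both endpoints must be backed by a memslot, i.e. refer to RAM
	 * (endpoints only; a hole in the middle would still need a
	 * per-slot walk).
	 */
	if (!kvm_is_visible_gfn(vcpu->kvm, base_gfn) ||
	    !kvm_is_visible_gfn(vcpu->kvm, base_gfn + npages - 1))
		return -EINVAL;

	/* Catch partial overlaps, not only exact duplicates. */
	list_for_each_entry(p, &vcpu->arch.pinned_mmu_pages, link)
		if (base_gfn < p->base_gfn + p->npages &&
		    p->base_gfn < base_gfn + npages)
			return -EEXIST;

	return 0;
}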

Looks like you're limiting the number of ranges, but not the number of pages, so a guest can lock all of its memory.
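Closing that would presumably need a cumulative per-vcpu page budget,
checked and charged in the same path, e.g. (both names hypothetical;
nr_pinned_pages would be a new counter in kvm_vcpu_arch):

	if (vcpu->arch.nr_pinned_pages + npages > KVM_MAX_PER_VCPU_PINNED_PAGES)
		return -ENOSPC;
	vcpu->arch.nr_pinned_pages += npages;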

+
+/*
+ * Pin KVM MMU page translations. This guarantees that, for valid
+ * addresses registered by kvm_mmu_register_pinned_range (a valid
+ * address being one which possesses sufficient information for the
+ * fault to be resolved), valid translations exist while in guest
+ * mode, and therefore no VM-exits due to faults will occur.
+ *
+ * Failure to instantiate pages will abort guest entry.
+ *
+ * Page frames should be pinned with get_page in advance.
+ *
+ * Pinning is not guaranteed while executing as L2 guest.

Does this undermine security?

+ *
+ */
+
+static void kvm_mmu_pin_pages(struct kvm_vcpu *vcpu)
+{
+       struct kvm_pinned_page_range *p;
+
+       if (is_guest_mode(vcpu))
+               return;
+
+       if (!vcpu->arch.mmu.direct_map)
+               return;
+
+       ASSERT(VALID_PAGE(vcpu->arch.mmu.root_hpa));
+
+       mutex_lock(&vcpu->arch.pinned_mmu_mutex);

Is the mutex actually needed? It seems it's only taken in vcpu context, so the vcpu mutex should be sufficient.

+       list_for_each_entry(p, &vcpu->arch.pinned_mmu_pages, link) {
+               gfn_t gfn_offset;
+
+               for (gfn_offset = 0; gfn_offset < p->npages; gfn_offset++) {
+                       gfn_t gfn = p->base_gfn + gfn_offset;
+                       int r;
+                       bool pinned = false;
+
+                       r = vcpu->arch.mmu.page_fault(vcpu, gfn << PAGE_SHIFT,
+                                                    PFERR_WRITE_MASK, false,
+                                                    true, &pinned);
+                       /* MMU notifier sequence window: retry */
+                       if (!r && !pinned)
+                               kvm_make_request(KVM_REQ_MMU_RELOAD, vcpu);
+                       if (r) {
+                               kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu);
+                               break;
+                       }
+
+               }
+       }
+       mutex_unlock(&vcpu->arch.pinned_mmu_mutex);
+}
+
  int kvm_mmu_load(struct kvm_vcpu *vcpu)
  {
        int r;
@@ -3916,6 +4101,7 @@
                goto out;
        /* set_cr3() should ensure TLB has been flushed */
        vcpu->arch.mmu.set_cr3(vcpu, vcpu->arch.mmu.root_hpa);
+       kvm_mmu_pin_pages(vcpu);
  out:
        return r;
  }


I don't see where you unpin pages, so even if you limit the number of pinned pages, a guest can still pin all of its memory by iterating over it and pinning one chunk at a time.
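At minimum I'd expect a matching unregister path, roughly like this
(hypothetical, mirroring the register side above):

int kvm_mmu_unregister_pinned_range(struct kvm_vcpu *vcpu,
				    gfn_t base_gfn, unsigned long npages)
{
	struct kvm_pinned_page_range *p;

	mutex_lock(&vcpu->arch.pinned_mmu_mutex);
	list_for_each_entry(p, &vcpu->arch.pinned_mmu_pages, link) {
		if (p->base_gfn == base_gfn && p->npages == npages) {
			list_del(&p->link);
			vcpu->arch.nr_pinned_ranges--;
			mutex_unlock(&vcpu->arch.pinned_mmu_mutex);
			kfree(p);
			/* Rebuild roots so the pinned sptes are dropped. */
			kvm_make_request(KVM_REQ_MMU_RELOAD, vcpu);
			return 0;
		}
	}
	mutex_unlock(&vcpu->arch.pinned_mmu_mutex);
	return -ENOENT;
}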

You might try something similar to guest MTRR handling.