Hello,

it seems to me that the reclaim mechanism for shadow page table pages is sub-optimal. The arch.active_mmu_pages list that is used for reclaiming does not move parent shadow page tables up (toward the head of the list) when a child is added, so when we need a new shadow page we zap the oldest entry - which may well be a directory-level page holding a just-added table-level page.
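To make the ordering problem concrete, here is a tiny user-space model (toy structures only, none of this is the actual KVM code) of an insertion-ordered list that is reclaimed from the tail. Without moving the parent back to the head when a child is added, the directory is the "oldest" entry and gets zapped even though a table page was just hung off it; with the move, reclaim picks a genuinely cold leaf instead:

/* Toy model of the active_mmu_pages ordering problem; not KVM code. */
#include <stdio.h>
#include <stdlib.h>

struct page {
	char name[32];
	struct page *parent;		/* directory this page hangs off, if any */
	struct page *prev, *next;	/* doubly linked "active" list */
};

static struct page head = { "HEAD", NULL, &head, &head };

static void list_add_head(struct page *p)
{
	p->next = head.next;
	p->prev = &head;
	head.next->prev = p;
	head.next = p;
}

static void list_del_entry(struct page *p)
{
	p->prev->next = p->next;
	p->next->prev = p->prev;
}

static struct page *alloc_page(const char *name, struct page *parent,
			       int move_parent_up)
{
	struct page *p = calloc(1, sizeof(*p));

	snprintf(p->name, sizeof(p->name), "%s", name);
	p->parent = parent;
	list_add_head(p);
	if (parent && move_parent_up) {		/* the proof-of-concept behaviour */
		list_del_entry(parent);
		list_add_head(parent);
	}
	return p;
}

static void reclaim_one(void)
{
	struct page *victim = head.prev;	/* zap the oldest, as today */

	printf("zapping %s\n", victim->name);
	list_del_entry(victim);
	free(victim);
}

int main(void)
{
	int move_parent_up = 1;		/* set to 0 for the current behaviour */
	struct page *dir = alloc_page("directory", NULL, move_parent_up);

	alloc_page("old table", dir, move_parent_up);
	alloc_page("new table", dir, move_parent_up);

	/* With the move, "directory" was pulled back to the head when
	 * "new table" was added, so reclaim picks "old table".  Without it,
	 * "directory" is the oldest entry and is the one that gets zapped. */
	reclaim_one();
	return 0;
}

Flipping move_parent_up to 0 reproduces the current behaviour and prints "zapping directory" instead of "zapping old table".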
Attached is a proof-of-concept diff and two plots, before and after. The plots show referenced guest pages over time. As you can see, there is less saw-toothing in the after plot and also fewer changes overall (because we don't zap mappings that are still in use as often). This is with a limit of 64 shadow pages to increase the effect, and vmx/ept.

I realize that the list_move and parent walk are quite expensive and that kvm_mmu_alloc_page is only half the story. It should really be done every time a new guest page table is mapped - maybe via rmap_add. That would obviously kill performance completely, though. Another idea would be to improve the reclaim logic so that it prefers "old" PT_PAGE_TABLE_LEVEL pages over directories, though I'm not sure how to code that up sensibly, either.

As I said, this is proof-of-concept and RFC, so any comments are welcome. For my use case the proof-of-concept diff seems to do well enough, though.

Thanks,
--
/"\  Best regards,                      | [email protected]
\ /  Max Laier                          | ICQ #67774661
 X   http://pf4freebsd.love2party.net/  | mla...@efnet
/ \  ASCII Ribbon Campaign              | Against HTML Mail and News
<<attachment: before.png>>
<<attachment: after.png>>
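As for the second idea above - preferring "old" PT_PAGE_TABLE_LEVEL pages over directories at reclaim time - something along these lines might work. This is only a kernel-flavoured sketch, not a compile-tested patch: pick_reclaim_victim and the scan bound of 16 are made up; only active_mmu_pages, sp->role.level and PT_PAGE_TABLE_LEVEL are real.

/*
 * Hypothetical victim selection -- a sketch only, nothing here exists in
 * mmu.c today.  Scan a bounded number of entries from the cold end of
 * active_mmu_pages and prefer a last-level page table; if none is found,
 * fall back to the plain "zap the oldest" behaviour we have now.
 */
static struct kvm_mmu_page *pick_reclaim_victim(struct kvm *kvm)
{
	struct kvm_mmu_page *sp;
	int scanned = 0;

	list_for_each_entry_reverse(sp, &kvm->arch.active_mmu_pages, link) {
		if (sp->role.level == PT_PAGE_TABLE_LEVEL)
			return sp;
		if (++scanned >= 16)	/* arbitrary bound to keep the walk cheap */
			break;
	}

	return container_of(kvm->arch.active_mmu_pages.prev,
			    struct kvm_mmu_page, link);
}

The bound keeps the tail walk cheap; if no leaf page is found near the tail, this degrades to the current zap-the-oldest behaviour.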
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 95d5329..0a63570 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -190,6 +190,8 @@ struct kvm_unsync_walk {
 };
 
 typedef int (*mmu_parent_walk_fn) (struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp);
+static void mmu_parent_walk(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
+			    mmu_parent_walk_fn fn);
 
 static struct kmem_cache *pte_chain_cache;
 static struct kmem_cache *rmap_desc_cache;
@@ -900,6 +902,12 @@ static unsigned kvm_page_table_hashfn(gfn_t gfn)
 	return gfn & ((1 << KVM_MMU_HASH_SHIFT) - 1);
 }
 
+static int move_up_walk_fn(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
+{
+	list_move(&sp->link, &vcpu->kvm->arch.active_mmu_pages);
+	return 1;
+}
+
 static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu,
 					       u64 *parent_pte)
 {
@@ -918,6 +926,10 @@ static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu,
 	bitmap_zero(sp->slot_bitmap, KVM_MEMORY_SLOTS + KVM_PRIVATE_MEM_SLOTS);
 	sp->multimapped = 0;
 	sp->parent_pte = parent_pte;
+#if 1
+	if (parent_pte)
+		mmu_parent_walk(vcpu, sp, move_up_walk_fn);
+#endif
 	--vcpu->kvm->arch.n_free_mmu_pages;
 	return sp;
 }
