On Sun, Oct 26, 2008 at 01:17:14PM +0200, Avi Kivity wrote:
> Marcelo Tosatti wrote:
>> Instead of flushing remote TLB's at every page resync, do an initial
>> pass to write protect the sptes, collapsing the flushes on a single
>> remote TLB invalidation.
>>
>> kernbench is 2.3% faster on 4-way guest. Improvements have been seen
>> with other loads such as AIM7.
>>
>> Avi: feel free to change this if you dislike the style (I do, but can't
>> think of anything nicer).
>>
>> static void mmu_sync_children(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
>> {
>> 	struct sync_walker walker = {
>> -		.walker = { .entry = mmu_sync_fn, },
>> +		.walker = { .entry = mmu_wprotect_fn,
>> +			    .clear_unsync = false, },
>> 		.vcpu = vcpu,
>> +		.write_protected = false
>> 	};
>>
>> +	/* collapse the TLB flushes as an optimization */
>> +	mmu_unsync_walk(sp, &walker.walker);
>> +	if (walker.write_protected)
>> +		kvm_flush_remote_tlbs(vcpu->kvm);
>> +
>> +	walker.walker.entry = mmu_sync_fn;
>> +	walker.walker.clear_unsync = true;
>> +
>> 	while (mmu_unsync_walk(sp, &walker.walker))
>> 		cond_resched_lock(&vcpu->kvm->mmu_lock);
>>
>
> We're always doing two passes here, which is a bit sad. How about
> having a single pass which:
>
> - collects unsync pages into an array
> - exits on no more unsync pages or max array size reached
>
> Then, iterate over the array:
>
> - write protect all pages
> - flush tlb
> - sync pages
>
> Loop until the root is synced.
>
> If the number of pages to sync is typically small, and the array is
> sized to be larger than this, then we only walk the pagetables once.
There is significant overhead now compared to the earlier indexing scheme
that kept a list per root; it needs to be optimized.
A problem with your suggestion is how to clear the unsync bitmap bits in
the upper pagetables. The advantage, however, is that the bitmaps and spte
entries can stay cached in L1, whereas currently the cache is blown on
every page resync.
What I'm testing now is:
#define KVM_PAGE_ARRAY_NR 16

struct kvm_mmu_pages {
	struct kvm_mmu_page *pages[KVM_PAGE_ARRAY_NR];
	struct kvm_mmu_page *parent_pages[KVM_PAGE_ARRAY_NR];
	unsigned int offset[KVM_PAGE_ARRAY_NR];
	unsigned int nr;
};
static void mmu_sync_children(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
{
	int i;
	struct kvm_mmu_pages pages;

	kvm_mmu_pages_init(&pages);
	while (mmu_unsync_walk(sp, &pages)) {
		for_each_sp(pages, i) {
			struct kvm_mmu_page *parent = pages.parent_pages[i];

			kvm_sync_page(vcpu, pages.pages[i]);
			__clear_bit(pages.offset[i],
				    parent->unsync_child_bitmap);
		}
		kvm_mmu_pages_init(&pages);
		cond_resched_lock(&vcpu->kvm->mmu_lock);
	}
}
But a second pass is still needed for levels 3 and 4 (the second pass
could be postponed to the next CR3 switch, but I'm not sure it's
worthwhile).

Also, the array method handles a large number of unsync pages poorly,
which is a common case with 2.4/kscand for example.

Hmm, Chris suggests a list_head per level instead of the bitmap.
> btw, our walkers are a bit awkward (though still better than what we had
> before). If we rewrite them into for_each style iterators, the code
> could become cleaner and shorter.
>
> --
> error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html