Shaohua Li wrote:
> Hi,
> I saw some discussions on the topic but no progress. I did an
> experiment to make guest page be allocated dynamically and swap out.
> please see attachment patches. It's not yet ready for merge, but I'd like to get
> some suggestions and help. The patches (against kvm-19) work here but
> may not be very stable, as there are probably some locking issues in the
> swapout path, which I'll check further later. If you are brave, please try :).
Nice work. This is fairly different from what I had in mind - I wanted
to use regular address spaces in kvm, whereas this patchset adds swapout
capability to the kvm address space.
Differences between the two approaches include:
- yours is probably simpler :)
- possibly less intrusive mm code changes when using regular address spaces
- automatic hugetlbfs support (this was my main motivation for generic
address spaces, esp. with npt/ept). of course hugetlbfs can be
implemented with your approach as well
- your approach allows kvm to continue using page->private, so it saves
memory and requires less kvm modification
- using Linux address spaces allows paging to file-backed storage, not
just swap
Ultimately I think the balance is in favor of your approach, as it is
more tightly coupled with kvm and can therefore be faster. The
simplicity also helps a lot.
> Some
> issues I have:
> 1. there is a spinlock to protect the kvm struct, and we can't sleep while
> holding it. A possible solution is to do a 'release lock, sleep and retry', but
> the shadow page fault path is not easy to adapt to that. The spinlock also
> prevents the vcpu from being migrated to other cpus, as vmx operations must be
> done on the cpu the vcpu runs on. I changed it to a semaphore plus a cpu
> affinity setting. It's a little hacky; I'd like to see if there are better approaches.
My plan is to teach the scheduler about kvm, so it can call a callback
when a vcpu is migrated. That will allow re-enabling preemption in all
kvm code except the actual entry/exit sequence. This is an improvement
all around (for realtime, for easier coding, for latency), so I hope to do
it soon.
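Roughly, such a callback could look like this (a hypothetical sketch; `vcpu_sched_ops`, the hook names and `kvm_arch_vcpu_load()` are invented for illustration, not an existing kernel interface):

```c
/* Hypothetical per-vcpu scheduler hooks, registered with the scheduler
 * while the vcpu thread is running. */
struct vcpu_sched_ops {
	void (*sched_out)(struct kvm_vcpu *vcpu);          /* vcpu descheduled */
	void (*sched_in)(struct kvm_vcpu *vcpu, int cpu);  /* rescheduled, possibly elsewhere */
};

static void kvm_sched_in(struct kvm_vcpu *vcpu, int cpu)
{
	if (vcpu->cpu != cpu) {
		vcpu->cpu = cpu;
		/* e.g. for vmx: migrate the vmcs to the new cpu */
		kvm_arch_vcpu_load(vcpu, cpu);
	}
}
```

With hooks like these, preemption only needs to stay disabled across the actual guest entry/exit, rather than for whole kvm code paths.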
> 2. Linux page reclaim can't tell whether a guest page is referenced often.
> My current patch just blindly adds guest pages to the lru; this is not optimized.
Well, that will always be a problem with paging guest memory. There are
some patches floating around to allow a guest to give hints to the host
about page recency, for s390, which may help.
> 3. kvm_ops.tlb_flush should really send an IPI to make the vcpu flush its
> tlb, as it might be called on a cpu other than the one the vcpu runs on.
> This means the swapout path cannot zap shadow page tables. My
> patch just skips any guest page which has a shadow page table pointing to it.
> I assume kvm smp guest support will improve the tlb_flush.
>
Yes. The apic patchset includes mechanisms for interrupting a running
vcpu which can be used for this.
> @@ -151,9 +151,8 @@
> walker->inherited_ar &= walker->table[index];
> table_gfn = (*ptep & PT_BASE_ADDR_MASK) >> PAGE_SHIFT;
> paddr = safe_gpa_to_hpa(vcpu, *ptep & PT_BASE_ADDR_MASK);
> - kunmap_atomic(walker->table, KM_USER0);
> - walker->table = kmap_atomic(pfn_to_page(paddr >> PAGE_SHIFT),
> - KM_USER0);
> + kunmap(walker->table);
> + walker->table = kmap(pfn_to_page(paddr >> PAGE_SHIFT));
>
kunmap() wants a struct page IIRC. It's also much slower than the
atomic variant on i386+HIGHMEM, so I'd rather avoid it.
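For reference, kunmap() pairs with the struct page that was passed to kmap(), so the non-atomic variant would need that page kept around; a sketch of both variants (`walker->table_page` is a hypothetical field added for the example):

```c
/* Non-atomic variant: kunmap() takes the struct page, not the kernel
 * virtual address, so the page must be remembered from map time. */
kunmap(walker->table_page);
walker->table_page = pfn_to_page(paddr >> PAGE_SHIFT);
walker->table = kmap(walker->table_page);

/* Atomic variant (the existing code): operates on the mapped address
 * and is much cheaper on i386 + HIGHMEM, but cannot sleep while held. */
kunmap_atomic(walker->table, KM_USER0);
walker->table = kmap_atomic(pfn_to_page(paddr >> PAGE_SHIFT), KM_USER0);
```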
> @@ -1099,11 +1121,23 @@
> }
> }
>
> +static void mmu_zap_active_pages(struct kvm_vcpu *vcpu)
> +{
> + struct kvm_mmu_page *page;
> +
> + while (!list_empty(&vcpu->kvm->active_mmu_pages)) {
> + page = container_of(vcpu->kvm->active_mmu_pages.next,
> + struct kvm_mmu_page, link);
> + kvm_mmu_zap_page(vcpu, page);
> + }
> +}
> +
> int kvm_mmu_reset_context(struct kvm_vcpu *vcpu)
> {
> int r;
>
> destroy_kvm_mmu(vcpu);
> + mmu_zap_active_pages(vcpu);
> r = init_kvm_mmu(vcpu);
> if (r < 0)
> goto out;
>
This is called on set_cr0(), which can be called fairly often. However,
I think it can be made conditional on changes to the paging-related bits.
> Index: kvm/kernel/paging_tmpl.h
> ===================================================================
> --- kvm.orig/kernel/paging_tmpl.h 2007-05-21 09:20:11.000000000 +0800
> +++ kvm/kernel/paging_tmpl.h 2007-05-21 09:20:26.000000000 +0800
> @@ -369,7 +369,7 @@
> *shadow_ent |= PT_WRITABLE_MASK;
> FNAME(mark_pagetable_dirty)(vcpu->kvm, walker);
> *guest_ent |= PT_DIRTY_MASK;
> - rmap_add(vcpu, shadow_ent);
> +// rmap_add(vcpu, shadow_ent);
>
??
> +
> +static void kvm_invalidatepage(struct page *page, unsigned long offset)
> +{
> + /*
> +	 * truncate_page is done after vcpu_free, which means all shadow page
> +	 * tables should be freed already; we should never get here
> + */
> + BUG();
> +}
>
Eventually we'll want to add support for invalidating a vm page, to
support ballooning and similar mechanisms.
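Something along these lines, perhaps (a speculative sketch; `kvm_unmap_page()` and the way the kvm pointer is recovered from the page are invented for illustration):

```c
/* Hypothetical: instead of BUG(), tear down any shadow mappings of the
 * page, which is roughly what a balloon driver would need. */
static void kvm_invalidatepage(struct page *page, unsigned long offset)
{
	struct kvm *kvm = page->mapping->host->i_private;  /* assumed back-pointer */

	kvm_unmap_page(kvm, page);  /* zap rmaps / shadow ptes for this page,
	                             * then flush guest tlbs */
}
```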
--
error compiling committee.c: too many arguments to function