[ofa-general] Re: [kvm-devel] mmu notifier #v14

Anthony Liguori Sat, 26 Apr 2008 18:54:38 -0700

Andrea Arcangeli wrote:

On Sat, Apr 26, 2008 at 01:59:23PM -0500, Anthony Liguori wrote:
+static void kvm_unmap_spte(struct kvm *kvm, u64 *spte)
+{
+ struct page *page = pfn_to_page((*spte & PT64_BASE_ADDR_MASK) >>PAGE_SHIFT);
+       get_page(page);
You should not assume a struct page exists for any given spte. Instead, usekvm_get_pfn() and kvm_release_pfn_clean().
Last email from [EMAIL PROTECTED] in my inbox argues it's useless to build rmap
on mmio regions, so the above is more efficient so put_page runs
directly on the page without going back and forth between spte -> pfn
-> page -> pfn -> page in a single function.

Avi can correct me if I'm wrong, but I don't think the consensus of thatdiscussion was that we're going to avoid putting mmio pages in thermap. Practically speaking, replacing:

+ struct page *page = pfn_to_page((*spte & PT64_BASE_ADDR_MASK) >>PAGE_SHIFT);

+       get_page(page);


With:

unsigned long pfn = (*spte & PT64_BASE_ADDR_MASK) >> PAGE_SHIFT;
kvm_get_pfn(pfn);

Results in exactly the same code except the later allows mmio pfns inthe rmap. So ignoring the whole mmio thing, using accessors that arealready there and used elsewhere seems like a good idea :-)

Certainly if we start building rmap on mmio regions we'll have to
change that.
Perhaps I just have a weak stomach but I am uneasy having a function thattakes a lock on exit. I walked through the logic and it doesn't appear tobe wrong but it also is pretty clear that you could defer the acquisitionof the lock to the caller (in this case, kvm_mmu_pte_write) by moving theupdate_pte assignment into kvm_mmu_pte_write.
I agree out_lock is an uncommon exit path, the problem is that the
code was buggy, and I tried to fix it with the smallest possible
change and that resulting in an out_lock. That section likely need a
refactoring, all those update_pte fields should be at least returned
by the function guess_....  but I tried to reduce the changes to make
the issue more readable, I didn't want to rewrite certain functions
just to take a spinlock a few instructions ahead.

I appreciate the desire to minimize changes, but taking a lock on returnseems to take that to a bit of an extreme. It seems like a simple thingto fix though, no?

Why move the destruction of the vm to the MMU notifier unregister hook?Does anything else ever call mmu_notifier_unregister that would implicitlydestroy the VM?


mmu notifier ->release can run at anytime before the filehandle is
closed. ->release has to zap all sptes and freeze the mmu (hence all
vcpus) to prevent any further page fault. After ->release returns all
pages are freed (we'll never relay on the page pin to avoid the
rmap_remove put_page to be a relevant unpin event). So the idea is
that I wanted to maintain the same ordering of the current code in the
vm destroy event, I didn't want to leave a partially shutdown VM on
the vmlist. If the ordering is entirely irrelevant and the
kvm_arch_destroy_vm can run well before kvm_destroy_vm is called, then
I can avoid changes to kvm_main.c but I doubt.

I've done it in a way that archs not needing mmu notifiers like s390
can simply add the kvm_destroy_common_vm at the top of their
kvm_arch_destroy_vm. All others using mmu_notifiers have to invoke
kvm_destroy_common_vm in the ->release of the mmu notifiers.

This will ensure that everything will be ok regardless if exit_mmap is
called before/after exit_files, and it won't make a whole lot of
difference anymore, if the driver fd is pinned through vmas->vm_file
released in exit_mmap or through the task filedescriptors relased in
exit_files etc... Infact this allows to call mmu_notifier_unregister
at anytime later after the task has already been killed, without any
trouble (like if the mmu notifier owner isn't registering in
current->mm but some other tasks mm).

I see. It seems a little strange to me as a KVM guest isn't really tiedto the current mm. It seems like the net effect of this is that we arenow tying a KVM guest to an mm.

For instance, if you create a guest, but didn't assign any memory to it,you could transfer the fd to another process and then close the fd(without destroying the guest). The other process then could assignmemory to it and presumably run the guest.

With your change, as soon as the first process exits, the guest will bedestroyed. I'm not sure this behavioral difference really matters butit is a behavioral difference.


Regards,

Anthony Liguori

-------------------------------------------------------------------------
This SF.net email is sponsored by the 2008 JavaOne(SM) ConferenceDon't miss this year's exciting event. There's still time to save $100.Use priority code J8TL2D2.http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
_______________________________________________
kvm-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/kvm-devel


_______________________________________________
general mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general

[ofa-general] Re: [kvm-devel] mmu notifier #v14

Reply via email to