Marcelo Tosatti wrote:
On Sun, Sep 07, 2008 at 12:52:21PM +0300, Avi Kivity wrote:
What if vcpu0 is in mode X while vcpu1 is in mode Y? vcpu0 writes to some pagetable, causing both the mode X and mode Y shadows to become unsynced, so on the next resync (by either vcpu0 or vcpu1) we need to sync both modes.

From the oos core patch:

-       hlist_for_each_entry(sp, node, bucket, hash_link)
-               if (sp->gfn == gfn && sp->role.word == role.word) {
+       hlist_for_each_entry_safe(sp, node, tmp, bucket, hash_link)
+               if (sp->gfn == gfn) {
+                       /*
+                        * If a pagetable becomes referenced by more than one
+                        * root, or has multiple roles, unsync it and disable
+                        * oos. For higher level pgtables the entire tree
+                        * has to be synced.
+                        */
+                       if (sp->root_gfn != root_gfn) {
+                               kvm_set_pg_inuse(sp);
+                               if (set_shared_mmu_page(vcpu, sp))
+                                       tmp = bucket->first;
+                               kvm_clear_pg_inuse(sp);
+                               unsyncable = 0;
+                       }

So as soon as a pagetable is shadowed with different modes, it is resynced and unsyncing is disabled for it.
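For illustration, the rule in the hunk above can be modeled outside kvm roughly as follows. This is a minimal sketch under stated assumptions: `struct shadow_page` and `may_unsync()` are hypothetical simplifications, not kvm's `struct kvm_mmu_page` or its real lookup path, and clearing `unsync` stands in for a full resync. The key point it shows is that the loop matches on gfn only (the role comparison was dropped), so shadows built for other modes are found too.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified shadow page; NOT kvm's struct kvm_mmu_page. */
struct shadow_page {
	unsigned long gfn;       /* guest frame number this page shadows */
	unsigned long root_gfn;  /* root of the tree it was reached from */
	unsigned int role;       /* paging mode; deliberately NOT compared */
	bool unsync;             /* guest may modify it without a fault */
};

/*
 * Walk every shadow of gfn regardless of role. If any of them hangs off a
 * different root, "resync" it (modeled as clearing unsync) and report that
 * the page must stay write-protected, i.e. out-of-sync is disabled.
 */
static bool may_unsync(struct shadow_page *pages, size_t n,
		       unsigned long gfn, unsigned long root_gfn)
{
	bool unsyncable = true;

	assert(pages != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		struct shadow_page *sp = &pages[i];

		if (sp->gfn != gfn)
			continue;
		if (sp->root_gfn != root_gfn) {
			sp->unsync = false;   /* stand-in for a real resync */
			unsyncable = false;
		}
	}
	return unsyncable;
}
```

A page reached from a single root stays eligible for unsync; as soon as a second root (or mode) shows up, every shadow of that gfn is brought back in sync and the page is kept write-protected.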


Okay. But the complexity here, especially with rarely used cases like multiple-mode shadows, is frightening.

+
+                       pte_access = sp->role.access & FNAME(gpte_access)(vcpu, *pt);
+                       /* user */
+                       if (pte_access & ACC_USER_MASK)
+                               spte |= shadow_user_mask;
There are some special cases involving cr0.wp=0 and the user mask, so spte.u is not exactly correlated with gpte.u.

How come?


When cr0.wp=0, the CPU ignores pte.w for cpl=0 accesses. kvm itself requires cr0.wp=1 (since we need to write-protect pages for many reasons, such as emulating pte.dirty). This is how we handle pte.u=1 + pte.w=0:

- for cpl 0 accesses, we set spte.w=1 (to allow the write) and spte.u=0 (to forbid cpl>0 accesses)
- for cpl>0 accesses, we set spte.w=0 (to forbid userspace write accesses) and spte.u=1 (to allow cpl>0 read accesses)

This works well unless the accesses keep alternating between userspace and kernel, since each switch takes a fault that flips spte.w and spte.u again.
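The two cases above can be sketched as a small policy function plus a fault check. This is a hypothetical model, not kvm code: `SPTE_W`, `SPTE_U`, `wp0_spte_perms()` and `spte_faults()` are illustrative names, only the w/u bits are modeled, and it assumes the cpl 0 case applies to kernel write faults, with everything else mapping the gpte's u=1, w=0 through unchanged.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the two spte permission bits discussed here. */
#define SPTE_W 0x2u
#define SPTE_U 0x4u

/*
 * Guest pte has u=1, w=0 and the guest runs with cr0.wp=0, while the real
 * cr0.wp stays 1. Returns the spte bits to install after a fault at cpl.
 */
static unsigned int wp0_spte_perms(int cpl, bool write)
{
	assert(cpl >= 0 && cpl <= 3);
	if (cpl == 0 && write)
		return SPTE_W;  /* kernel may write; spte.u=0 blocks cpl>0 */
	return SPTE_U;          /* cpl>0 may read; spte.w=0 blocks writes */
}

/* Would an access at cpl fault against these spte bits (real cr0.wp=1)? */
static bool spte_faults(unsigned int perms, int cpl, bool write)
{
	if (cpl > 0 && !(perms & SPTE_U))
		return true;
	if (write && !(perms & SPTE_W))
		return true;
	return false;
}
```

Alternating a kernel write with a user read makes each access fault against the bits installed for the previous one, which is the ping-pong case described above.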

+                       /* guest->shadow accessed sync */
+                       if (!(*pt & PT_ACCESSED_MASK))
+                               spte &= ~PT_ACCESSED_MASK;
spte shouldn't be accessible at all if gpte is not accessed, so that we can set gpte.a on the next access (similar to spte not being writable if gpte is not dirty).
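A rough model of that rule: keep the spte non-present while gpte.a is clear, and read-only while gpte.d is clear, so that the first access and the first write both trap and let the guest's A/D bits be set. This is a sketch under assumptions. The bit positions are the architectural x86 ones (P=0, W=1, A=5, D=6), but `spte_from_gpte()` and `handle_fault()` are hypothetical helpers, and frame addresses, the user bit and NX are not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* x86 pte flag bits: present, writable, accessed, dirty. */
#define PTE_PRESENT  (1ull << 0)
#define PTE_WRITABLE (1ull << 1)
#define PTE_ACCESSED (1ull << 5)
#define PTE_DIRTY    (1ull << 6)

/* Derive spte flags so the first access and first write both fault. */
static uint64_t spte_from_gpte(uint64_t gpte)
{
	uint64_t spte;

	/* gpte.a clear: map nothing, so any access traps and sets gpte.a */
	if (!(gpte & PTE_PRESENT) || !(gpte & PTE_ACCESSED))
		return 0;
	spte = PTE_PRESENT | PTE_ACCESSED;
	/* gpte.d clear: read-only, so the first write traps and sets gpte.d */
	if ((gpte & PTE_WRITABLE) && (gpte & PTE_DIRTY))
		spte |= PTE_WRITABLE | PTE_DIRTY;
	return spte;
}

/* Hypothetical fault handler: update guest A (and D on write), rebuild spte. */
static uint64_t handle_fault(uint64_t *gpte, bool write)
{
	assert(*gpte & PTE_PRESENT);
	*gpte |= PTE_ACCESSED;
	if (write && (*gpte & PTE_WRITABLE))
		*gpte |= PTE_DIRTY;
	return spte_from_gpte(*gpte);
}
```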

Right. Perhaps accessed-bit synchronization to the guest could be performed lazily somehow, to avoid a vmexit on the first access to every page.

I don't think this is doable (well, you can do it if you make the guest page table not present, but then even reading the accessed bit faults).

+                       set_shadow_pte(&sp->spt[i], spte);
What if permissions are reduced?

Then a local TLB flush is needed. Flushing the TLBs of remote vcpus should be done by the guest, AFAICS.


Hm. It depends on why they are reduced. If the page became shadowed, then we are responsible for flushing the remote TLBs.

I don't think that is the case here, so a local flush is likely sufficient.
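The flush decision discussed above could be sketched like this: no flush if no permission was dropped, a local flush if the guest reduced permissions in its own pte (it shoots down other vcpus itself), and a remote flush if kvm reduced them because the page became shadowed. All names here (`flush_needed()`, `enum flush_kind`, the `SPTE_*` masks) are hypothetical, and the NX bit is ignored even though setting it also reduces permissions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SPTE_PRESENT  (1ull << 0)
#define SPTE_WRITABLE (1ull << 1)
#define SPTE_USER     (1ull << 2)
#define SPTE_PERMS    (SPTE_PRESENT | SPTE_WRITABLE | SPTE_USER)

enum flush_kind { FLUSH_NONE, FLUSH_LOCAL, FLUSH_REMOTE };

/* True if new_spte drops any permission bit that old_spte granted. */
static bool perms_reduced(uint64_t old_spte, uint64_t new_spte)
{
	return (old_spte & SPTE_PERMS & ~new_spte) != 0;
}

/*
 * Guest reduced its own pte: the guest flushes remote vcpus, so local only.
 * kvm write-protected the page (it became shadowed): kvm flushes everyone.
 */
static enum flush_kind flush_needed(uint64_t old_spte, uint64_t new_spte,
				    bool kvm_write_protected)
{
	assert((old_spte & ~SPTE_PERMS) == 0 && (new_spte & ~SPTE_PERMS) == 0);
	if (!perms_reduced(old_spte, new_spte))
		return FLUSH_NONE;
	return kvm_write_protected ? FLUSH_REMOTE : FLUSH_LOCAL;
}
```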


