On Sun, Sep 07, 2008 at 12:52:21PM +0300, Avi Kivity wrote:
> What if vcpu0 is in mode X, while vcpu1 is in mode Y.  vcpu0 writes to  
> some pagetable, causing both mode X and mode Y shadows to become  
> unsynced, so on the next resync (either by vcpu0 or vcpu1) we need to  
> sync both modes.

From the oos core patch:

-       hlist_for_each_entry(sp, node, bucket, hash_link)
-               if (sp->gfn == gfn && sp->role.word == role.word) {
+       hlist_for_each_entry_safe(sp, node, tmp, bucket, hash_link)
+               if (sp->gfn == gfn) {
+                       /*
+                        * If a pagetable becomes referenced by more than one
+                        * root, or has multiple roles, unsync it and disable
+                        * oos. For higher level pgtables the entire tree
+                        * has to be synced.
+                        */
+                       if (sp->root_gfn != root_gfn) {
+                               kvm_set_pg_inuse(sp);
+                               if (set_shared_mmu_page(vcpu, sp))
+                                       tmp = bucket->first;
+                               kvm_clear_pg_inuse(sp);
+                               unsyncable = 0;
+                       }

So as soon as a pagetable is shadowed in different modes, it is resynced
and unsyncing is disabled for it.
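The check above can be illustrated with a simplified userspace model. Everything in it is hypothetical (struct model_sp, model_resync(), model_lookup_unsyncable()); it captures only the root_gfn comparison, not set_shared_mmu_page(), the role check, or the bucket restart done by the real patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct kvm_mmu_page. */
struct model_sp {
	uint64_t gfn;       /* guest frame of the shadowed pagetable */
	uint64_t root_gfn;  /* root this shadow page was reached from */
	bool unsync;        /* shadow currently allowed to lag the guest */
};

/* Hypothetical stand-in for resyncing a shadow page with its guest pagetable. */
static void model_resync(struct model_sp *sp)
{
	sp->unsync = false;
}

/*
 * Walk one hash bucket. If a page shadowing @gfn was reached from a
 * different root, bring it back in sync and report that this guest
 * pagetable must stay synced (returns false, i.e. not unsyncable).
 */
static bool model_lookup_unsyncable(struct model_sp *bucket, size_t n,
				    uint64_t gfn, uint64_t root_gfn)
{
	bool unsyncable = true;

	assert(bucket != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		struct model_sp *sp = &bucket[i];

		if (sp->gfn != gfn)
			continue;
		if (sp->root_gfn != root_gfn) {
			model_resync(sp);
			unsyncable = false;
		}
	}
	return unsyncable;
}
```

Looking up gfn 10 from root 1 when another shadow of gfn 10 hangs off root 2 resyncs that copy and returns false, so the new shadow page is never allowed to go out of sync.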

> Same problem with kvm_mmu_pte_write(), which right now hacks around it.
>
> Maybe we need a ->ops member.

>> +                    if (!is_present_pte(*pt)) {
>> +                            rmap_remove(vcpu->kvm, &sp->spt[i]);
>> +                            sp->spt[i] = shadow_notrap_nonpresent_pte;
>> +                            pt++;
>> +                            continue;
>> +                    }
>>   
>
> Are we missing a tlb flush?  Or will the caller take care of it?

Yes, there's a local TLB flush missing, which can be collapsed into a
single kvm_x86_ops->tlb_flush in the caller.

>> +
>> +                    pte_access = sp->role.access & FNAME(gpte_access)(vcpu, *pt);
>> +                    /* user */
>> +                    if (pte_access & ACC_USER_MASK)
>> +                            spte |= shadow_user_mask;
>>   
>
> There are some special cases involving cr0.wp=0 and the user mask, so
> spte.u is not exactly correlated with gpte.u.

How come?
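One case that appears to fit the remark (an assumption based on how KVM's shadow MMU handles guest CR0.WP=0, not something stated in this thread): the hardware runs with CR0.WP=1 under shadow paging, so a supervisor write to a read-only user page with guest CR0.WP=0 is emulated by making the spte writable but supervisor-only. spte.u is then clear while gpte.u is set. A hypothetical model (model_spte_perms() is illustrative, not a KVM function):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural x86 pte bit positions. */
#define MODEL_PTE_PRESENT  (1ull << 0)
#define MODEL_PTE_WRITABLE (1ull << 1)
#define MODEL_PTE_USER     (1ull << 2)

/*
 * Derive shadow permission bits from a present guest pte. @guest_wp is the
 * guest's CR0.WP. A supervisor write fault on a read-only page with
 * guest_wp == false gets a writable, supervisor-only spte, so user-mode
 * accesses still fault and can be checked against the real gpte.
 */
static uint64_t model_spte_perms(uint64_t gpte, bool guest_wp,
				 bool write_fault, bool user_fault)
{
	uint64_t spte = MODEL_PTE_PRESENT;

	assert(gpte & MODEL_PTE_PRESENT);
	if (gpte & MODEL_PTE_USER)
		spte |= MODEL_PTE_USER;
	if (gpte & MODEL_PTE_WRITABLE) {
		spte |= MODEL_PTE_WRITABLE;
	} else if (write_fault && !user_fault && !guest_wp) {
		/* Kernel may write; user mode must keep faulting on this page. */
		spte |= MODEL_PTE_WRITABLE;
		spte &= ~MODEL_PTE_USER;
	}
	return spte;
}
```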

>> +                    /* guest->shadow accessed sync */
>> +                    if (!(*pt & PT_ACCESSED_MASK))
>> +                            spte &= ~PT_ACCESSED_MASK;
>>   
>
> spte shouldn't be accessible at all if gpte is not accessed, so we can  
> set gpte.a on the next access (similar to spte not being writeable if  
> gpte is not dirty).

Right. Perhaps accessed bit synchronization to the guest could be performed
lazily somehow, to avoid a vmexit on the first access to every page.
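Avi's suggestion, sketched as a hypothetical model (model_sync_spte() and model_fault_spte() are illustrative names): a guest pte with the accessed bit clear is left non-present in the shadow, in the same way an spte is kept read-only while gpte.d is clear, and the fault on first access sets gpte.a before mapping. That fault is the vmexit a lazier scheme would try to avoid:

```c
#include <assert.h>
#include <stdint.h>

/* Architectural x86 pte bit positions. */
#define MODEL_PTE_PRESENT  (1ull << 0)
#define MODEL_PTE_WRITABLE (1ull << 1)
#define MODEL_PTE_USER     (1ull << 2)
#define MODEL_PTE_ACCESSED (1ull << 5)

/*
 * Resync path: map a guest pte only once the guest has marked it accessed.
 * Otherwise return 0 (a non-present spte), so the next access faults.
 */
static uint64_t model_sync_spte(uint64_t gpte)
{
	if (!(gpte & MODEL_PTE_PRESENT) || !(gpte & MODEL_PTE_ACCESSED))
		return 0;
	return gpte & (MODEL_PTE_PRESENT | MODEL_PTE_WRITABLE | MODEL_PTE_USER);
}

/*
 * Fault path: the guest touched the page, so set gpte.a (as the hardware
 * walker would have done) and then install the spte.
 */
static uint64_t model_fault_spte(uint64_t *gpte)
{
	assert(*gpte & MODEL_PTE_PRESENT);
	*gpte |= MODEL_PTE_ACCESSED;
	return model_sync_spte(*gpte);
}
```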

>> +                    /* shadow->guest accessed sync */
>> +                    if (spte & PT_ACCESSED_MASK)
>> +                            set_bit(PT_ACCESSED_SHIFT, (unsigned long *)pt);
>>   
>
> Host accessed and guest accessed are very different.  We shouldn't set
> host accessed unless we're sure the guest will access the page very soon.
>
>> +                    set_shadow_pte(&sp->spt[i], spte);
>>   
>
> What if permissions are reduced?

Then a local TLB flush is needed. Flushing the TLBs of remote vcpus
should be the guest's responsibility, AFAICS.
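A hypothetical predicate (model_perms_reduced(), not a KVM function) for deciding when the new spte needs that local flush. It treats loss of present, writable or user access, or a newly set NX bit, as a reduction. x86 does not cache translations from non-present entries, so replacing a non-present spte never requires a flush:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural x86 pte bit positions. */
#define MODEL_PTE_PRESENT  (1ull << 0)
#define MODEL_PTE_WRITABLE (1ull << 1)
#define MODEL_PTE_USER     (1ull << 2)
#define MODEL_PTE_NX       (1ull << 63)

/* True if @new_spte grants less than @old_spte, so the TLB may be stale. */
static bool model_perms_reduced(uint64_t old_spte, uint64_t new_spte)
{
	if (!(old_spte & MODEL_PTE_PRESENT))
		return false; /* nothing could have been cached from it */
	if (!(new_spte & MODEL_PTE_PRESENT))
		return true;
	if ((old_spte & MODEL_PTE_WRITABLE) && !(new_spte & MODEL_PTE_WRITABLE))
		return true;
	if ((old_spte & MODEL_PTE_USER) && !(new_spte & MODEL_PTE_USER))
		return true;
	if (!(old_spte & MODEL_PTE_NX) && (new_spte & MODEL_PTE_NX))
		return true;
	return false;
}
```

In the sync loop, this would be evaluated per entry before set_shadow_pte() and OR'ed into the same need_flush flag the caller already uses.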

> You can use PT_* instead of shadow_* as this will never be called when  
> ept is active.
>
> I'm worried about the duplication with kvm_mmu_set_pte().  Perhaps that  
> can be refactored instead to be the inner loop.

Will look into that.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
