On Sun, Sep 07, 2008 at 12:52:21PM +0300, Avi Kivity wrote:
> What if vcpu0 is in mode X, while vcpu1 is in mode Y. vcpu0 writes to
> some pagetable, causing both mode X and mode Y shadows to become
> unsynced, so on the next resync (either by vcpu0 or vcpu1) we need to
> sync both modes.
From the oos core patch:
-    hlist_for_each_entry(sp, node, bucket, hash_link)
-        if (sp->gfn == gfn && sp->role.word == role.word) {
+    hlist_for_each_entry_safe(sp, node, tmp, bucket, hash_link)
+        if (sp->gfn == gfn) {
+            /*
+             * If a pagetable becomes referenced by more than one
+             * root, or has multiple roles, unsync it and disable
+             * oos. For higher level pgtables the entire tree
+             * has to be synced.
+             */
+            if (sp->root_gfn != root_gfn) {
+                kvm_set_pg_inuse(sp);
+                if (set_shared_mmu_page(vcpu, sp))
+                    tmp = bucket->first;
+                kvm_clear_pg_inuse(sp);
+                unsyncable = 0;
+            }
So as soon as a pagetable is shadowed with different modes, it is
resynced and unsyncing is disabled for it.
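
To make the rule in that hunk concrete, here is a small userspace model.
It is only a sketch, not the kernel code: struct shadow_page,
may_unsync() and the flat array standing in for the hash bucket are
made up for illustration, and "resync" is reduced to clearing a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a shadow page; not struct kvm_mmu_page. */
struct shadow_page {
	uint64_t gfn;       /* guest frame being shadowed */
	uint64_t root_gfn;  /* root this shadow was created under */
	uint32_t role;      /* paging mode / level, packed */
	bool unsync;        /* currently out of sync with the guest */
	bool shared;        /* oos has been disabled for this page */
};

/*
 * Walk every shadow of gfn. A shadow that belongs to a different root
 * is resynced (modeled as clearing ->unsync) and marked shared, and
 * the caller is told that gfn may no longer go out of sync.
 */
static bool may_unsync(struct shadow_page *pages, size_t n,
		       uint64_t gfn, uint64_t root_gfn)
{
	bool unsyncable = true;
	size_t i;

	assert(pages != NULL || n == 0);
	for (i = 0; i < n; i++) {
		struct shadow_page *sp = &pages[i];

		if (sp->gfn != gfn)
			continue;
		if (sp->root_gfn != root_gfn) {
			sp->unsync = false;	/* resync */
			sp->shared = true;	/* no more oos */
			unsyncable = false;
		}
	}
	return unsyncable;
}
```

Note that the match is on gfn alone, as in the new version of the loop,
so shadows of the same gfn with any role are visited.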
> Same problem with kvm_mmu_pte_write(), which right now hacks around it.
>
> Maybe we need a ->ops member.
>> + if (!is_present_pte(*pt)) {
>> + rmap_remove(vcpu->kvm, &sp->spt[i]);
>> + sp->spt[i] = shadow_notrap_nonpresent_pte;
>> + pt++;
>> + continue;
>> + }
>>
>
> Are we missing a tlb flush? Or will the caller take care of it?
Yes, there's a local TLB flush missing, which can be collapsed into a
single kvm_x86_ops->tlb_flush in the caller.
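
Roughly what I have in mind, as a userspace sketch. All names here
(sync_page, sync_pages, struct page_pair, the bit values) are
hypothetical, and local_tlb_flush() just counts calls where the real
code would use kvm_x86_ops->tlb_flush:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define GPTE_PRESENT    0x1ull	/* illustrative bit layout */
#define SPTE_NONPRESENT 0x0ull	/* illustrative "nonpresent" spte */

static int tlb_flush_count;	/* stands in for the real flush */

static void local_tlb_flush(void)
{
	tlb_flush_count++;
}

struct page_pair {
	const uint64_t *gpt;	/* guest pagetable entries */
	uint64_t *spt;		/* matching shadow entries */
};

/* Sync one page; report whether a live spte was dropped. */
static bool sync_page(const struct page_pair *p, size_t nent)
{
	bool zapped = false;
	size_t i;

	assert(p->gpt && p->spt);
	for (i = 0; i < nent; i++) {
		if (!(p->gpt[i] & GPTE_PRESENT) &&
		    p->spt[i] != SPTE_NONPRESENT) {
			p->spt[i] = SPTE_NONPRESENT;
			zapped = true;
		}
	}
	return zapped;
}

/* Caller: sync all pages, then flush at most once. */
static void sync_pages(const struct page_pair *pairs, size_t npages,
		       size_t nent)
{
	bool need_flush = false;
	size_t i;

	for (i = 0; i < npages; i++)
		if (sync_page(&pairs[i], nent))
			need_flush = true;

	if (need_flush)
		local_tlb_flush();
}
```

The inner function only reports that something changed, so the number
of flushes does not grow with the number of pages synced.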
>> +
>> + pte_access = sp->role.access & FNAME(gpte_access)(vcpu, *pt);
>> + /* user */
>> + if (pte_access & ACC_USER_MASK)
>> + spte |= shadow_user_mask;
>>
>
> There are some special cases involving cr0.wp=0 and the user mask. so
> spte.u is not correlated exactly with gpte.u.
How come?
>> + /* guest->shadow accessed sync */
>> + if (!(*pt & PT_ACCESSED_MASK))
>> + spte &= ~PT_ACCESSED_MASK;
>>
>
> spte shouldn't be accessible at all if gpte is not accessed, so we can
> set gpte.a on the next access (similar to spte not being writeable if
> gpte is not dirty).
Right. Perhaps accessed bit synchronization to guest could be performed
lazily somehow, so as to avoid a vmexit on every first page access.
>> + /* shadow->guest accessed sync */
>> + if (spte & PT_ACCESSED_MASK)
>> + set_bit(PT_ACCESSED_SHIFT, (unsigned long *)pt);
>>
>
> host accessed and guest accessed are very different. We shouldn't set
> host accessed unless we're sure the guest will access the page very soon.
>
>> + set_shadow_pte(&sp->spt[i], spte);
>>
>
> What if permissions are reduced?
Then a local TLB flush is needed. Flushing the TLBs of remote vcpus
should be done by the guest, AFAICS.
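
A sketch of the check, in userspace C. The bit values and the helper
name spte_update_needs_flush() are hypothetical, and only permission
bits are modeled; a change of the target frame would also need a flush
and is not covered here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SPTE_PRESENT  (1ull << 0)	/* illustrative bit layout */
#define SPTE_WRITABLE (1ull << 1)
#define SPTE_USER     (1ull << 2)
#define SPTE_PERM_MASK (SPTE_WRITABLE | SPTE_USER)

/*
 * Does replacing old_spte with new_spte reduce permissions that the
 * local TLB may still have cached?
 */
static bool spte_update_needs_flush(uint64_t old_spte, uint64_t new_spte)
{
	/* A nonpresent entry is not cached, so there is nothing stale. */
	if (!(old_spte & SPTE_PRESENT))
		return false;

	/* Present -> nonpresent removes all access. */
	if (!(new_spte & SPTE_PRESENT))
		return true;

	/* Any permission bit set in old but clear in new is a reduction. */
	return (old_spte & ~new_spte & SPTE_PERM_MASK) != 0;
}
```

Granting permissions (for example read-only to writable) returns false
here, since a stale, more restrictive TLB entry only causes an extra
fault.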
> You can use PT_* instead of shadow_* as this will never be called when
> ept is active.
>
> I'm worried about the duplication with kvm_mmu_set_pte(). Perhaps that
> can be refactored instead to be the inner loop.
Will look into that.