On Fri, Jun 06, 2008 at 11:49:23AM +0300, Avi Kivity wrote:
> Andrea Arcangeli wrote:
>>
>>> One day we want to sort the slots according to size.  We'll need better 
>>> locking then (rcu, likely).
>>>     
>>
>> I think it's more interesting to sort them on their start/end gfn
>> address, prio_tree would be the optimal structure for me to use in the
>> mmu notifier invalidates as I could ask the prio tree to show me in
>> O(log(N)) (N number of slots) all the slots that overlap with the
>> invalidated start/end range. However I'm afraid there aren't enough
>> slots for this to matter... but OTOH there aren't frequent
>> modifications either, so it may be a microoptimization (if there were
>> frequent modifications with such a small number of slots it likely
>> would be slower than a list).
>>   
>
> There are only two interesting slots, 1G-end_of_mem and 0-pci_hole.  A 
> sorted array gives an average of less than two lookups to find the slot, 
> with the smallest possible constant.  Any other algorithm will give more 
> lookups with a far higher constant.
>
> Sometimes linear search is the best algorithm.
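For non-overlapping slots, the early-break linear scan would indeed be about as small as it gets; a minimal sketch (hypothetical struct and names, not kvm's actual memslot code):

```c
#include <assert.h>

/* hypothetical layout, for illustration only */
struct slot { unsigned long base_gfn, npages; };

/*
 * Return the index of the slot containing gfn, or -1.  With no
 * overlaps allowed, the scan can stop at the first hit, and with
 * only two interesting slots the constant is tiny.
 */
static int find_slot(const struct slot *slots, int n, unsigned long gfn)
{
	for (int i = 0; i < n; i++)
		if (gfn >= slots[i].base_gfn &&
		    gfn < slots[i].base_gfn + slots[i].npages)
			return i;	/* first match wins */
	return -1;
}
```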

But we can't break the loop after the first match! I think you're not
taking into account the aliasing done with overlapping gfns in the
memslots (overlaps apparently aren't forbidden, and you planned to
obsolete the explicit aliasing logic with them). Only a prio_tree can
reduce the number of walks optimally, because of the gfn overlaps and
the search by start/end address. Sorting by size won't help, because
we can't break the loop after the first found match. If a prio_tree is
too heavyweight, the next best option is a single-index tree like an
rbtree sorted by start address, as it'll at least eliminate all the
memslots whose start address is higher than the end of the invalidate
range. But a prio_tree will eliminate them all in O(log(N)), so on
paper the prio_tree is best (though perhaps in practice the rbtree is
better, dunno).
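To make the overlap point concrete, here is a sketch of what the invalidate-side walk has to do with a flat list (hypothetical struct and names, not kvm's actual code); a prio_tree/interval tree would answer the same query in O(log N + matches):

```c
#include <assert.h>

/* hypothetical layout, for illustration only */
struct memslot { unsigned long base_gfn, npages; };

/*
 * Collect every slot intersecting the invalidated gfn range
 * [start, end).  Because slots may overlap each other (aliasing),
 * the scan cannot stop at the first hit.
 */
static int slots_in_range(const struct memslot *slots, int n,
			  unsigned long start, unsigned long end,
			  int *out)
{
	int found = 0;

	for (int i = 0; i < n; i++) {
		unsigned long s = slots[i].base_gfn;
		unsigned long e = s + slots[i].npages;

		if (s < end && start < e)	/* ranges intersect */
			out[found++] = i;	/* no break: later slots may overlap too */
	}
	return found;
}
```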

> Surely ->release can't happen while vcpus are still running?

Exactly.

> x86, ia64, and ppc all need mmu notifiers as well as other new 
> architectures, so this better be in common code.  We can #ifdef it so s390 
> doesn't suffer a performance impact.

Ok, that's fine with me (I'll undo the changes post the s390 comment ;).

> I forgot, last round:  EPT doesn't have PT_ACCESSED_MASK, so you're reading 
> (and clearing) some random bit.  We can easily qualify the test by looking 
> at shadow_accessed_mask; if zero, the hardware doesn't have a young bit.
>
> There's a bigger question of what to do in the case of EPT.  We can always 
> return true, or return true and also unmap.  The first means we lie to the 
> vm, the second means we take unnecessary exits to maintain age information.

If there's no accessed bit (and we can't use the guest pte accessed
bit, because an EPT entry maps a physical page and no longer
corresponds to a guest pte), we're forced to return 0 without doing
anything at all. Returning 1 is unsafe, as it'd pin the page in host
RAM.

If there's no accessed bit, all we can do is wait for the VM to unmap
the page, mark the spte not present during the invalidate_page mmu
notifier, and wait for the guest to trigger a minor fault from
swapcache.
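The shadow_accessed_mask qualification could look roughly like this (hypothetical names, a sketch of the idea rather than kvm's actual clear_flush_young path):

```c
#include <assert.h>

typedef unsigned long long u64;	/* stand-in for the kernel type */

/*
 * Qualify the young test on shadow_accessed_mask: when the mask is
 * zero (EPT has no accessed bit), report "not young" instead of
 * pinning the page by always returning 1.
 */
static int test_and_clear_spte_young(u64 *sptep, u64 shadow_accessed_mask)
{
	if (!shadow_accessed_mask)
		return 0;		/* no A bit: never claim young */

	if (*sptep & shadow_accessed_mask) {
		*sptep &= ~shadow_accessed_mask; /* next access sets it again */
		return 1;
	}
	return 0;
}
```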