On 10/11/2014 13:15, Paolo Bonzini wrote:
>
>
> On 10/11/2014 11:45, Gleb Natapov wrote:
>>> I tried making also the other shared MSRs the same between guest and
>>> host (STAR, LSTAR, CSTAR, SYSCALL_MASK), so that the user return notifier
>>> has nothing to do. That saves about 4-500 cycles on inl_from_qemu. I
>>> do want to dig out my old Core 2 and see how the new test fares, but it
>>> really looks like your patch will be in 3.19.
>>
>> Please test on wide variety of HW before final decision.
>
> Yes, definitely.
I've reproduced Andy's results on Ivy Bridge:
    NX off            ~6900 cycles  (EFER)
    NX on, SCE off   ~14600 cycles  (urn, the user return notifier)
    NX on, SCE on     ~6900 cycles  (same value)
I have also asked Intel for clarification.
On Core 2 Duo the results are weird. There is no LOAD_EFER control,
so Andy's patch does not apply and the only interesting paths are urn
and same value.
The pessimization of EFER writes does _seem_ to be there, since I can
profile for iTLB flushes (r4082 on this microarchitecture) and get:
    0.14%  qemu-kvm  [kernel.kallsyms]  [k] native_write_msr_safe
    0.14%  qemu-kvm  [kernel.kallsyms]  [k] native_flush_tlb
but these are the top two results and it is not clear to me why perf
only records them as "0.14%"... Also, this machine has no EPT, so virt
suffers a lot from TLB misses anyway.
Nevertheless, I tried running kvm-unit-tests with different values of the
MSRs to see how the exit costs (in cycles) behave.
                                  NX=1/SCE=0  NX=1/SCE=1  all MSRs equal
cpuid                                   3374        3448            3608
vmcall                                  3274        3337            3478
mov_from_cr8                              11          11              11
mov_to_cr8                                15          15              15
inl_from_pmtimer                       17803       16346           15156
inl_from_qemu                          17858       16375           15163
inl_from_kernel                         6351        6492            6622
outl_to_kernel                          3850        3900            4053
mov_dr                                   116         116             117
ple-round-robin                           15          16              16
wr_tsc_adjust_msr                       3334        3417            3570
rd_tsc_adjust_msr                       3374        3404            3605
mmio-no-eventfd:pci-mem                19188       17866           16660
mmio-wildcard-eventfd:pci-mem           7319        7414            7595
mmio-datamatch-eventfd:pci-mem          7304        7470            7605
portio-no-eventfd:pci-io               13219       11780           10447
portio-wildcard-eventfd:pci-io          3951        4024            4149
portio-datamatch-eventfd:pci-io         3940        4026            4228
In the last column, all shared MSRs are equal (*) between host and guest. The
difference is very noisy on newer processors, but quite visible on the
older processor. It is weird though that the light-weight exits become
_more_ expensive as more MSRs are equal between guest and host.
Anyhow, this is more of a curiosity since the proposed patch has no effect.
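For what it's worth, the trend in the Core 2 Duo table can be checked with
a few lines of Python. This is only a sketch: the cycle counts are copied
from a handful of rows of the table above, and the names RESULTS, delta
and summarize are made up for illustration. It prints, per test, how many
cycles are gained or saved going from NX=1/SCE=0 to all MSRs equal.

```python
# Cycle counts copied from the Core 2 Duo table:
# (NX=1/SCE=0, NX=1/SCE=1, all MSRs equal)
RESULTS = {
    "cpuid":            (3374, 3448, 3608),
    "vmcall":           (3274, 3337, 3478),
    "inl_from_pmtimer": (17803, 16346, 15156),
    "inl_from_qemu":    (17858, 16375, 15163),
    "inl_from_kernel":  (6351, 6492, 6622),
    "outl_to_kernel":   (3850, 3900, 4053),
}

def delta(name):
    """Cycles gained (+) or saved (-) from NX=1/SCE=0 to all MSRs equal."""
    first, _, last = RESULTS[name]
    return last - first

def summarize():
    """One formatted line per test, largest saving first."""
    lines = []
    for name in sorted(RESULTS, key=delta):
        d = delta(name)
        pct = 100.0 * d / RESULTS[name][0]
        lines.append("%-18s %+6d cycles (%+.1f%%)" % (name, d, pct))
    return lines

if __name__ == "__main__":
    print("\n".join(summarize()))
```

The exits that go out to userspace (inl_from_qemu, inl_from_pmtimer) come
out with a negative delta, while the ones handled in the kernel (cpuid,
vmcall, ...) come out with a small positive one.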
Next will come Nehalem. Nehalem has both LOAD_EFER and EPT, so it's
already a good target. I can test Westmere too, as soon as I find
someone who has one, but it shouldn't hold any surprises.
Paolo
(*) Run this:

    #! /usr/bin/env python
    import struct

    class msr(object):
        def __init__(self):
            # Open unbuffered, in binary mode; try the usual device path first.
            try:
                self.f = open('/dev/cpu/0/msr', 'rb', 0)
            except IOError:
                self.f = open('/dev/msr0', 'rb', 0)

        def read(self, index, default=None):
            # The MSR index is the file offset; each MSR is 8 bytes wide.
            self.f.seek(index)
            try:
                return struct.unpack('Q', self.f.read(8))[0]
            except (IOError, struct.error):
                return default

    m = msr()
    # EFER, STAR, LSTAR, CSTAR, SYSCALL_MASK
    for i in [0xc0000080, 0xc0000081, 0xc0000082, 0xc0000083, 0xc0000084]:
        print("wrmsr(0x%x, 0x%x);" % (i, m.read(i)))

and add the result to the enable_nx function.