On 08.08.2017 06:05, Longpeng(Mike) wrote:
> This is a simple optimization for kvm_vcpu_on_spin, the
> main idea is described in patch-1's commit msg.
> 
> I did some tests based on the RFC version; the results show
> that it improves the performance slightly.
> 
> == Geekbench-3.4.1 ==
> VM1:  8U,4G, vcpu(0...7) is 1:1 pinned to pcpu(6...11,18,19)
>       running Geekbench-3.4.1 *10 runs*
> VM2/VM3/VM4: configure is the same as VM1
>       stress each vcpu usage (seen by top in guest) to 40%
> 
> The comparison of each testcase's score:
> (higher is better)
>               before          after           improve
> Integer
>  single       1176.7          1179.0          0.2%
>  multi        3459.5          3426.5          -0.9%
> Float
>  single       1150.5          1150.9          0.0%
>  multi        3364.5          3391.9          0.8%
> Memory(stream)
>  single       1768.7          1773.1          0.2%
>  multi        2511.6          2557.2          1.8%
> Overall
>  single       1284.2          1286.2          0.2%
>  multi        3231.4          3238.4          0.2%
> 
> 
> == kernbench-0.42 ==
> VM1:    8U,12G, vcpu(0...7) is 1:1 pinned to pcpu(6...11,18,19)
>         running "kernbench -n 10"
> VM2/VM3/VM4: configure is the same as VM1
>         stress each vcpu usage (seen by top in guest) to 40%
> 
> The comparison of 'Elapsed Time':
> (lower is better)
>               before          after           improve
> load -j4      12.762          12.751          0.1%
> load -j32     9.743           8.955           8.1%
> load -j       9.688           9.229           4.7%
> 
> 
> Physical Machine:
>   Architecture:          x86_64
>   CPU op-mode(s):        32-bit, 64-bit
>   Byte Order:            Little Endian
>   CPU(s):                24
>   On-line CPU(s) list:   0-23
>   Thread(s) per core:    2
>   Core(s) per socket:    6
>   Socket(s):             2
>   NUMA node(s):          2
>   Vendor ID:             GenuineIntel
>   CPU family:            6
>   Model:                 45
>   Model name:            Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz
>   Stepping:              7
>   CPU MHz:               2799.902
>   BogoMIPS:              5004.67
>   Virtualization:        VT-x
>   L1d cache:             32K
>   L1i cache:             32K
>   L2 cache:              256K
>   L3 cache:              15360K
>   NUMA node0 CPU(s):     0-5,12-17
>   NUMA node1 CPU(s):     6-11,18-23
> 
> ---
> Changes since V1:
>  - split the implementation of s390 & arm. [David]
>  - refactor the impls according to the suggestion. [Paolo]
> 
> Changes since RFC:
>  - only cache result for X86. [David & Cornelia & Paolo]
>  - add performance numbers. [David]
>  - impls arm/s390. [Christoffer & David]
>  - refactor the impls. [me]
> 
> ---
> Longpeng(Mike) (4):
>   KVM: add spinlock optimization framework
>   KVM: X86: implement the logic for spinlock optimization
>   KVM: s390: implements the kvm_arch_vcpu_in_kernel()
>   KVM: arm: implements the kvm_arch_vcpu_in_kernel()
> 
>  arch/arm/kvm/handle_exit.c      |  2 +-
>  arch/arm64/kvm/handle_exit.c    |  2 +-
>  arch/mips/kvm/mips.c            |  6 ++++++
>  arch/powerpc/kvm/powerpc.c      |  6 ++++++
>  arch/s390/kvm/diag.c            |  2 +-
>  arch/s390/kvm/kvm-s390.c        |  6 ++++++
>  arch/x86/include/asm/kvm_host.h |  5 +++++
>  arch/x86/kvm/hyperv.c           |  2 +-
>  arch/x86/kvm/svm.c              | 10 +++++++++-
>  arch/x86/kvm/vmx.c              | 16 +++++++++++++++-
>  arch/x86/kvm/x86.c              | 11 +++++++++++
>  include/linux/kvm_host.h        |  3 ++-
>  virt/kvm/arm/arm.c              |  5 +++++
>  virt/kvm/kvm_main.c             |  4 +++-
>  14 files changed, 72 insertions(+), 8 deletions(-)
> 

I am curious: is there any architecture that allows triggering
kvm_vcpu_on_spin(vcpu) while _not_ in kernel mode?

I would have guessed that user space should never be allowed to make
CPU-wide decisions (giving up the CPU to the hypervisor).

E.g. s390x diag can only be executed from kernel space. VMX PAUSE is
only valid from kernel space.

I.o.w., do we need a parameter to kvm_vcpu_on_spin(vcpu) at all, or is
"me_in_kernel" basically always true?

-- 

Thanks,

David
