Re: [PATCH v6 1/6] arm/arm64: KVM: Introduce armv7 fp/simd vcpu fields and helpers

2016-01-05 Thread Mario Smarduch


On 1/5/2016 7:00 AM, Christoffer Dall wrote:
> On Sat, Dec 26, 2015 at 01:54:55PM -0800, Mario Smarduch wrote:
>> Add helper functions to enable access to fp/simd registers on guest entry,
>> save the host fpexc on vcpu put, and check whether the fp/simd registers
>> are dirty; also add the new vcpu fields these helpers use.
>>
>> Signed-off-by: Mario Smarduch <m.smard...@samsung.com>
>> ---
>>  arch/arm/include/asm/kvm_emulate.h   | 42 
>> 
>>  arch/arm/include/asm/kvm_host.h  |  6 ++
>>  arch/arm64/include/asm/kvm_emulate.h |  8 +++
>>  3 files changed, 56 insertions(+)
>>
>> diff --git a/arch/arm/include/asm/kvm_emulate.h 
>> b/arch/arm/include/asm/kvm_emulate.h
>> index 3095df0..d4d9da1 100644
>> --- a/arch/arm/include/asm/kvm_emulate.h
>> +++ b/arch/arm/include/asm/kvm_emulate.h
>> @@ -24,6 +24,8 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>> +#include "../vfp/vfpinstr.h"
> 
> this looks dodgy...
> 
> can you move vfpinstr.h instead?
Sure, I'll fix it up. It's done this way in a couple of other places in the
kernel and KVM, which is where I copied it from.
> 
>>  
>>  unsigned long *vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num);
>>  unsigned long *vcpu_spsr(struct kvm_vcpu *vcpu);
>> @@ -255,4 +257,44 @@ static inline unsigned long 
>> vcpu_data_host_to_guest(struct kvm_vcpu *vcpu,
>>  }
>>  }
>>  
>> +#ifdef CONFIG_VFPv3
>> +/* Called from vcpu_load - save fpexc and enable guest access to fp/simd 
>> unit */
> 
> the comment is misleading, you're not enabling guest access to the
> fp/simd unit, you're just setting the enabled bit to ensure guest
> accesses trap.

That's more accurate.
> 
>> +static inline void vcpu_trap_vfp_enable(struct kvm_vcpu *vcpu)
>> +{
>> +u32 fpexc;
>> +
>> +/* Save host fpexc, and enable guest access to fp unit */
>> +fpexc = fmrx(FPEXC);
>> +vcpu->arch.host_fpexc = fpexc;
>> +fpexc |= FPEXC_EN;
>> +fmxr(FPEXC, fpexc);
>> +
>> +/* Configure HCPTR to trap on tracing and fp/simd access */
>> +vcpu->arch.hcptr = HCPTR_TTA | HCPTR_TCP(10)  | HCPTR_TCP(11);
>> +}
>> +
>> +/* Called from vcpu_put - restore host fpexc */
> 
> I would probably get rid of the "Called from" stuff and just describe
> what these functions do locally.  Comments like this are likely to be
> out of date soon'ish.

Yeah true, will do.
> 
>> +static inline void vcpu_restore_host_fpexc(struct kvm_vcpu *vcpu)
>> +{
>> +fmxr(FPEXC, vcpu->arch.host_fpexc);
>> +}
>> +
>> +/* If trap bits are reset then fp/simd registers are dirty */
>> +static inline bool vcpu_vfp_isdirty(struct kvm_vcpu *vcpu)
>> +{
>> +return !(vcpu->arch.hcptr & (HCPTR_TCP(10) | HCPTR_TCP(11)));
>> +}
>> +#else
>> +static inline void vcpu_trap_vfp_enable(struct kvm_vcpu *vcpu)
>> +{
>> +vcpu->arch.hcptr = HCPTR_TTA;
>> +}
>> +
>> +static inline void vcpu_restore_host_fpexc(struct kvm_vcpu *vcpu) {}
>> +static inline bool vcpu_vfp_isdirty(struct kvm_vcpu *vcpu)
>> +{
>> +return false;
>> +}
>> +#endif
> 
> this kind of feels like it belongs in its own C-file instead of a header
> file, perhaps arch/arm/kvm/vfp.C.
> 
> Marc, what do you think?
> 

That would be starting from vcpu_trap_vfp_enable()? The file is getting a
little overloaded.

I'm also thinking the 3rd patch should use a single function call for
vcpu_put, like vcpu_load does, instead of exposing logic in common code that
is only relevant to arm32. Please keep that in mind when you have a chance to
review that patch.

>> +
>>  #endif /* __ARM_KVM_EMULATE_H__ */
>> diff --git a/arch/arm/include/asm/kvm_host.h 
>> b/arch/arm/include/asm/kvm_host.h
>> index f9f2779..d3ef58a 100644
>> --- a/arch/arm/include/asm/kvm_host.h
>> +++ b/arch/arm/include/asm/kvm_host.h
>> @@ -105,6 +105,12 @@ struct kvm_vcpu_arch {
>>  /* HYP trapping configuration */
>>  u32 hcr;
>>  
>> +/* HYP Co-processor fp/simd and trace trapping configuration */
>> +u32 hcptr;
>> +
>> +/* Save host FPEXC register to later restore on vcpu put */
>> +u32 host_fpexc;
>> +
>>  /* Interrupt related fields */
>>  u32 irq_lines;  /* IRQ and FIQ levels */
>>  
>> diff --git a/arch/arm64/include/asm/kvm_emulate.h 
>> b/arch/arm64/include/asm/kvm_emulate.h
>> index 3066328..ffe8ccf 100644
>> --- a/arch/arm64/include/asm/kvm_emulate.h
>> +++ b/arch/arm64/include/asm/kvm_emulate.h
>> @@ -299,4 +299,12 @@ static inline unsigned long 
>> vcpu_data_host_to_guest(struct kvm_vcpu *vcpu,
>>  return data;/* Leave LE untouched */
>>  }
>>  
>> +static inline void vcpu_trap_vfp_enable(struct kvm_vcpu *vcpu) {}
>> +static inline void vcpu_restore_host_fpexc(struct kvm_vcpu *vcpu) {}
>> +
>> +static inline bool vcpu_vfp_isdirty(struct kvm_vcpu *vcpu)
>> +{
>> +return false;
>> +}
>> +
>>  #endif /* __ARM64_KVM_EMULATE_H__ */
>> -- 
>> 1.9.1
>>
> 
> Thanks,
> -Christoffer
> 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v6 2/6] arm: KVM: Introduce host fp/simd context switch function

2015-12-26 Thread Mario Smarduch
Add fp/simd context switch function callable from host kernel mode.

Signed-off-by: Mario Smarduch <m.smard...@samsung.com>
---
 arch/arm/kvm/Makefile|  2 +-
 arch/arm/kvm/fpsimd_switch.S | 47 
 2 files changed, 48 insertions(+), 1 deletion(-)
 create mode 100644 arch/arm/kvm/fpsimd_switch.S

diff --git a/arch/arm/kvm/Makefile b/arch/arm/kvm/Makefile
index c5eef02c..411b3e4 100644
--- a/arch/arm/kvm/Makefile
+++ b/arch/arm/kvm/Makefile
@@ -19,7 +19,7 @@ kvm-arm-y = $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o 
$(KVM)/eventfd.o $(KVM)/vf
 
 obj-y += kvm-arm.o init.o interrupts.o
 obj-y += arm.o handle_exit.o guest.o mmu.o emulate.o reset.o
-obj-y += coproc.o coproc_a15.o coproc_a7.o mmio.o psci.o perf.o
+obj-y += coproc.o coproc_a15.o coproc_a7.o mmio.o psci.o perf.o fpsimd_switch.o
 obj-y += $(KVM)/arm/vgic.o
 obj-y += $(KVM)/arm/vgic-v2.o
 obj-y += $(KVM)/arm/vgic-v2-emul.o
diff --git a/arch/arm/kvm/fpsimd_switch.S b/arch/arm/kvm/fpsimd_switch.S
new file mode 100644
index 000..7e48c16
--- /dev/null
+++ b/arch/arm/kvm/fpsimd_switch.S
@@ -0,0 +1,47 @@
+/*
+ * Copyright (C) 2015 - Samsung - Open Source Group
+ * Author: Mario Smarduch <m.smard...@samsung.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "interrupts_head.S"
+
+   .text
+/**
+  * void vcpu_restore_host_vfp_state(struct vcpu *vcpu) -
+  * This function is called from the host to save the guest and restore
+  * the host fp/simd hardware context.
+  */
+ENTRY(vcpu_restore_host_vfp_state)
+#ifdef CONFIG_VFPv3
+   push{r4-r7}
+
+   add r7, r0, #VCPU_VFP_GUEST
+   store_vfp_state r7
+
+   add r7, r0, #VCPU_VFP_HOST
+   ldr r7, [r7]
+   restore_vfp_state r7
+
+   pop {r4-r7}
+#endif
+   bx  lr
+ENDPROC(vcpu_restore_host_vfp_state)
-- 
1.9.1



[PATCH v6 0/6] arm/arm64: KVM: Enhance armv7/8 fp/simd lazy switch

2015-12-26 Thread Mario Smarduch
The current lazy fp/simd implementation switches hardware context on guest
access and again on exit to the host; otherwise the context switch is skipped.
This patch set builds on that functionality and executes a hardware context
switch only on first-time access and when the vCPU is scheduled out or returns
to user space (on vcpu_put).

For an FP and lmbench load it reduces the fp/simd context switch rate from
30-50% down to near 0%. Results will vary with load, but it is no worse than
the current approach.

Running a floating point application on a nearly idle system:
./tst-float 10uS - (sleep for .1s) fp/simd switch reduced by 99%+
./tst-float 1uS -  (sleep for 10 ms)  reduced by 98%+
./tst-float 1000uS -   (sleep for 1ms)reduced by ~98%
...
./tst-float 1uS - reduced by  2%+

Tested on Juno, Foundation Model, and Fast Models. 

Test Details:
-
armv7 - with CONFIG_VFP, CONFIG_NEON, CONFIG_KERNEL_MODE_NEON options enabled:

- On host executed 12 fp applications - with ranging sleep intervals
- Two guests - with 12 fp processes  - with ranging sleep intervals

armv8
- Similar to armv7, with a mix of 32 and 64 bit guests - on Juno ran two
  32-bit and two 64-bit guests.

These patches are based on earlier arm64 fp/simd optimization work -
https://lists.cs.columbia.edu/pipermail/kvmarm/2015-July/015748.html

And subsequent fixes by Marc and Christoffer at KVM Forum hackathon to handle
32-bit guest on 64 bit host - 
https://lists.cs.columbia.edu/pipermail/kvmarm/2015-August/016128.html

Changes since v5->v6:
- Followed up on Christoffer's comments:
  o armv7 - replaced fp/simd asm code with supported function calls
  o armv7 - save hcptr once on access instead of on every exit
  o armv7 - removed the hcptr macro
  o armv7 - fixed twisted boolean return logic
  o armv7 - removed the isb after setting fpexc32 since it is followed by a
hyp call
  o armv8 - rebased to 4.4-rc5 - wsinc
  o armv8 - as with hcptr, save cptr_el2 on access instead of on every exit
  o armv7/armv8 - restructured the patch series to simplify review

Changes since v4->v5:
- Followed up on Marc's comments:
  - Removed the dirty flag, and used the trap bits to check for dirty fp/simd
  - Separated host from hyp code
  - As a consequence, for arm64 added a common assembler header file
  - Fixed up critical accesses to fpexc, and added an isb
  - Converted defines to inline functions

Changes since v3->v4:
- Followed up on Christoffer's comments:
  - Moved fpexc handling to vcpu_load and vcpu_put
  - Enable and restore fpexc in EL2 mode when running a 32-bit guest on
    64-bit EL2
  - Reworked hcptr handling

Changes since v2->v3:
- Combined the armv7 and armv8 changes into one short patch series
- Moved access to fpexc32_el2 back to EL2
- Moved the host restore to EL1 from EL2 and call it directly from the host
- Optimized the trap enable code
- Renamed some variables to match usage

Changes since v1->v2:
- Fixed the vfp/simd trap configuration to enable trace trapping
- Removed the set_hcptr branch label
- Fixed handling of FPEXC to restore guest and host versions on vcpu_put
- Tested arm32/arm64
- Rebased to 4.3-rc2
- Changed a couple of register accesses from 64 to 32 bit

Mario Smarduch (6):
  Introduce armv7 fp/simd vcpu fields and helpers
  Introduce host fp/simd context switch function
  Enable armv7 fp/simd enhanced context switch
  Deleted unused macros
  Introduce armv8 fp/simd vcpu fields and helpers
  Enable armv8 fp/simd enhanced context switch

 arch/arm/include/asm/kvm_emulate.h   | 54 
 arch/arm/include/asm/kvm_host.h  |  8 ++
 arch/arm/kernel/asm-offsets.c|  1 +
 arch/arm/kvm/Makefile|  2 +-
 arch/arm/kvm/arm.c   | 19 +
 arch/arm/kvm/fpsimd_switch.S | 47 +++
 arch/arm/kvm/interrupts.S| 43 ++--
 arch/arm/kvm/interrupts_head.S   | 29 ---
 arch/arm64/include/asm/kvm_asm.h |  5 
 arch/arm64/include/asm/kvm_emulate.h | 30 
 arch/arm64/include/asm/kvm_host.h| 12 
 arch/arm64/kernel/asm-offsets.c  |  1 +
 arch/arm64/kvm/hyp/entry.S   |  1 +
 arch/arm64/kvm/hyp/hyp-entry.S   | 26 +
 arch/arm64/kvm/hyp/switch.c  | 26 ++---
 15 files changed, 222 insertions(+), 82 deletions(-)
 create mode 100644 arch/arm/kvm/fpsimd_switch.S

-- 
1.9.1



[PATCH v6 1/6] arm/arm64: KVM: Introduce armv7 fp/simd vcpu fields and helpers

2015-12-26 Thread Mario Smarduch
Add helper functions to enable access to fp/simd registers on guest entry,
save the host fpexc on vcpu put, and check whether the fp/simd registers
are dirty; also add the new vcpu fields these helpers use.

Signed-off-by: Mario Smarduch <m.smard...@samsung.com>
---
 arch/arm/include/asm/kvm_emulate.h   | 42 
 arch/arm/include/asm/kvm_host.h  |  6 ++
 arch/arm64/include/asm/kvm_emulate.h |  8 +++
 3 files changed, 56 insertions(+)

diff --git a/arch/arm/include/asm/kvm_emulate.h 
b/arch/arm/include/asm/kvm_emulate.h
index 3095df0..d4d9da1 100644
--- a/arch/arm/include/asm/kvm_emulate.h
+++ b/arch/arm/include/asm/kvm_emulate.h
@@ -24,6 +24,8 @@
 #include 
 #include 
 #include 
+#include 
+#include "../vfp/vfpinstr.h"
 
 unsigned long *vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num);
 unsigned long *vcpu_spsr(struct kvm_vcpu *vcpu);
@@ -255,4 +257,44 @@ static inline unsigned long vcpu_data_host_to_guest(struct 
kvm_vcpu *vcpu,
}
 }
 
+#ifdef CONFIG_VFPv3
+/* Called from vcpu_load - save fpexc and enable guest access to fp/simd unit 
*/
+static inline void vcpu_trap_vfp_enable(struct kvm_vcpu *vcpu)
+{
+   u32 fpexc;
+
+   /* Save host fpexc, and enable guest access to fp unit */
+   fpexc = fmrx(FPEXC);
+   vcpu->arch.host_fpexc = fpexc;
+   fpexc |= FPEXC_EN;
+   fmxr(FPEXC, fpexc);
+
+   /* Configure HCPTR to trap on tracing and fp/simd access */
+   vcpu->arch.hcptr = HCPTR_TTA | HCPTR_TCP(10)  | HCPTR_TCP(11);
+}
+
+/* Called from vcpu_put - restore host fpexc */
+static inline void vcpu_restore_host_fpexc(struct kvm_vcpu *vcpu)
+{
+   fmxr(FPEXC, vcpu->arch.host_fpexc);
+}
+
+/* If trap bits are reset then fp/simd registers are dirty */
+static inline bool vcpu_vfp_isdirty(struct kvm_vcpu *vcpu)
+{
+   return !(vcpu->arch.hcptr & (HCPTR_TCP(10) | HCPTR_TCP(11)));
+}
+#else
+static inline void vcpu_trap_vfp_enable(struct kvm_vcpu *vcpu)
+{
+   vcpu->arch.hcptr = HCPTR_TTA;
+}
+
+static inline void vcpu_restore_host_fpexc(struct kvm_vcpu *vcpu) {}
+static inline bool vcpu_vfp_isdirty(struct kvm_vcpu *vcpu)
+{
+   return false;
+}
+#endif
+
 #endif /* __ARM_KVM_EMULATE_H__ */
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index f9f2779..d3ef58a 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -105,6 +105,12 @@ struct kvm_vcpu_arch {
/* HYP trapping configuration */
u32 hcr;
 
+   /* HYP Co-processor fp/simd and trace trapping configuration */
+   u32 hcptr;
+
+   /* Save host FPEXC register to later restore on vcpu put */
+   u32 host_fpexc;
+
/* Interrupt related fields */
u32 irq_lines;  /* IRQ and FIQ levels */
 
diff --git a/arch/arm64/include/asm/kvm_emulate.h 
b/arch/arm64/include/asm/kvm_emulate.h
index 3066328..ffe8ccf 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -299,4 +299,12 @@ static inline unsigned long vcpu_data_host_to_guest(struct 
kvm_vcpu *vcpu,
return data;/* Leave LE untouched */
 }
 
+static inline void vcpu_trap_vfp_enable(struct kvm_vcpu *vcpu) {}
+static inline void vcpu_restore_host_fpexc(struct kvm_vcpu *vcpu) {}
+
+static inline bool vcpu_vfp_isdirty(struct kvm_vcpu *vcpu)
+{
+   return false;
+}
+
 #endif /* __ARM64_KVM_EMULATE_H__ */
-- 
1.9.1



[PATCH v6 3/6] arm/arm64: KVM: Enable armv7 fp/simd enhanced context switch

2015-12-26 Thread Mario Smarduch
Enable armv7 enhanced fp/simd context switch. Guest and host registers are only
context switched on first access and vcpu put. 

Signed-off-by: Mario Smarduch <m.smard...@samsung.com>
---
 arch/arm/include/asm/kvm_host.h   |  2 ++
 arch/arm/kernel/asm-offsets.c |  1 +
 arch/arm/kvm/arm.c| 10 +
 arch/arm/kvm/interrupts.S | 43 ++-
 arch/arm64/include/asm/kvm_host.h |  2 ++
 5 files changed, 30 insertions(+), 28 deletions(-)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index d3ef58a..90f7f59 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -238,6 +238,8 @@ void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
 
 struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
 
+void vcpu_restore_host_vfp_state(struct kvm_vcpu *);
+
 static inline void kvm_arch_hardware_disable(void) {}
 static inline void kvm_arch_hardware_unsetup(void) {}
 static inline void kvm_arch_sync_events(struct kvm *kvm) {}
diff --git a/arch/arm/kernel/asm-offsets.c b/arch/arm/kernel/asm-offsets.c
index 871b826..395ecca 100644
--- a/arch/arm/kernel/asm-offsets.c
+++ b/arch/arm/kernel/asm-offsets.c
@@ -185,6 +185,7 @@ int main(void)
   DEFINE(VCPU_PC,  offsetof(struct kvm_vcpu, 
arch.regs.usr_regs.ARM_pc));
   DEFINE(VCPU_CPSR,offsetof(struct kvm_vcpu, 
arch.regs.usr_regs.ARM_cpsr));
   DEFINE(VCPU_HCR, offsetof(struct kvm_vcpu, arch.hcr));
+  DEFINE(VCPU_HCPTR,   offsetof(struct kvm_vcpu, arch.hcptr));
   DEFINE(VCPU_IRQ_LINES,   offsetof(struct kvm_vcpu, arch.irq_lines));
   DEFINE(VCPU_HSR, offsetof(struct kvm_vcpu, arch.fault.hsr));
   DEFINE(VCPU_HxFAR,   offsetof(struct kvm_vcpu, arch.fault.hxfar));
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index dda1959..b16ed98 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -308,10 +308,20 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
vcpu->arch.host_cpu_context = this_cpu_ptr(kvm_host_cpu_state);
 
kvm_arm_set_running_vcpu(vcpu);
+
+   /* Save and enable fpexc, and enable default traps */
+   vcpu_trap_vfp_enable(vcpu);
 }
 
 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 {
+   /* If the fp/simd registers are dirty save guest, restore host. */
+   if (vcpu_vfp_isdirty(vcpu))
+   vcpu_restore_host_vfp_state(vcpu);
+
+   /* Restore host FPEXC trashed in vcpu_load */
+   vcpu_restore_host_fpexc(vcpu);
+
/*
 * The arch-generic KVM code expects the cpu field of a vcpu to be -1
 * if the vcpu is no longer assigned to a cpu.  This is used for the
diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
index 900ef6d..245c11f 100644
--- a/arch/arm/kvm/interrupts.S
+++ b/arch/arm/kvm/interrupts.S
@@ -116,22 +116,15 @@ ENTRY(__kvm_vcpu_run)
read_cp15_state store_to_vcpu = 0
write_cp15_state read_from_vcpu = 1
 
-   @ If the host kernel has not been configured with VFPv3 support,
-   @ then it is safer if we deny guests from using it as well.
-#ifdef CONFIG_VFPv3
-   @ Set FPEXC_EN so the guest doesn't trap floating point instructions
-   VFPFMRX r2, FPEXC   @ VMRS
-   push{r2}
-   orr r2, r2, #FPEXC_EN
-   VFPFMXR FPEXC, r2   @ VMSR
-#endif
+   @ Configure trapping of access to tracing and fp/simd registers
+   ldr r1, [vcpu, #VCPU_HCPTR]
+   mcr p15, 4, r1, c1, c1, 2
 
@ Configure Hyp-role
configure_hyp_role vmentry
 
@ Trap coprocessor CRx accesses
set_hstr vmentry
-   set_hcptr vmentry, (HCPTR_TTA | HCPTR_TCP(10) | HCPTR_TCP(11))
set_hdcr vmentry
 
@ Write configured ID register into MIDR alias
@@ -170,23 +163,10 @@ __kvm_vcpu_return:
@ Don't trap coprocessor accesses for host kernel
set_hstr vmexit
set_hdcr vmexit
-   set_hcptr vmexit, (HCPTR_TTA | HCPTR_TCP(10) | HCPTR_TCP(11)), 
after_vfp_restore
 
-#ifdef CONFIG_VFPv3
-   @ Switch VFP/NEON hardware state to the host's
-   add r7, vcpu, #VCPU_VFP_GUEST
-   store_vfp_state r7
-   add r7, vcpu, #VCPU_VFP_HOST
-   ldr r7, [r7]
-   restore_vfp_state r7
-
-after_vfp_restore:
-   @ Restore FPEXC_EN which we clobbered on entry
-   pop {r2}
-   VFPFMXR FPEXC, r2
-#else
-after_vfp_restore:
-#endif
+   @ Disable trace and fp/simd traps
+   mov r2, #0
+   mcr p15, 4, r2, c1, c1, 2
 
@ Reset Hyp-role
configure_hyp_role vmexit
@@ -482,8 +462,15 @@ guest_trap:
 switch_to_guest_vfp:
push{r3-r7}
 
-   @ NEON/VFP used.  Turn on VFP access.
-   set_hcptr vmtrap, (HCPTR_TCP(10) | HCPTR_TCP(11))
+   @ fp/simd was accessed, so disable trapping and save hcptr register
+   @ which is used across exits until next vcpu_load.
+   mrc p15, 

[PATCH v6 4/6] arm: KVM: Delete unused macros

2015-12-26 Thread Mario Smarduch
set_hcptr is no longer used so delete it.

Signed-off-by: Mario Smarduch <m.smard...@samsung.com>
---
 arch/arm/kvm/interrupts_head.S | 29 -
 1 file changed, 29 deletions(-)

diff --git a/arch/arm/kvm/interrupts_head.S b/arch/arm/kvm/interrupts_head.S
index 51a5950..f4d8311 100644
--- a/arch/arm/kvm/interrupts_head.S
+++ b/arch/arm/kvm/interrupts_head.S
@@ -589,35 +589,6 @@ ARM_BE8(revr6, r6  )
mcr p15, 4, r2, c1, c1, 3
 .endm
 
-/* Configures the HCPTR (Hyp Coprocessor Trap Register) on entry/return
- * (hardware reset value is 0). Keep previous value in r2.
- * An ISB is emited on vmexit/vmtrap, but executed on vmexit only if
- * VFP wasn't already enabled (always executed on vmtrap).
- * If a label is specified with vmexit, it is branched to if VFP wasn't
- * enabled.
- */
-.macro set_hcptr operation, mask, label = none
-   mrc p15, 4, r2, c1, c1, 2
-   ldr r3, =\mask
-   .if \operation == vmentry
-   orr r3, r2, r3  @ Trap coproc-accesses defined in mask
-   .else
-   bic r3, r2, r3  @ Don't trap defined coproc-accesses
-   .endif
-   mcr p15, 4, r3, c1, c1, 2
-   .if \operation != vmentry
-   .if \operation == vmexit
-   tst r2, #(HCPTR_TCP(10) | HCPTR_TCP(11))
-   beq 1f
-   .endif
-   isb
-   .if \label != none
-   b   \label
-   .endif
-1:
-   .endif
-.endm
-
 /* Configures the HDCR (Hyp Debug Configuration Register) on entry/return
  * (hardware reset value is 0) */
 .macro set_hdcr operation
-- 
1.9.1



[PATCH v6 5/6] arm/arm64: KVM: Introduce armv8 fp/simd vcpu fields and helpers

2015-12-26 Thread Mario Smarduch
Similar to armv7, add helper functions to enable access to fp/simd registers
on guest entry. Save the guest fpexc32_el2 on vcpu_put, checking whether the
guest is 32 bit. Save guest and restore host registers from the host kernel,
check whether the fp/simd registers are dirty, and lastly add the cptr_el2
vcpu field.

Signed-off-by: Mario Smarduch <m.smard...@samsung.com>
---
 arch/arm/include/asm/kvm_emulate.h   | 12 
 arch/arm64/include/asm/kvm_asm.h |  5 +
 arch/arm64/include/asm/kvm_emulate.h | 26 --
 arch/arm64/include/asm/kvm_host.h| 12 +++-
 arch/arm64/kvm/hyp/hyp-entry.S   | 26 ++
 5 files changed, 78 insertions(+), 3 deletions(-)

diff --git a/arch/arm/include/asm/kvm_emulate.h 
b/arch/arm/include/asm/kvm_emulate.h
index d4d9da1..a434dc5 100644
--- a/arch/arm/include/asm/kvm_emulate.h
+++ b/arch/arm/include/asm/kvm_emulate.h
@@ -284,6 +284,12 @@ static inline bool vcpu_vfp_isdirty(struct kvm_vcpu *vcpu)
 {
return !(vcpu->arch.hcptr & (HCPTR_TCP(10) | HCPTR_TCP(11)));
 }
+
+static inline bool vcpu_guest_is_32bit(struct kvm_vcpu *vcpu)
+{
+   return true;
+}
+static inline void vcpu_save_fpexc(struct kvm_vcpu *vcpu) {}
 #else
 static inline void vcpu_trap_vfp_enable(struct kvm_vcpu *vcpu)
 {
@@ -295,6 +301,12 @@ static inline bool vcpu_vfp_isdirty(struct kvm_vcpu *vcpu)
 {
return false;
 }
+
+static inline bool vcpu_guest_is_32bit(struct kvm_vcpu *vcpu)
+{
+   return true;
+}
+static inline void vcpu_save_fpexc(struct kvm_vcpu *vcpu) {}
 #endif
 
 #endif /* __ARM_KVM_EMULATE_H__ */
diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 52b777b..ddae814 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -48,6 +48,11 @@ extern u64 __vgic_v3_get_ich_vtr_el2(void);
 
 extern u32 __kvm_get_mdcr_el2(void);
 
+extern void __fpsimd_prepare_fpexc32(void);
+extern void __fpsimd_save_fpexc32(struct kvm_vcpu *vcpu);
+extern void __fpsimd_save_state(struct user_fpsimd_state *);
+extern void __fpsimd_restore_state(struct user_fpsimd_state *);
+
 #endif
 
 #endif /* __ARM_KVM_ASM_H__ */
diff --git a/arch/arm64/include/asm/kvm_emulate.h 
b/arch/arm64/include/asm/kvm_emulate.h
index ffe8ccf..f8203c7 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -299,12 +299,34 @@ static inline unsigned long 
vcpu_data_host_to_guest(struct kvm_vcpu *vcpu,
return data;/* Leave LE untouched */
 }
 
-static inline void vcpu_trap_vfp_enable(struct kvm_vcpu *vcpu) {}
+static inline bool vcpu_guest_is_32bit(struct kvm_vcpu *vcpu)
+{
+   return !(vcpu->arch.hcr_el2 & HCR_RW);
+}
+
+static inline void vcpu_trap_vfp_enable(struct kvm_vcpu *vcpu)
+{
+   /* For 32 bit guest enable access to fp/simd registers */
+   if (vcpu_guest_is_32bit(vcpu))
+   vcpu_prepare_fpexc();
+
+   vcpu->arch.cptr_el2 = CPTR_EL2_TTA | CPTR_EL2_TFP;
+}
+
 static inline void vcpu_restore_host_fpexc(struct kvm_vcpu *vcpu) {}
 
 static inline bool vcpu_vfp_isdirty(struct kvm_vcpu *vcpu)
 {
-   return false;
+   return !(vcpu->arch.cptr_el2 & CPTR_EL2_TFP);
+}
+
+static inline void vcpu_restore_host_vfp_state(struct kvm_vcpu *vcpu)
+{
+   struct kvm_cpu_context *host_ctxt = vcpu->arch.host_cpu_context;
+   struct kvm_cpu_context *guest_ctxt = &vcpu->arch.ctxt;
+
+   __fpsimd_save_state(&guest_ctxt->gp_regs.fp_regs);
+   __fpsimd_restore_state(&host_ctxt->gp_regs.fp_regs);
 }
 
 #endif /* __ARM64_KVM_EMULATE_H__ */
diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index bfe4d4e..5d0c256 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -26,6 +26,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define __KVM_HAVE_ARCH_INTC_INITIALIZED
 
@@ -180,6 +181,7 @@ struct kvm_vcpu_arch {
/* HYP configuration */
u64 hcr_el2;
u32 mdcr_el2;
+   u32 cptr_el2;
 
/* Exception Information */
struct kvm_vcpu_fault_info fault;
@@ -338,7 +340,15 @@ static inline void kvm_arch_sync_events(struct kvm *kvm) {}
 static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
 static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
 
-static inline void vcpu_restore_host_vfp_state(struct kvm_vcpu *vcpu) {}
+static inline void vcpu_prepare_fpexc(void)
+{
+   kvm_call_hyp(__fpsimd_prepare_fpexc32);
+}
+
+static inline void vcpu_save_fpexc(struct kvm_vcpu *vcpu)
+{
+   kvm_call_hyp(__fpsimd_save_fpexc32, vcpu);
+}
 
 void kvm_arm_init_debug(void);
 void kvm_arm_setup_debug(struct kvm_vcpu *vcpu);
diff --git a/arch/arm64/kvm/hyp/hyp-entry.S b/arch/arm64/kvm/hyp/hyp-entry.S
index 93e8d983..a9235e2 100644
--- a/arch/arm64/kvm/hyp/hyp-entry.S
+++ b/arch/arm64/kvm/hyp/hyp-entry.S
@@ -164,6 +164,32 @@ ENTRY(__hyp_do_panic)
eret
 ENDPROC(__hyp_do_panic)
 
+/

[PATCH v6 6/6] arm/arm64: KVM: Enable armv8 fp/simd enhanced context switch

2015-12-26 Thread Mario Smarduch
Enable armv8 enhanced fp/simd context switch. Guest and host registers are only
context switched on first access and vcpu put.

Signed-off-by: Mario Smarduch <m.smard...@samsung.com>
---
 arch/arm/kvm/arm.c  | 13 +++--
 arch/arm64/kernel/asm-offsets.c |  1 +
 arch/arm64/kvm/hyp/entry.S  |  1 +
 arch/arm64/kvm/hyp/switch.c | 26 ++
 4 files changed, 15 insertions(+), 26 deletions(-)

diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index b16ed98..633a208 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -316,10 +316,19 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 {
/* If the fp/simd registers are dirty save guest, restore host. */
-   if (vcpu_vfp_isdirty(vcpu))
+   if (vcpu_vfp_isdirty(vcpu)) {
+
vcpu_restore_host_vfp_state(vcpu);
 
-   /* Restore host FPEXC trashed in vcpu_load */
+   /*
+* For 32bit guest on arm64 save the guest fpexc register
+* in EL2 mode.
+*/
+   if (vcpu_guest_is_32bit(vcpu))
+   vcpu_save_fpexc(vcpu);
+   }
+
+   /* For arm32 restore host FPEXC trashed in vcpu_load. */
vcpu_restore_host_fpexc(vcpu);
 
/*
diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
index 94090a6..d69145c 100644
--- a/arch/arm64/kernel/asm-offsets.c
+++ b/arch/arm64/kernel/asm-offsets.c
@@ -112,6 +112,7 @@ int main(void)
   DEFINE(VCPU_ESR_EL2, offsetof(struct kvm_vcpu, arch.fault.esr_el2));
   DEFINE(VCPU_FAR_EL2, offsetof(struct kvm_vcpu, arch.fault.far_el2));
   DEFINE(VCPU_HPFAR_EL2,   offsetof(struct kvm_vcpu, 
arch.fault.hpfar_el2));
+  DEFINE(VCPU_CPTR_EL2,offsetof(struct kvm_vcpu, 
arch.cptr_el2));
   DEFINE(VCPU_HOST_CONTEXT,offsetof(struct kvm_vcpu, 
arch.host_cpu_context));
 #endif
 #ifdef CONFIG_CPU_PM
diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
index fd0fbe9..ce7e903 100644
--- a/arch/arm64/kvm/hyp/entry.S
+++ b/arch/arm64/kvm/hyp/entry.S
@@ -136,6 +136,7 @@ ENTRY(__fpsimd_guest_restore)
isb
 
mrs x3, tpidr_el2
+   str w2, [x3, #VCPU_CPTR_EL2]
 
ldr x0, [x3, #VCPU_HOST_CONTEXT]
kern_hyp_va x0
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index ca8f5a5..962d179 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -19,24 +19,10 @@
 
 static void __hyp_text __activate_traps(struct kvm_vcpu *vcpu)
 {
-   u64 val;
-
-   /*
-* We are about to set CPTR_EL2.TFP to trap all floating point
-* register accesses to EL2, however, the ARM ARM clearly states that
-* traps are only taken to EL2 if the operation would not otherwise
-* trap to EL1.  Therefore, always make sure that for 32-bit guests,
-* we set FPEXC.EN to prevent traps to EL1, when setting the TFP bit.
-*/
-   val = vcpu->arch.hcr_el2;
-   if (!(val & HCR_RW)) {
-   write_sysreg(1 << 30, fpexc32_el2);
-   isb();
-   }
-   write_sysreg(val, hcr_el2);
+   write_sysreg(vcpu->arch.hcr_el2, hcr_el2);
/* Trap on AArch32 cp15 c15 accesses (EL1 or EL0) */
write_sysreg(1 << 15, hstr_el2);
-   write_sysreg(CPTR_EL2_TTA | CPTR_EL2_TFP, cptr_el2);
+   write_sysreg(vcpu->arch.cptr_el2, cptr_el2);
write_sysreg(vcpu->arch.mdcr_el2, mdcr_el2);
 }
 
@@ -89,7 +75,6 @@ static int __hyp_text __guest_run(struct kvm_vcpu *vcpu)
 {
struct kvm_cpu_context *host_ctxt;
struct kvm_cpu_context *guest_ctxt;
-   bool fp_enabled;
u64 exit_code;
 
vcpu = kern_hyp_va(vcpu);
@@ -119,8 +104,6 @@ static int __hyp_text __guest_run(struct kvm_vcpu *vcpu)
exit_code = __guest_enter(vcpu, host_ctxt);
/* And we're baaack! */
 
-   fp_enabled = __fpsimd_enabled();
-
__sysreg_save_state(guest_ctxt);
__sysreg32_save_state(vcpu);
__timer_save_state(vcpu);
@@ -131,11 +114,6 @@ static int __hyp_text __guest_run(struct kvm_vcpu *vcpu)
 
__sysreg_restore_state(host_ctxt);
 
-   if (fp_enabled) {
-   __fpsimd_save_state(&guest_ctxt->gp_regs.fp_regs);
-   __fpsimd_restore_state(&host_ctxt->gp_regs.fp_regs);
-   }
-
__debug_save_state(vcpu, kern_hyp_va(vcpu->arch.debug_ptr), guest_ctxt);
__debug_cond_restore_host_state(vcpu);
 
-- 
1.9.1



Re: [PATCH v5 3/3] KVM/arm/arm64: enable enhanced armv8 fp/simd lazy switch

2015-12-22 Thread Mario Smarduch


On 12/22/2015 12:06 AM, Christoffer Dall wrote:
> On Mon, Dec 21, 2015 at 11:34:25AM -0800, Mario Smarduch wrote:
>>
>>
>> On 12/18/2015 11:45 PM, Christoffer Dall wrote:
>>> On Fri, Dec 18, 2015 at 05:17:00PM -0800, Mario Smarduch wrote:
>>>> On 12/18/2015 5:54 AM, Christoffer Dall wrote:
>>>>> On Sun, Dec 06, 2015 at 05:07:14PM -0800, Mario Smarduch wrote:
[...]

>>>>>> +  * we set FPEXC.EN to prevent traps to EL1, when setting the TFP 
>>>>>> bit.
>>>>>> +  */
>>>>>> +ENTRY(__kvm_vcpu_enable_fpexc32)
>>>>>> +mov x3, #(1 << 30)
>>>>>> +msr fpexc32_el2, x3
>>>>>> +isb
>>>>>
>>>>> this is only called via a hypercall so do you really need the ISB?
>>>>
>>>> Same comment as in 2nd patch for the isb.
>>>>
>>>
>>> Unless you can argue that something needs to take effect before
>>> something else, where there's no other implicit barrier, you don't need
>>> the ISB.
>>
>> Makes sense; an exception level change should act as a barrier. It was not
>> there before; I put it in due to a lack of information on the meaning of
>> 'implicit'. The manual has more info on implicit barriers for operations
>> like DMB.
> 
> if the effect from the register write just has to be visible after
> taking an exception, then you don't need the ISB.

Good definition, should be in the manual :)

Thanks.
> 
>>
>> Speaking of ISB it doesn't appear like this one is needed, it's between 
>> couple
>> register reads in 'save_time_state' macro.
>>
>> mrc p15, 0, r2, c14, c3, 1  @ CNTV_CTL
>> str r2, [vcpu, #VCPU_TIMER_CNTV_CTL]
>>
>> isb
>>
>> mrrc p15, 3, rr_lo_hi(r2, r3), c14   @ CNTV_CVAL
>>
> 
> I think there was a reason for that one, so let's not worry about that
> for now.
> 
> -Christoffer
> 


Re: [PATCH v5 3/3] KVM/arm/arm64: enable enhanced armv8 fp/simd lazy switch

2015-12-21 Thread Mario Smarduch


On 12/18/2015 11:45 PM, Christoffer Dall wrote:
> On Fri, Dec 18, 2015 at 05:17:00PM -0800, Mario Smarduch wrote:
>> On 12/18/2015 5:54 AM, Christoffer Dall wrote:
>>> On Sun, Dec 06, 2015 at 05:07:14PM -0800, Mario Smarduch wrote:
>>>> This patch tracks armv7 and armv8 fp/simd hardware state with cptr_el2 
>>>> register.
>>>> On vcpu_load for 32 bit guests enable FP access, and enable fp/simd
>>>> trapping for 32 and 64 bit guests. On first fp/simd access trap to handler 
>>>> to save host and restore guest context, and clear trapping bits to enable 
>>>> vcpu 
>>>> lazy mode. On vcpu_put if trap bits are clear save guest and restore host 
>>>> context and also save 32 bit guest fpexc register.
>>>>
>>>> Signed-off-by: Mario Smarduch <m.smard...@samsung.com>
>>>> ---
>>>>  arch/arm/include/asm/kvm_emulate.h   |   5 ++
>>>>  arch/arm/include/asm/kvm_host.h  |   2 +
>>>>  arch/arm/kvm/arm.c   |  20 +--
>>>>  arch/arm64/include/asm/kvm_asm.h |   2 +
>>>>  arch/arm64/include/asm/kvm_emulate.h |  15 +++--
>>>>  arch/arm64/include/asm/kvm_host.h|  16 +-
>>>>  arch/arm64/kernel/asm-offsets.c  |   1 +
>>>>  arch/arm64/kvm/Makefile  |   3 +-
>>>>  arch/arm64/kvm/fpsimd_switch.S   |  38 
>>>>  arch/arm64/kvm/hyp.S | 108 
>>>> +--
>>>>  arch/arm64/kvm/hyp_head.S|  48 
>>>>  11 files changed, 181 insertions(+), 77 deletions(-)
>>>>  create mode 100644 arch/arm64/kvm/fpsimd_switch.S
>>>>  create mode 100644 arch/arm64/kvm/hyp_head.S
>>>>
>>>> diff --git a/arch/arm/include/asm/kvm_emulate.h 
>>>> b/arch/arm/include/asm/kvm_emulate.h
>>>> index 3de11a2..13feed5 100644
>>>> --- a/arch/arm/include/asm/kvm_emulate.h
>>>> +++ b/arch/arm/include/asm/kvm_emulate.h
>>>> @@ -243,6 +243,11 @@ static inline unsigned long 
>>>> vcpu_data_host_to_guest(struct kvm_vcpu *vcpu,
>>>>}
>>>>  }
>>>>  
>>>> +static inline bool kvm_guest_vcpu_is_32bit(struct kvm_vcpu *vcpu)
>>>> +{
>>>> +  return true;
>>>> +}
>>>> +
>>>>  #ifdef CONFIG_VFPv3
>>>>  /* Called from vcpu_load - save fpexc and enable guest access to fp/simd 
>>>> unit */
>>>>  static inline void kvm_enable_vcpu_fpexc(struct kvm_vcpu *vcpu)
>>>> diff --git a/arch/arm/include/asm/kvm_host.h 
>>>> b/arch/arm/include/asm/kvm_host.h
>>>> index ecc883a..720ae51 100644
>>>> --- a/arch/arm/include/asm/kvm_host.h
>>>> +++ b/arch/arm/include/asm/kvm_host.h
>>>> @@ -227,6 +227,8 @@ int kvm_perf_teardown(void);
>>>>  void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
>>>>  
>>>>  struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
>>>> +
>>>> +static inline void kvm_save_guest_vcpu_fpexc(struct kvm_vcpu *vcpu) {}
>>>>  void kvm_restore_host_vfp_state(struct kvm_vcpu *);
>>>>  
>>>>  static inline void kvm_arch_hardware_disable(void) {}
>>>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
>>>> index 1de07ab..dd59f8a 100644
>>>> --- a/arch/arm/kvm/arm.c
>>>> +++ b/arch/arm/kvm/arm.c
>>>> @@ -292,8 +292,12 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int 
>>>> cpu)
>>>>  
>>>>kvm_arm_set_running_vcpu(vcpu);
>>>>  
>>>> -  /*  Save and enable FPEXC before we load guest context */
>>>> -  kvm_enable_vcpu_fpexc(vcpu);
>>>> +  /*
>>>> +   * For 32bit guest executing on arm64, enable fp/simd access in
>>>> +   * EL2. On arm32 save host fpexc and then enable fp/simd access.
>>>> +   */
>>>> +  if (kvm_guest_vcpu_is_32bit(vcpu))
>>>> +  kvm_enable_vcpu_fpexc(vcpu);
>>>>  
>>>>/* reset hyp cptr register to trap on tracing and vfp/simd access*/
>>>>vcpu_reset_cptr(vcpu);
>>>> @@ -302,10 +306,18 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int 
>>>> cpu)
>>>>  void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>>>>  {
>>>>/* If the fp/simd registers are dirty save guest, restore host. */
>>>> -  if (kvm_vcpu_vfp_isdirty(vcpu))
>>>

Re: [PATCH v5 2/3] KVM/arm/arm64: enable enhanced armv7 fp/simd lazy switch

2015-12-18 Thread Mario Smarduch
On 12/18/2015 5:49 AM, Christoffer Dall wrote:
> On Sun, Dec 06, 2015 at 05:07:13PM -0800, Mario Smarduch wrote:
>> This patch tracks armv7 fp/simd hardware state with hcptr register.
>> On vcpu_load saves host fpexc, enables FP access, and sets trapping
>> on fp/simd access. On first fp/simd access trap to handler to save host and 
>> restore guest context, clear trapping bits to enable vcpu lazy mode. On 
>> vcpu_put if trap bits are cleared save guest and restore host context and 
>> always restore host fpexc.
>>
>> Signed-off-by: Mario Smarduch <m.smard...@samsung.com>
>> ---
>>  arch/arm/include/asm/kvm_emulate.h   | 50 
>> 
>>  arch/arm/include/asm/kvm_host.h  |  1 +
>>  arch/arm/kvm/Makefile|  2 +-
>>  arch/arm/kvm/arm.c   | 13 ++
>>  arch/arm/kvm/fpsimd_switch.S | 46 +
>>  arch/arm/kvm/interrupts.S| 32 +--
>>  arch/arm/kvm/interrupts_head.S   | 33 ++--
>>  arch/arm64/include/asm/kvm_emulate.h |  9 +++
>>  arch/arm64/include/asm/kvm_host.h|  1 +
>>  9 files changed, 142 insertions(+), 45 deletions(-)
>>  create mode 100644 arch/arm/kvm/fpsimd_switch.S
>>
>> diff --git a/arch/arm/include/asm/kvm_emulate.h 
>> b/arch/arm/include/asm/kvm_emulate.h
>> index a9c80a2..3de11a2 100644
>> --- a/arch/arm/include/asm/kvm_emulate.h
>> +++ b/arch/arm/include/asm/kvm_emulate.h
>> @@ -243,4 +243,54 @@ static inline unsigned long 
>> vcpu_data_host_to_guest(struct kvm_vcpu *vcpu,
>>  }
>>  }
>>  
>> +#ifdef CONFIG_VFPv3
>> +/* Called from vcpu_load - save fpexc and enable guest access to fp/simd 
>> unit */
> 
> are you really enabling guest access here or just fiddling with fpexc to
> ensure you trap accesses to hyp ?

That's the end goal, but it is setting the fp enable bit. Your later comment
about combining the functions and removing the assembler should work.

> 
>> +static inline void kvm_enable_vcpu_fpexc(struct kvm_vcpu *vcpu)
>> +{
>> +u32 fpexc;
>> +
>> +asm volatile(
>> + "mrc p10, 7, %0, cr8, cr0, 0\n"
>> + "str %0, [%1]\n"
>> + "mov %0, #(1 << 30)\n"
>> + "mcr p10, 7, %0, cr8, cr0, 0\n"
>> + "isb\n"
> 
> why do you need an ISB here?  won't there be an implicit one from the
> HVC call later before you need this to take effect?

I would think so, but besides B.2.7.3 I can't find other references on the
visibility of context-altering instructions.
> 
>> + : "+r" (fpexc)
>> + : "r" (&vcpu->arch.host_fpexc)
>> +);
> 
> this whole bit can be rewritten something like:
> 
> fpexc = fmrx(FPEXC);
> vcpu->arch.host_fpexc = fpexc;
> fpexc |= FPEXC_EN;
> fmxr(FPEXC, fpexc);

Didn't know about fmrx/fmxr functions - much better.
> 
>> +}
>> +
>> +/* Called from vcpu_put - restore host fpexc */
>> +static inline void kvm_restore_host_fpexc(struct kvm_vcpu *vcpu)
>> +{
>> +asm volatile(
>> + "mcr p10, 7, %0, cr8, cr0, 0\n"
>> + :
>> + : "r" (vcpu->arch.host_fpexc)
>> +);
> 
> similarly here
Ok.
> 
>> +}
>> +
>> +/* If trap bits are reset then fp/simd registers are dirty */
>> +static inline bool kvm_vcpu_vfp_isdirty(struct kvm_vcpu *vcpu)
>> +{
>> +return !!(~vcpu->arch.hcptr & (HCPTR_TCP(10) | HCPTR_TCP(11)));
> 
> this looks complicated, how about:
> 
> return !(vcpu->arch.hcptr & (HCPTR_TCP(10) | HCPTR_TCP(11)));

Yeah, I twisted the meaning of bool.
> 
>> +}
>> +
>> +static inline void vcpu_reset_cptr(struct kvm_vcpu *vcpu)
>> +{
>> +vcpu->arch.hcptr |= (HCPTR_TTA | HCPTR_TCP(10)  | HCPTR_TCP(11));
>> +}
>> +#else
>> +static inline void kvm_enable_vcpu_fpexc(struct kvm_vcpu *vcpu) {}
>> +static inline void kvm_restore_host_fpexc(struct kvm_vcpu *vcpu) {}
>> +static inline bool kvm_vcpu_vfp_isdirty(struct kvm_vcpu *vcpu)
>> +{
>> +return false;
>> +}
>> +static inline void vcpu_reset_cptr(struct kvm_vcpu *vcpu)
>> +{
>> +vcpu->arch.hcptr = HCPTR_TTA;
>> +}
>> +#endif
>> +
>>  #endif /* __ARM_KVM_EMULATE_H__ */
>> diff --git a/arch/arm/include/asm/kvm_host.h 
>> b/arch/arm/include/asm/kvm_host.h
>> index 09bb1f2..ecc883a 100644
>> --- a/arch/arm/include/asm/kvm_host.h
>> +++ b/arch/arm/include/asm/kvm_host.h
>> @@ -227,6 +227,7 @@ int kvm_perf_tea

Re: [PATCH v5 3/3] KVM/arm/arm64: enable enhanced armv8 fp/simd lazy switch

2015-12-18 Thread Mario Smarduch
On 12/18/2015 5:54 AM, Christoffer Dall wrote:
> On Sun, Dec 06, 2015 at 05:07:14PM -0800, Mario Smarduch wrote:
>> This patch tracks armv7 and armv8 fp/simd hardware state with cptr_el2 
>> register.
>> On vcpu_load for 32 bit guests enable FP access, and enable fp/simd
>> trapping for 32 and 64 bit guests. On first fp/simd access trap to handler 
>> to save host and restore guest context, and clear trapping bits to enable 
>> vcpu 
>> lazy mode. On vcpu_put if trap bits are clear save guest and restore host 
>> context and also save 32 bit guest fpexc register.
>>
>> Signed-off-by: Mario Smarduch <m.smard...@samsung.com>
>> ---
>>  arch/arm/include/asm/kvm_emulate.h   |   5 ++
>>  arch/arm/include/asm/kvm_host.h  |   2 +
>>  arch/arm/kvm/arm.c   |  20 +--
>>  arch/arm64/include/asm/kvm_asm.h |   2 +
>>  arch/arm64/include/asm/kvm_emulate.h |  15 +++--
>>  arch/arm64/include/asm/kvm_host.h|  16 +-
>>  arch/arm64/kernel/asm-offsets.c  |   1 +
>>  arch/arm64/kvm/Makefile  |   3 +-
>>  arch/arm64/kvm/fpsimd_switch.S   |  38 
>>  arch/arm64/kvm/hyp.S | 108 
>> +--
>>  arch/arm64/kvm/hyp_head.S|  48 
>>  11 files changed, 181 insertions(+), 77 deletions(-)
>>  create mode 100644 arch/arm64/kvm/fpsimd_switch.S
>>  create mode 100644 arch/arm64/kvm/hyp_head.S
>>
>> diff --git a/arch/arm/include/asm/kvm_emulate.h 
>> b/arch/arm/include/asm/kvm_emulate.h
>> index 3de11a2..13feed5 100644
>> --- a/arch/arm/include/asm/kvm_emulate.h
>> +++ b/arch/arm/include/asm/kvm_emulate.h
>> @@ -243,6 +243,11 @@ static inline unsigned long 
>> vcpu_data_host_to_guest(struct kvm_vcpu *vcpu,
>>  }
>>  }
>>  
>> +static inline bool kvm_guest_vcpu_is_32bit(struct kvm_vcpu *vcpu)
>> +{
>> +return true;
>> +}
>> +
>>  #ifdef CONFIG_VFPv3
>>  /* Called from vcpu_load - save fpexc and enable guest access to fp/simd 
>> unit */
>>  static inline void kvm_enable_vcpu_fpexc(struct kvm_vcpu *vcpu)
>> diff --git a/arch/arm/include/asm/kvm_host.h 
>> b/arch/arm/include/asm/kvm_host.h
>> index ecc883a..720ae51 100644
>> --- a/arch/arm/include/asm/kvm_host.h
>> +++ b/arch/arm/include/asm/kvm_host.h
>> @@ -227,6 +227,8 @@ int kvm_perf_teardown(void);
>>  void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
>>  
>>  struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
>> +
>> +static inline void kvm_save_guest_vcpu_fpexc(struct kvm_vcpu *vcpu) {}
>>  void kvm_restore_host_vfp_state(struct kvm_vcpu *);
>>  
>>  static inline void kvm_arch_hardware_disable(void) {}
>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
>> index 1de07ab..dd59f8a 100644
>> --- a/arch/arm/kvm/arm.c
>> +++ b/arch/arm/kvm/arm.c
>> @@ -292,8 +292,12 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>>  
>>  kvm_arm_set_running_vcpu(vcpu);
>>  
>> -/*  Save and enable FPEXC before we load guest context */
>> -kvm_enable_vcpu_fpexc(vcpu);
>> +/*
>> + * For 32bit guest executing on arm64, enable fp/simd access in
>> + * EL2. On arm32 save host fpexc and then enable fp/simd access.
>> + */
>> +if (kvm_guest_vcpu_is_32bit(vcpu))
>> +kvm_enable_vcpu_fpexc(vcpu);
>>  
>>  /* reset hyp cptr register to trap on tracing and vfp/simd access*/
>>  vcpu_reset_cptr(vcpu);
>> @@ -302,10 +306,18 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>>  void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>>  {
>>  /* If the fp/simd registers are dirty save guest, restore host. */
>> -if (kvm_vcpu_vfp_isdirty(vcpu))
>> +if (kvm_vcpu_vfp_isdirty(vcpu)) {
>>  kvm_restore_host_vfp_state(vcpu);
>>  
>> -/* Restore host FPEXC trashed in vcpu_load */
>> +/*
>> + * For 32bit guest on arm64 save the guest fpexc register
>> + * in EL2 mode.
>> + */
>> +if (kvm_guest_vcpu_is_32bit(vcpu))
>> +kvm_save_guest_vcpu_fpexc(vcpu);
>> +}
>> +
>> +/* For arm32 restore host FPEXC trashed in vcpu_load. */
>>  kvm_restore_host_fpexc(vcpu);
>>  
>>  /*
>> diff --git a/arch/arm64/include/asm/kvm_asm.h 
>> b/arch/arm64/include/asm/kvm_asm.h
>> index 5e37710..d53d069 100644
>> --- a/arch/arm64/include/asm/kvm_asm.h
>> +++ b/arch/

Re: [PATCH v5 1/3] KVM/arm: add hooks for armv7 fp/simd lazy switch support

2015-12-18 Thread Mario Smarduch


On 12/18/2015 5:07 AM, Christoffer Dall wrote:
> On Sun, Dec 06, 2015 at 05:07:12PM -0800, Mario Smarduch wrote:
>> This patch adds vcpu fields to configure hcptr trap register which is also 
>> used 
>> to determine if fp/simd registers are dirty. Adds a field to save host 
>> FPEXC, 
>> and offsets associated offsets.
> 
> offsets offsets?
Should be 'with vcpu fields'
> 
>>
>> Signed-off-by: Mario Smarduch <m.smard...@samsung.com>
>> ---
>>  arch/arm/include/asm/kvm_host.h | 6 ++
>>  arch/arm/kernel/asm-offsets.c   | 2 ++
>>  2 files changed, 8 insertions(+)
>>
>> diff --git a/arch/arm/include/asm/kvm_host.h 
>> b/arch/arm/include/asm/kvm_host.h
>> index 3df1e97..09bb1f2 100644
>> --- a/arch/arm/include/asm/kvm_host.h
>> +++ b/arch/arm/include/asm/kvm_host.h
>> @@ -104,6 +104,12 @@ struct kvm_vcpu_arch {
>>  /* HYP trapping configuration */
>>  u32 hcr;
>>  
>> +/* HYP Co-processor fp/simd and trace trapping configuration */
>> +u32 hcptr;
>> +
>> +/* Save host FPEXC register to later restore on vcpu put */
>> +u32 host_fpexc;
>> +
>>  /* Interrupt related fields */
>>  u32 irq_lines;  /* IRQ and FIQ levels */
>>  
>> diff --git a/arch/arm/kernel/asm-offsets.c b/arch/arm/kernel/asm-offsets.c
>> index 871b826..28ebd4c 100644
>> --- a/arch/arm/kernel/asm-offsets.c
>> +++ b/arch/arm/kernel/asm-offsets.c
>> @@ -185,6 +185,8 @@ int main(void)
>>DEFINE(VCPU_PC,   offsetof(struct kvm_vcpu, 
>> arch.regs.usr_regs.ARM_pc));
>>DEFINE(VCPU_CPSR, offsetof(struct kvm_vcpu, 
>> arch.regs.usr_regs.ARM_cpsr));
>>DEFINE(VCPU_HCR,  offsetof(struct kvm_vcpu, arch.hcr));
>> +  DEFINE(VCPU_HCPTR,offsetof(struct kvm_vcpu, arch.hcptr));
>> +  DEFINE(VCPU_VFP_HOST_FPEXC,   offsetof(struct kvm_vcpu, 
>> arch.host_fpexc));
> 
> this makes me think this needs a good rebase on world-switch in C, which
> is now in kvmarm/next...
Ok, definitely.
> 
>>DEFINE(VCPU_IRQ_LINES,offsetof(struct kvm_vcpu, arch.irq_lines));
>>DEFINE(VCPU_HSR,  offsetof(struct kvm_vcpu, arch.fault.hsr));
>>DEFINE(VCPU_HxFAR,offsetof(struct kvm_vcpu, 
>> arch.fault.hxfar));
> 
> this patch is hard to review on its own as I don't see how this is used,
> but ok...
Sure, I'll combine it.
> 
>> -- 
>> 1.9.1
>>
> 


Re: [PATCH v3 07/22] arm64: KVM: Implement system register save/restore

2015-12-12 Thread Mario Smarduch


On 12/11/2015 10:29 AM, Marc Zyngier wrote:
> Hi Mario,
> 
> On 11/12/15 03:24, Mario Smarduch wrote:
>> Hi Marc,
>>
>> On 12/7/2015 2:53 AM, Marc Zyngier wrote:
>>> Implement the system register save/restore as a direct translation of
>>> the assembly code version.
>>>
>>> Signed-off-by: Marc Zyngier <marc.zyng...@arm.com>
>>> Reviewed-by: Christoffer Dall <christoffer.d...@linaro.org>
>>> ---
>>>  arch/arm64/kvm/hyp/Makefile|  1 +
>>>  arch/arm64/kvm/hyp/hyp.h   |  3 ++
>>>  arch/arm64/kvm/hyp/sysreg-sr.c | 90 
>>> ++
>>>  3 files changed, 94 insertions(+)
>>>  create mode 100644 arch/arm64/kvm/hyp/sysreg-sr.c
>>>
>>> diff --git a/arch/arm64/kvm/hyp/Makefile b/arch/arm64/kvm/hyp/Makefile
>>> index 455dc0a..ec94200 100644
>>> --- a/arch/arm64/kvm/hyp/Makefile
>>> +++ b/arch/arm64/kvm/hyp/Makefile
>>> @@ -5,3 +5,4 @@
>>>  obj-$(CONFIG_KVM_ARM_HOST) += vgic-v2-sr.o
>>>  obj-$(CONFIG_KVM_ARM_HOST) += vgic-v3-sr.o
>>>  obj-$(CONFIG_KVM_ARM_HOST) += timer-sr.o
>>> +obj-$(CONFIG_KVM_ARM_HOST) += sysreg-sr.o
>>> diff --git a/arch/arm64/kvm/hyp/hyp.h b/arch/arm64/kvm/hyp/hyp.h
>>> index f213e46..778d56d 100644
>>> --- a/arch/arm64/kvm/hyp/hyp.h
>>> +++ b/arch/arm64/kvm/hyp/hyp.h
>>> @@ -38,5 +38,8 @@ void __vgic_v3_restore_state(struct kvm_vcpu *vcpu);
>>>  void __timer_save_state(struct kvm_vcpu *vcpu);
>>>  void __timer_restore_state(struct kvm_vcpu *vcpu);
>>>  
>>> +void __sysreg_save_state(struct kvm_cpu_context *ctxt);
>>> +void __sysreg_restore_state(struct kvm_cpu_context *ctxt);
>>> +
>>>  #endif /* __ARM64_KVM_HYP_H__ */
>>>  
>>> diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
>>> new file mode 100644
>>> index 000..add8fcb
>>> --- /dev/null
>>> +++ b/arch/arm64/kvm/hyp/sysreg-sr.c
>>> @@ -0,0 +1,90 @@
>>> +/*
>>> + * Copyright (C) 2012-2015 - ARM Ltd
>>> + * Author: Marc Zyngier <marc.zyng...@arm.com>
>>> + *
>>> + * This program is free software; you can redistribute it and/or modify
>>> + * it under the terms of the GNU General Public License version 2 as
>>> + * published by the Free Software Foundation.
>>> + *
>>> + * This program is distributed in the hope that it will be useful,
>>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>>> + * GNU General Public License for more details.
>>> + *
>>> + * You should have received a copy of the GNU General Public License
>>> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
>>> + */
>>> +
>>> +#include 
>>> +#include 
>>> +
>>> +#include 
>>> +
>>> +#include "hyp.h"
>>> +
>>
>> I looked closer on some other ways to get better performance out of
>> the compiler. This code sequence performs about 35% faster for 
>> __sysreg_save_state(..) for 5000 exits you save about 500mS or 100nS
>> per exit. This is on Juno.
> 
> 35% faster? Really? That's pretty crazy. Was that on the A57 or the A53?

Good question. I bind kvmtool to cpu1; I think that's an A57.
> 
>>
>> register int volatile count asm("r2") = 0;

I meant x2, but this compiles with the aarch64 compiler and runs on Juno. It
appears the compiler may have an issue.

> 
> Does this even work on arm64? We don't have an "r2" register...
> 
>>
>> do {
>> 
>> } while(count);
>>
>> I didn't test the restore function (ran out of time) but I suspect it should 
>> be
>> the same. The assembler pretty much uses all the GPRs, (a little too many, 
>> using
>> stp to push 4 pairs on the stack and restore) looking at the assembler it all
>> should execute out of order.
> 
> Are you talking about the original implementation here? or the generated
> code out of the compiler? The original implementation didn't push
> anything on the stack (apart from the prologue, but we have the same
> thing in the C implementation).

This is generated compiler code using the do { ... } while code.
> 
> Looking at the compiler output, we have a bunch of mrs/str, one after
> the other - pretty basic. Maybe that gives the CPU some "breathing"
> time, but I have no idea if that's more or less efficient.
> 
> But the main thing 

Re: [PATCH v3 07/22] arm64: KVM: Implement system register save/restore

2015-12-10 Thread Mario Smarduch
Hi Marc,

On 12/7/2015 2:53 AM, Marc Zyngier wrote:
> Implement the system register save/restore as a direct translation of
> the assembly code version.
> 
> Signed-off-by: Marc Zyngier 
> Reviewed-by: Christoffer Dall 
> ---
>  arch/arm64/kvm/hyp/Makefile|  1 +
>  arch/arm64/kvm/hyp/hyp.h   |  3 ++
>  arch/arm64/kvm/hyp/sysreg-sr.c | 90 
> ++
>  3 files changed, 94 insertions(+)
>  create mode 100644 arch/arm64/kvm/hyp/sysreg-sr.c
> 
> diff --git a/arch/arm64/kvm/hyp/Makefile b/arch/arm64/kvm/hyp/Makefile
> index 455dc0a..ec94200 100644
> --- a/arch/arm64/kvm/hyp/Makefile
> +++ b/arch/arm64/kvm/hyp/Makefile
> @@ -5,3 +5,4 @@
>  obj-$(CONFIG_KVM_ARM_HOST) += vgic-v2-sr.o
>  obj-$(CONFIG_KVM_ARM_HOST) += vgic-v3-sr.o
>  obj-$(CONFIG_KVM_ARM_HOST) += timer-sr.o
> +obj-$(CONFIG_KVM_ARM_HOST) += sysreg-sr.o
> diff --git a/arch/arm64/kvm/hyp/hyp.h b/arch/arm64/kvm/hyp/hyp.h
> index f213e46..778d56d 100644
> --- a/arch/arm64/kvm/hyp/hyp.h
> +++ b/arch/arm64/kvm/hyp/hyp.h
> @@ -38,5 +38,8 @@ void __vgic_v3_restore_state(struct kvm_vcpu *vcpu);
>  void __timer_save_state(struct kvm_vcpu *vcpu);
>  void __timer_restore_state(struct kvm_vcpu *vcpu);
>  
> +void __sysreg_save_state(struct kvm_cpu_context *ctxt);
> +void __sysreg_restore_state(struct kvm_cpu_context *ctxt);
> +
>  #endif /* __ARM64_KVM_HYP_H__ */
>  
> diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
> new file mode 100644
> index 000..add8fcb
> --- /dev/null
> +++ b/arch/arm64/kvm/hyp/sysreg-sr.c
> @@ -0,0 +1,90 @@
> +/*
> + * Copyright (C) 2012-2015 - ARM Ltd
> + * Author: Marc Zyngier 
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program.  If not, see .
> + */
> +
> +#include 
> +#include 
> +
> +#include 
> +
> +#include "hyp.h"
> +

I looked closer at some other ways to get better performance out of the
compiler. This code sequence performs about 35% faster for
__sysreg_save_state(..); for 5000 exits you save about 500mS, or 100nS per
exit. This is on Juno.

register int volatile count asm("r2") = 0;

do {

} while(count);

I didn't test the restore function (ran out of time) but I suspect it should be
the same. The assembler pretty much uses all the GPRs (a little too many, using
stp to push 4 pairs on the stack and restore); looking at the assembler, it all
should execute out of order.

FWIW I gave this a try since compilers like to optimize loops. I used
'cntpct_el0' counter register to measure the intervals.


> +/* ctxt is already in the HYP VA space */
> +void __hyp_text __sysreg_save_state(struct kvm_cpu_context *ctxt)
> +{
> + ctxt->sys_regs[MPIDR_EL1]   = read_sysreg(vmpidr_el2);
> + ctxt->sys_regs[CSSELR_EL1]  = read_sysreg(csselr_el1);
> + ctxt->sys_regs[SCTLR_EL1]   = read_sysreg(sctlr_el1);
> + ctxt->sys_regs[ACTLR_EL1]   = read_sysreg(actlr_el1);
> + ctxt->sys_regs[CPACR_EL1]   = read_sysreg(cpacr_el1);
> + ctxt->sys_regs[TTBR0_EL1]   = read_sysreg(ttbr0_el1);
> + ctxt->sys_regs[TTBR1_EL1]   = read_sysreg(ttbr1_el1);
> + ctxt->sys_regs[TCR_EL1] = read_sysreg(tcr_el1);
> + ctxt->sys_regs[ESR_EL1] = read_sysreg(esr_el1);
> + ctxt->sys_regs[AFSR0_EL1]   = read_sysreg(afsr0_el1);
> + ctxt->sys_regs[AFSR1_EL1]   = read_sysreg(afsr1_el1);
> + ctxt->sys_regs[FAR_EL1] = read_sysreg(far_el1);
> + ctxt->sys_regs[MAIR_EL1]= read_sysreg(mair_el1);
> + ctxt->sys_regs[VBAR_EL1]= read_sysreg(vbar_el1);
> + ctxt->sys_regs[CONTEXTIDR_EL1]  = read_sysreg(contextidr_el1);
> + ctxt->sys_regs[TPIDR_EL0]   = read_sysreg(tpidr_el0);
> + ctxt->sys_regs[TPIDRRO_EL0] = read_sysreg(tpidrro_el0);
> + ctxt->sys_regs[TPIDR_EL1]   = read_sysreg(tpidr_el1);
> + ctxt->sys_regs[AMAIR_EL1]   = read_sysreg(amair_el1);
> + ctxt->sys_regs[CNTKCTL_EL1] = read_sysreg(cntkctl_el1);
> + ctxt->sys_regs[PAR_EL1] = read_sysreg(par_el1);
> + ctxt->sys_regs[MDSCR_EL1]   = read_sysreg(mdscr_el1);
> +
> + ctxt->gp_regs.regs.sp   = read_sysreg(sp_el0);
> + ctxt->gp_regs.regs.pc   = read_sysreg(elr_el2);
> + ctxt->gp_regs.regs.pstate   = read_sysreg(spsr_el2);
> + ctxt->gp_regs.sp_el1= read_sysreg(sp_el1);
> + ctxt->gp_regs.elr_el1   = 

Re: [PATCH v3 05/22] arm64: KVM: Implement vgic-v3 save/restore

2015-12-07 Thread Mario Smarduch


On 12/7/2015 8:52 AM, Marc Zyngier wrote:
> Hi Mario,
> 
> On 07/12/15 16:40, Mario Smarduch wrote:
>> Hi Marc,
>>
>> On 12/7/2015 2:53 AM, Marc Zyngier wrote:
>>> Implement the vgic-v3 save restore as a direct translation of
>>> the assembly code version.
>>>
>>> Signed-off-by: Marc Zyngier <marc.zyng...@arm.com>
>>> ---
>>>  arch/arm64/kvm/hyp/Makefile |   1 +
>>>  arch/arm64/kvm/hyp/hyp.h|   3 +
>>>  arch/arm64/kvm/hyp/vgic-v3-sr.c | 226 
>>> 
>>>  3 files changed, 230 insertions(+)
>>>  create mode 100644 arch/arm64/kvm/hyp/vgic-v3-sr.c
>>>
>>> diff --git a/arch/arm64/kvm/hyp/Makefile b/arch/arm64/kvm/hyp/Makefile
>>> index d8d5968..d1e38ce 100644
>>> --- a/arch/arm64/kvm/hyp/Makefile
>>> +++ b/arch/arm64/kvm/hyp/Makefile
>>> @@ -3,3 +3,4 @@
>>>  #
>>>  
>>>  obj-$(CONFIG_KVM_ARM_HOST) += vgic-v2-sr.o
>>> +obj-$(CONFIG_KVM_ARM_HOST) += vgic-v3-sr.o
>>> diff --git a/arch/arm64/kvm/hyp/hyp.h b/arch/arm64/kvm/hyp/hyp.h
>>> index ac63553..5759f9f 100644
>>> --- a/arch/arm64/kvm/hyp/hyp.h
>>> +++ b/arch/arm64/kvm/hyp/hyp.h
>>> @@ -32,5 +32,8 @@
>>>  void __vgic_v2_save_state(struct kvm_vcpu *vcpu);
>>>  void __vgic_v2_restore_state(struct kvm_vcpu *vcpu);
>>>  
>>> +void __vgic_v3_save_state(struct kvm_vcpu *vcpu);
>>> +void __vgic_v3_restore_state(struct kvm_vcpu *vcpu);
>>> +
>>>  #endif /* __ARM64_KVM_HYP_H__ */
>>>  
>>> diff --git a/arch/arm64/kvm/hyp/vgic-v3-sr.c 
>>> b/arch/arm64/kvm/hyp/vgic-v3-sr.c
>>> new file mode 100644
>>> index 000..78d05f3
>>> --- /dev/null
>>> +++ b/arch/arm64/kvm/hyp/vgic-v3-sr.c
>>> @@ -0,0 +1,226 @@
>>> +/*
>>> + * Copyright (C) 2012-2015 - ARM Ltd
>>> + * Author: Marc Zyngier <marc.zyng...@arm.com>
>>> + *
>>> + * This program is free software; you can redistribute it and/or modify
>>> + * it under the terms of the GNU General Public License version 2 as
>>> + * published by the Free Software Foundation.
>>> + *
>>> + * This program is distributed in the hope that it will be useful,
>>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>>> + * GNU General Public License for more details.
>>> + *
>>> + * You should have received a copy of the GNU General Public License
>>> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
>>> + */
>>> +
>>> +#include 
>>> +#include 
>>> +#include 
>>> +
>>> +#include 
>>> +
>>> +#include "hyp.h"
>>> +
>>> +#define vtr_to_max_lr_idx(v)   ((v) & 0xf)
>>> +#define vtr_to_nr_pri_bits(v)  (((u32)(v) >> 29) + 1)
>>> +
>>> +#define read_gicreg(r) 
>>> \
>>> +   ({  \
>>> +   u64 reg;\
>>> +   asm volatile("mrs_s %0, " __stringify(r) : "=r" (reg)); \
>>> +   reg;\
>>> +   })
>>> +
>>> +#define write_gicreg(v,r)  \
>>> +   do {\
>>> +   u64 __val = (v);\
>>> +   asm volatile("msr_s " __stringify(r) ", %0" : : "r" (__val));\
>>> +   } while (0)
>>> +
>>> +/* vcpu is already in the HYP VA space */
>>> +void __hyp_text __vgic_v3_save_state(struct kvm_vcpu *vcpu)
>>> +{
>>> +   struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
>>> +   u64 val;
>>> +   u32 max_lr_idx, nr_pri_bits;
>>> +
>>> +   /*
>>> +* Make sure stores to the GIC via the memory mapped interface
>>> +* are now visible to the system register interface.
>>> +*/
>>> +   dsb(st);
>>> +
>>> +   cpu_if->vgic_vmcr  = read_gicreg(ICH_VMCR_EL2);
>>> +   cpu_if->vgic_misr  = read_gicreg(ICH_MISR_EL2);
>>> +   cpu_if->vgic_eisr  = read_gicreg(ICH_EISR_EL2);
>>> +   cpu_if->vgi

Re: [PATCH v3 05/22] arm64: KVM: Implement vgic-v3 save/restore

2015-12-07 Thread Mario Smarduch
Hi Marc,

On 12/7/2015 2:53 AM, Marc Zyngier wrote:
> Implement the vgic-v3 save restore as a direct translation of
> the assembly code version.
> 
> Signed-off-by: Marc Zyngier 
> ---
>  arch/arm64/kvm/hyp/Makefile |   1 +
>  arch/arm64/kvm/hyp/hyp.h|   3 +
>  arch/arm64/kvm/hyp/vgic-v3-sr.c | 226 
> 
>  3 files changed, 230 insertions(+)
>  create mode 100644 arch/arm64/kvm/hyp/vgic-v3-sr.c
> 
> diff --git a/arch/arm64/kvm/hyp/Makefile b/arch/arm64/kvm/hyp/Makefile
> index d8d5968..d1e38ce 100644
> --- a/arch/arm64/kvm/hyp/Makefile
> +++ b/arch/arm64/kvm/hyp/Makefile
> @@ -3,3 +3,4 @@
>  #
>  
>  obj-$(CONFIG_KVM_ARM_HOST) += vgic-v2-sr.o
> +obj-$(CONFIG_KVM_ARM_HOST) += vgic-v3-sr.o
> diff --git a/arch/arm64/kvm/hyp/hyp.h b/arch/arm64/kvm/hyp/hyp.h
> index ac63553..5759f9f 100644
> --- a/arch/arm64/kvm/hyp/hyp.h
> +++ b/arch/arm64/kvm/hyp/hyp.h
> @@ -32,5 +32,8 @@
>  void __vgic_v2_save_state(struct kvm_vcpu *vcpu);
>  void __vgic_v2_restore_state(struct kvm_vcpu *vcpu);
>  
> +void __vgic_v3_save_state(struct kvm_vcpu *vcpu);
> +void __vgic_v3_restore_state(struct kvm_vcpu *vcpu);
> +
>  #endif /* __ARM64_KVM_HYP_H__ */
>  
> diff --git a/arch/arm64/kvm/hyp/vgic-v3-sr.c b/arch/arm64/kvm/hyp/vgic-v3-sr.c
> new file mode 100644
> index 000..78d05f3
> --- /dev/null
> +++ b/arch/arm64/kvm/hyp/vgic-v3-sr.c
> @@ -0,0 +1,226 @@
> +/*
> + * Copyright (C) 2012-2015 - ARM Ltd
> + * Author: Marc Zyngier 
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program.  If not, see .
> + */
> +
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +
> +#include "hyp.h"
> +
> +#define vtr_to_max_lr_idx(v) ((v) & 0xf)
> +#define vtr_to_nr_pri_bits(v)(((u32)(v) >> 29) + 1)
> +
> +#define read_gicreg(r)   
> \
> + ({  \
> + u64 reg;\
> + asm volatile("mrs_s %0, " __stringify(r) : "=r" (reg)); \
> + reg;\
> + })
> +
> +#define write_gicreg(v,r)\
> + do {\
> + u64 __val = (v);\
> + asm volatile("msr_s " __stringify(r) ", %0" : : "r" (__val));\
> + } while (0)
> +
> +/* vcpu is already in the HYP VA space */
> +void __hyp_text __vgic_v3_save_state(struct kvm_vcpu *vcpu)
> +{
> + struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
> + u64 val;
> + u32 max_lr_idx, nr_pri_bits;
> +
> + /*
> +  * Make sure stores to the GIC via the memory mapped interface
> +  * are now visible to the system register interface.
> +  */
> + dsb(st);
> +
> + cpu_if->vgic_vmcr  = read_gicreg(ICH_VMCR_EL2);
> + cpu_if->vgic_misr  = read_gicreg(ICH_MISR_EL2);
> + cpu_if->vgic_eisr  = read_gicreg(ICH_EISR_EL2);
> + cpu_if->vgic_elrsr = read_gicreg(ICH_ELSR_EL2);
> +
> + write_gicreg(0, ICH_HCR_EL2);
> + val = read_gicreg(ICH_VTR_EL2);
> + max_lr_idx = vtr_to_max_lr_idx(val);
> + nr_pri_bits = vtr_to_nr_pri_bits(val);
> +
Can you setup a base pointer to cpu_if->vgic_lr and use an offset?

Also, is there a way to get rid of the constants? That implicitly hard-codes the
max number of LRs and doesn't make the code portable.

> + switch (max_lr_idx) {
> + case 15:
> + cpu_if->vgic_lr[VGIC_V3_LR_INDEX(15)] = 
> read_gicreg(ICH_LR15_EL2);
> + case 14:
> + cpu_if->vgic_lr[VGIC_V3_LR_INDEX(14)] = 
> read_gicreg(ICH_LR14_EL2);
> + case 13:
> + cpu_if->vgic_lr[VGIC_V3_LR_INDEX(13)] = 
> read_gicreg(ICH_LR13_EL2);
> + case 12:
> + cpu_if->vgic_lr[VGIC_V3_LR_INDEX(12)] = 
> read_gicreg(ICH_LR12_EL2);
> + case 11:
> + cpu_if->vgic_lr[VGIC_V3_LR_INDEX(11)] = 
> read_gicreg(ICH_LR11_EL2);
> + case 10:
> + cpu_if->vgic_lr[VGIC_V3_LR_INDEX(10)] = 
> read_gicreg(ICH_LR10_EL2);
> + case 9:
> + cpu_if->vgic_lr[VGIC_V3_LR_INDEX(9)] = read_gicreg(ICH_LR9_EL2);
> + case 8:
> + cpu_if->vgic_lr[VGIC_V3_LR_INDEX(8)] = read_gicreg(ICH_LR8_EL2);
> + case 7:
> + 

Re: [PATCH v3 05/22] arm64: KVM: Implement vgic-v3 save/restore

2015-12-07 Thread Mario Smarduch


On 12/7/2015 10:20 AM, Marc Zyngier wrote:
> On 07/12/15 18:05, Mario Smarduch wrote:
>>
>>
>> On 12/7/2015 9:37 AM, Marc Zyngier wrote:
[...]
>>>
>>
>> I was thinking something like 'current_lr[VGIC_V3_LR_INDEX(...)]'.
> 
> That doesn't change anything, the compiler is perfectly able to 
> optimize something like this:
> 
> [...]
> ffc0007f31ac:   38624862ldrbw2, [x3,w2,uxtw]
> ffc0007f31b0:   1063adr x3, ffc0007f31bc 
> <__vgic_v3_save_state+0x64>
> ffc0007f31b4:   8b228862add x2, x3, w2, sxtb #2
> ffc0007f31b8:   d61f0040br  x2
> ffc0007f31bc:   d53ccde2mrs x2, s3_4_c12_c13_7
> ffc0007f31c0:   f9001c02str x2, [x0,#56]
> ffc0007f31c4:   d53ccdc2mrs x2, s3_4_c12_c13_6
> ffc0007f31c8:   f9002002str x2, [x0,#64]
> ffc0007f31cc:   d53ccda2mrs x2, s3_4_c12_c13_5
> ffc0007f31d0:   f9002402str x2, [x0,#72]
> ffc0007f31d4:   d53ccd82mrs x2, s3_4_c12_c13_4
> ffc0007f31d8:   f9002802str x2, [x0,#80]
> ffc0007f31dc:   d53ccd62mrs x2, s3_4_c12_c13_3
> ffc0007f31e0:   f9002c02str x2, [x0,#88]
> ffc0007f31e4:   d53ccd42mrs x2, s3_4_c12_c13_2
> ffc0007f31e8:   f9003002str x2, [x0,#96]
> ffc0007f31ec:   d53ccd22mrs x2, s3_4_c12_c13_1
> ffc0007f31f0:   f9003402str x2, [x0,#104]
> ffc0007f31f4:   d53ccd02mrs x2, s3_4_c12_c13_0
> ffc0007f31f8:   f9003802str x2, [x0,#112]
> ffc0007f31fc:   d53ccce2mrs x2, s3_4_c12_c12_7
> ffc0007f3200:   f9003c02str x2, [x0,#120]
> ffc0007f3204:   d532mrs x2, s3_4_c12_c12_6
> ffc0007f3208:   f9004002str x2, [x0,#128]
> ffc0007f320c:   d53ccca2mrs x2, s3_4_c12_c12_5
> ffc0007f3210:   f9004402str x2, [x0,#136]
> ffc0007f3214:   d53ccc82mrs x2, s3_4_c12_c12_4
> ffc0007f3218:   f9004802str x2, [x0,#144]
> ffc0007f321c:   d53ccc62mrs x2, s3_4_c12_c12_3
> ffc0007f3220:   f9004c02str x2, [x0,#152]
> ffc0007f3224:   d53ccc42mrs x2, s3_4_c12_c12_2
> ffc0007f3228:   f9005002str x2, [x0,#160]
> ffc0007f322c:   d53ccc22mrs x2, s3_4_c12_c12_1
> ffc0007f3230:   f9005402str x2, [x0,#168]
> ffc0007f3234:   d53ccc02mrs x2, s3_4_c12_c12_0
> ffc0007f3238:   7100183fcmp w1, #0x6
> ffc0007f323c:   f9005802str x2, [x0,#176]
> 
> As you can see, this is as optimal as it gets, short of being able
> to find a nice way to use more than one register...

Interesting, thanks for the dump. I'm no expert on pipeline optimizations, but I'm
wondering whether these system register accesses can be executed out of order,
provided there are no (what I think are) write-after-read dependencies?
It's only 4 registers here; there are some longer stretches in subsequent
patches.

A minor note: there is some stray whitespace in this patch.
> 
> Thanks,
> 
>   M.
> 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 06/22] arm64: KVM: Implement timer save/restore

2015-12-07 Thread Mario Smarduch


On 12/7/2015 2:53 AM, Marc Zyngier wrote:
> Implement the timer save restore as a direct translation of
> the assembly code version.
> 
> Signed-off-by: Marc Zyngier 
> ---
>  arch/arm64/kvm/hyp/Makefile  |  1 +
>  arch/arm64/kvm/hyp/hyp.h |  3 ++
>  arch/arm64/kvm/hyp/timer-sr.c| 72 
> 
>  include/clocksource/arm_arch_timer.h |  6 +++
>  4 files changed, 82 insertions(+)
>  create mode 100644 arch/arm64/kvm/hyp/timer-sr.c
> 
> diff --git a/arch/arm64/kvm/hyp/Makefile b/arch/arm64/kvm/hyp/Makefile
> index d1e38ce..455dc0a 100644
> --- a/arch/arm64/kvm/hyp/Makefile
> +++ b/arch/arm64/kvm/hyp/Makefile
> @@ -4,3 +4,4 @@
>  
>  obj-$(CONFIG_KVM_ARM_HOST) += vgic-v2-sr.o
>  obj-$(CONFIG_KVM_ARM_HOST) += vgic-v3-sr.o
> +obj-$(CONFIG_KVM_ARM_HOST) += timer-sr.o
> diff --git a/arch/arm64/kvm/hyp/hyp.h b/arch/arm64/kvm/hyp/hyp.h
> index 5759f9f..f213e46 100644
> --- a/arch/arm64/kvm/hyp/hyp.h
> +++ b/arch/arm64/kvm/hyp/hyp.h
> @@ -35,5 +35,8 @@ void __vgic_v2_restore_state(struct kvm_vcpu *vcpu);
>  void __vgic_v3_save_state(struct kvm_vcpu *vcpu);
>  void __vgic_v3_restore_state(struct kvm_vcpu *vcpu);
>  
> +void __timer_save_state(struct kvm_vcpu *vcpu);
> +void __timer_restore_state(struct kvm_vcpu *vcpu);
> +
>  #endif /* __ARM64_KVM_HYP_H__ */
>  
> diff --git a/arch/arm64/kvm/hyp/timer-sr.c b/arch/arm64/kvm/hyp/timer-sr.c
> new file mode 100644
> index 000..67292c0
> --- /dev/null
> +++ b/arch/arm64/kvm/hyp/timer-sr.c
> @@ -0,0 +1,72 @@
> +/*
> + * Copyright (C) 2012-2015 - ARM Ltd
> + * Author: Marc Zyngier 
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +
> +#include "hyp.h"
> +
> +/* vcpu is already in the HYP VA space */
> +void __hyp_text __timer_save_state(struct kvm_vcpu *vcpu)
> +{
> + struct kvm *kvm = kern_hyp_va(vcpu->kvm);
> + struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> + u64 val;
> +
> + if (kvm->arch.timer.enabled) {
> + timer->cntv_ctl = read_sysreg(cntv_ctl_el0);
> + isb();

Can you share the subtle insight into why the isb() is needed here?
B2.7.3 mentions changes to system registers only.

> + timer->cntv_cval = read_sysreg(cntv_cval_el0);
> + }
> +
> + /* Disable the virtual timer */
> + write_sysreg(0, cntv_ctl_el0);
> +
> + /* Allow physical timer/counter access for the host */
> + val = read_sysreg(cnthctl_el2);
> + val |= CNTHCTL_EL1PCTEN | CNTHCTL_EL1PCEN;
> + write_sysreg(val, cnthctl_el2);
> +
> + /* Clear cntvoff for the host */
> + write_sysreg(0, cntvoff_el2);
> +}
> +
> +void __hyp_text __timer_restore_state(struct kvm_vcpu *vcpu)
> +{
> + struct kvm *kvm = kern_hyp_va(vcpu->kvm);
> + struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> + u64 val;
> +
> + /*
> +  * Disallow physical timer access for the guest
> +  * Physical counter access is allowed
> +  */
> + val = read_sysreg(cnthctl_el2);
> + val &= ~CNTHCTL_EL1PCEN;
> + val |= CNTHCTL_EL1PCTEN;
> + write_sysreg(val, cnthctl_el2);
> +
> + if (kvm->arch.timer.enabled) {
> + write_sysreg(kvm->arch.timer.cntvoff, cntvoff_el2);
> + write_sysreg(timer->cntv_cval, cntv_cval_el0);
> + isb();
> + write_sysreg(timer->cntv_ctl, cntv_ctl_el0);
> + }
> +}
> diff --git a/include/clocksource/arm_arch_timer.h 
> b/include/clocksource/arm_arch_timer.h
> index 9916d0e..25d0914 100644
> --- a/include/clocksource/arm_arch_timer.h
> +++ b/include/clocksource/arm_arch_timer.h
> @@ -23,6 +23,12 @@
>  #define ARCH_TIMER_CTRL_IT_MASK  (1 << 1)
>  #define ARCH_TIMER_CTRL_IT_STAT  (1 << 2)
>  
> +#define CNTHCTL_EL1PCTEN (1 << 0)
> +#define CNTHCTL_EL1PCEN  (1 << 1)
> +#define CNTHCTL_EVNTEN   (1 << 2)
> +#define CNTHCTL_EVNTDIR  (1 << 3)
> +#define CNTHCTL_EVNTI(0xF << 4)
> +
>  enum arch_timer_reg {
>   ARCH_TIMER_REG_CTRL,
>   ARCH_TIMER_REG_TVAL,
> 
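The cnthctl_el2 bit handling in the save and restore paths above reduces to two small bit manipulations: the host gets both physical counter and physical timer access, while the guest keeps counter access but has timer access trapped. A standalone sketch (bit values taken from the CNTHCTL defines in this patch):

```c
#include <assert.h>
#include <stdint.h>

#define CNTHCTL_EL1PCTEN (1U << 0)	/* EL1/EL0 physical counter access */
#define CNTHCTL_EL1PCEN  (1U << 1)	/* EL1/EL0 physical timer access */

/* World switch to host (__timer_save_state): allow both. */
static uint32_t cnthctl_for_host(uint32_t val)
{
	return val | CNTHCTL_EL1PCTEN | CNTHCTL_EL1PCEN;
}

/* World switch to guest (__timer_restore_state): trap the physical
 * timer, keep the physical counter readable. */
static uint32_t cnthctl_for_guest(uint32_t val)
{
	val &= ~CNTHCTL_EL1PCEN;
	val |= CNTHCTL_EL1PCTEN;
	return val;
}
```

The function names are illustrative; the kernel performs these updates inline via read_sysreg()/write_sysreg() on cnthctl_el2.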


Re: [PATCH v3 05/22] arm64: KVM: Implement vgic-v3 save/restore

2015-12-07 Thread Mario Smarduch


On 12/7/2015 9:37 AM, Marc Zyngier wrote:
> On 07/12/15 17:18, Mario Smarduch wrote:
>>
>>
>> On 12/7/2015 8:52 AM, Marc Zyngier wrote:
>>> Hi Mario,
>>>
>>> On 07/12/15 16:40, Mario Smarduch wrote:
>>>> Hi Marc,
>>>>
>>>> On 12/7/2015 2:53 AM, Marc Zyngier wrote:
>>>>> Implement the vgic-v3 save restore as a direct translation of
>>>>> the assembly code version.
>>>>>
>>>>> Signed-off-by: Marc Zyngier <marc.zyng...@arm.com>
>>>>> ---
>>>>>  arch/arm64/kvm/hyp/Makefile |   1 +
>>>>>  arch/arm64/kvm/hyp/hyp.h|   3 +
>>>>>  arch/arm64/kvm/hyp/vgic-v3-sr.c | 226 
>>>>> 
>>>>>  3 files changed, 230 insertions(+)
>>>>>  create mode 100644 arch/arm64/kvm/hyp/vgic-v3-sr.c
>>>>>
>>>>> diff --git a/arch/arm64/kvm/hyp/Makefile b/arch/arm64/kvm/hyp/Makefile
>>>>> index d8d5968..d1e38ce 100644
>>>>> --- a/arch/arm64/kvm/hyp/Makefile
>>>>> +++ b/arch/arm64/kvm/hyp/Makefile
>>>>> @@ -3,3 +3,4 @@
>>>>>  #
>>>>>  
>>>>>  obj-$(CONFIG_KVM_ARM_HOST) += vgic-v2-sr.o
>>>>> +obj-$(CONFIG_KVM_ARM_HOST) += vgic-v3-sr.o
>>>>> diff --git a/arch/arm64/kvm/hyp/hyp.h b/arch/arm64/kvm/hyp/hyp.h
>>>>> index ac63553..5759f9f 100644
>>>>> --- a/arch/arm64/kvm/hyp/hyp.h
>>>>> +++ b/arch/arm64/kvm/hyp/hyp.h
>>>>> @@ -32,5 +32,8 @@
>>>>>  void __vgic_v2_save_state(struct kvm_vcpu *vcpu);
>>>>>  void __vgic_v2_restore_state(struct kvm_vcpu *vcpu);
>>>>>  
>>>>> +void __vgic_v3_save_state(struct kvm_vcpu *vcpu);
>>>>> +void __vgic_v3_restore_state(struct kvm_vcpu *vcpu);
>>>>> +
>>>>>  #endif /* __ARM64_KVM_HYP_H__ */
>>>>>  
>>>>> diff --git a/arch/arm64/kvm/hyp/vgic-v3-sr.c 
>>>>> b/arch/arm64/kvm/hyp/vgic-v3-sr.c
>>>>> new file mode 100644
>>>>> index 000..78d05f3
>>>>> --- /dev/null
>>>>> +++ b/arch/arm64/kvm/hyp/vgic-v3-sr.c
>>>>> @@ -0,0 +1,226 @@
>>>>> +/*
>>>>> + * Copyright (C) 2012-2015 - ARM Ltd
>>>>> + * Author: Marc Zyngier <marc.zyng...@arm.com>
>>>>> + *
>>>>> + * This program is free software; you can redistribute it and/or modify
>>>>> + * it under the terms of the GNU General Public License version 2 as
>>>>> + * published by the Free Software Foundation.
>>>>> + *
>>>>> + * This program is distributed in the hope that it will be useful,
>>>>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>>>>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>>>>> + * GNU General Public License for more details.
>>>>> + *
>>>>> + * You should have received a copy of the GNU General Public License
>>>>> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
>>>>> + */
>>>>> +
>>>>> +#include 
>>>>> +#include 
>>>>> +#include 
>>>>> +
>>>>> +#include 
>>>>> +
>>>>> +#include "hyp.h"
>>>>> +
>>>>> +#define vtr_to_max_lr_idx(v) ((v) & 0xf)
>>>>> +#define vtr_to_nr_pri_bits(v)	(((u32)(v) >> 29) + 1)
>>>>> +
>>>>> +#define read_gicreg(r)						\
>>>>> + ({  \
>>>>> + u64 reg;\
>>>>> + asm volatile("mrs_s %0, " __stringify(r) : "=r" (reg)); \
>>>>> + reg;\
>>>>> + })
>>>>> +
>>>>> +#define write_gicreg(v,r)					\
>>>>> + do {\
>>>>> + u64 __val = (v);\
>>>>> + asm volatile("msr_s " __stringify(r) ", %0" : : "r" (__val));\
&

[PATCH v5 0/3] KVM/arm/arm64: enhance armv7/8 fp/simd lazy switch

2015-12-06 Thread Mario Smarduch
This patch series combines the previous armv7 and armv8 versions.
For an FP and lmbench load it reduces fp/simd context switch overhead from 30-50%
down to near 0%. Results will vary with load, but it is no worse than the current
approach.

In summary, the current lazy vfp/simd implementation switches hardware context only
on guest access and again on exit to the host; otherwise the hardware context
switch is skipped. This patch set builds on that functionality and executes a
hardware context switch only when the vCPU is scheduled out or returns to user space.
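The lazy-switch policy described above can be modeled as a small state machine: the first guest fp/simd trap loads the guest context, ordinary guest exits leave it in place, and only vcpu_put (schedule-out or return to user space) restores the host context. A minimal sketch — names and structure are illustrative, not the kernel's:

```c
#include <assert.h>
#include <stdbool.h>

struct vcpu_model {
	bool guest_ctx_loaded;	/* guest fp/simd regs live in hardware */
	int hw_switches;	/* full fp/simd context switches performed */
};

static void fp_trap(struct vcpu_model *v)	/* first guest fp/simd access */
{
	if (!v->guest_ctx_loaded) {
		v->hw_switches++;		/* save host, load guest */
		v->guest_ctx_loaded = true;	/* trapping now disabled */
	}
}

static void guest_exit(struct vcpu_model *v)	/* ordinary exit: no switch */
{
	(void)v;				/* guest context stays loaded */
}

static void vcpu_put(struct vcpu_model *v)	/* schedule-out / user return */
{
	if (v->guest_ctx_loaded) {
		v->hw_switches++;		/* save guest, restore host */
		v->guest_ctx_loaded = false;
	}
}
```

Under this model a run with one fp/simd access and many exits costs two hardware switches total, which is what produces the 98%+ reductions reported below.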

Running floating point app on nearly idle system:
./tst-float 10uS - (sleep for .1s) fp/simd switch reduced by 99%+
./tst-float 1uS -  (sleep for .01s)   reduced by 98%+
./tst-float 1000uS -   (sleep for 1ms)reduced by ~98%
...
./tst-float 1uS - reduced by  2%+

Tested on FastModels and Foundation Model (need to test on Juno)

Tests Ran:
--
armv7 - with CONFIG_VFP, CONFIG_NEON, CONFIG_KERNEL_MODE_NEON options enabled:

- On host executed 12 fp applications - evenly pinned to cpus
- Two guests - with 12 fp processes - also pinned to vcpus.
- Executing with various sleep intervals to measure the ratio between exits
  and fp/simd switches

armv8:
-  added mix of armv7 and armv8 guests.

These patches are based on earlier arm64 fp/simd optimization work -
https://lists.cs.columbia.edu/pipermail/kvmarm/2015-July/015748.html

And subsequent fixes by Marc and Christoffer at KVM Forum hackathon to handle
32-bit guest on 64 bit host - 
https://lists.cs.columbia.edu/pipermail/kvmarm/2015-August/016128.html

Changes since v4->v5:
- Followed up on Marc's comments
  - Removed dirty flag, and used trap bits to check for dirty fp/simd state
  - Separated host from hyp code
  - As a consequence, for arm64 added a common assembler header file
  - Fixed up critical accesses to fpexc, and added an isb
  - Converted defines to inline functions

Changes since v3->v4:
- Followed up on Christoffer's comments
  - Move fpexc handling to vcpu_load and vcpu_put
  - Enable and restore fpexc in EL2 mode when running a 32 bit guest on
64bit EL2
  - rework hcptr handling

Changes since v2->v3:
- combined arm v7 and v8 into one short patch series
- moved access to fpexec_el2 back to EL2
- Move host restore to EL1 from EL2 and call directly from host
- optimize trap enable code 
- renamed some variables to match usage

Changes since v1->v2:
- Fixed vfp/simd trap configuration to enable trace trapping
- Removed set_hcptr branch label
- Fixed handling of FPEXC to restore guest and host versions on vcpu_put
- Tested arm32/arm64
- rebased to 4.3-rc2
- changed a couple register accesses from 64 to 32 bit


Mario Smarduch (3):
  add hooks for armv7 fp/simd lazy switch support
  enable enhanced armv7 fp/simd lazy switch
  enable enhanced armv8 fp/simd lazy switch

 arch/arm/include/asm/kvm_emulate.h   |  55 ++
 arch/arm/include/asm/kvm_host.h  |   9 +++
 arch/arm/kernel/asm-offsets.c|   2 +
 arch/arm/kvm/Makefile|   2 +-
 arch/arm/kvm/arm.c   |  25 
 arch/arm/kvm/fpsimd_switch.S |  46 +++
 arch/arm/kvm/interrupts.S|  32 +++
 arch/arm/kvm/interrupts_head.S   |  33 +--
 arch/arm64/include/asm/kvm_asm.h |   2 +
 arch/arm64/include/asm/kvm_emulate.h |  16 ++
 arch/arm64/include/asm/kvm_host.h|  15 +
 arch/arm64/kernel/asm-offsets.c  |   1 +
 arch/arm64/kvm/Makefile  |   3 +-
 arch/arm64/kvm/fpsimd_switch.S   |  38 
 arch/arm64/kvm/hyp.S | 108 +--
 arch/arm64/kvm/hyp_head.S|  48 
 16 files changed, 322 insertions(+), 113 deletions(-)
 create mode 100644 arch/arm/kvm/fpsimd_switch.S
 create mode 100644 arch/arm64/kvm/fpsimd_switch.S
 create mode 100644 arch/arm64/kvm/hyp_head.S

-- 
1.9.1



[PATCH v5 2/3] KVM/arm/arm64: enable enhanced armv7 fp/simd lazy switch

2015-12-06 Thread Mario Smarduch
This patch tracks armv7 fp/simd hardware state with the hcptr register.
On vcpu_load it saves host fpexc, enables FP access, and sets trapping
on fp/simd access. On the first fp/simd access, trap to a handler to save host
and restore guest context, then clear the trapping bits to enable vcpu lazy mode.
On vcpu_put, if the trap bits are cleared, save guest and restore host context,
and always restore host fpexc.

Signed-off-by: Mario Smarduch <m.smard...@samsung.com>
---
 arch/arm/include/asm/kvm_emulate.h   | 50 
 arch/arm/include/asm/kvm_host.h  |  1 +
 arch/arm/kvm/Makefile|  2 +-
 arch/arm/kvm/arm.c   | 13 ++
 arch/arm/kvm/fpsimd_switch.S | 46 +
 arch/arm/kvm/interrupts.S| 32 +--
 arch/arm/kvm/interrupts_head.S   | 33 ++--
 arch/arm64/include/asm/kvm_emulate.h |  9 +++
 arch/arm64/include/asm/kvm_host.h|  1 +
 9 files changed, 142 insertions(+), 45 deletions(-)
 create mode 100644 arch/arm/kvm/fpsimd_switch.S

diff --git a/arch/arm/include/asm/kvm_emulate.h 
b/arch/arm/include/asm/kvm_emulate.h
index a9c80a2..3de11a2 100644
--- a/arch/arm/include/asm/kvm_emulate.h
+++ b/arch/arm/include/asm/kvm_emulate.h
@@ -243,4 +243,54 @@ static inline unsigned long vcpu_data_host_to_guest(struct 
kvm_vcpu *vcpu,
}
 }
 
+#ifdef CONFIG_VFPv3
+/* Called from vcpu_load - save fpexc and enable guest access to fp/simd unit */
+static inline void kvm_enable_vcpu_fpexc(struct kvm_vcpu *vcpu)
+{
+   u32 fpexc;
+
+   asm volatile(
+"mrc p10, 7, %0, cr8, cr0, 0\n"
+"str %0, [%1]\n"
+"mov %0, #(1 << 30)\n"
+"mcr p10, 7, %0, cr8, cr0, 0\n"
+"isb\n"
+: "+r" (fpexc)
+: "r" (&vcpu->arch.host_fpexc)
+   );
+}
+
+/* Called from vcpu_put - restore host fpexc */
+static inline void kvm_restore_host_fpexc(struct kvm_vcpu *vcpu)
+{
+   asm volatile(
+"mcr p10, 7, %0, cr8, cr0, 0\n"
+:
+: "r" (vcpu->arch.host_fpexc)
+   );
+}
+
+/* If trap bits are reset then fp/simd registers are dirty */
+static inline bool kvm_vcpu_vfp_isdirty(struct kvm_vcpu *vcpu)
+{
+   return !!(~vcpu->arch.hcptr & (HCPTR_TCP(10) | HCPTR_TCP(11)));
+}
+
+static inline void vcpu_reset_cptr(struct kvm_vcpu *vcpu)
+{
+   vcpu->arch.hcptr |= (HCPTR_TTA | HCPTR_TCP(10)  | HCPTR_TCP(11));
+}
+#else
+static inline void kvm_enable_vcpu_fpexc(struct kvm_vcpu *vcpu) {}
+static inline void kvm_restore_host_fpexc(struct kvm_vcpu *vcpu) {}
+static inline bool kvm_vcpu_vfp_isdirty(struct kvm_vcpu *vcpu)
+{
+   return false;
+}
+static inline void vcpu_reset_cptr(struct kvm_vcpu *vcpu)
+{
+   vcpu->arch.hcptr = HCPTR_TTA;
+}
+#endif
+
 #endif /* __ARM_KVM_EMULATE_H__ */
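The trap-bit dirty test in kvm_vcpu_vfp_isdirty() above can be checked in isolation. The HCPTR bit values here are assumed to match arch/arm/include/asm/kvm_arm.h (HCPTR_TCP(x) as bit x, HCPTR_TTA as bit 20); verify against the tree this is applied to:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values, mirroring arch/arm/include/asm/kvm_arm.h. */
#define HCPTR_TCP(x)	(1U << (x))
#define HCPTR_TTA	(1U << 20)

/* The guest's first fp/simd access clears the cp10/cp11 trap bits;
 * their absence is therefore the dirty indicator. */
static bool vfp_isdirty(uint32_t hcptr)
{
	return !!(~hcptr & (HCPTR_TCP(10) | HCPTR_TCP(11)));
}
```

With the traps armed (as vcpu_reset_cptr() sets them on vcpu_load) the function reports clean; once either coprocessor trap bit has been cleared by the first-access handler, it reports dirty.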
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 09bb1f2..ecc883a 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -227,6 +227,7 @@ int kvm_perf_teardown(void);
 void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
 
 struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
+void kvm_restore_host_vfp_state(struct kvm_vcpu *);
 
 static inline void kvm_arch_hardware_disable(void) {}
 static inline void kvm_arch_hardware_unsetup(void) {}
diff --git a/arch/arm/kvm/Makefile b/arch/arm/kvm/Makefile
index c5eef02c..411b3e4 100644
--- a/arch/arm/kvm/Makefile
+++ b/arch/arm/kvm/Makefile
@@ -19,7 +19,7 @@ kvm-arm-y = $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o 
$(KVM)/eventfd.o $(KVM)/vf
 
 obj-y += kvm-arm.o init.o interrupts.o
 obj-y += arm.o handle_exit.o guest.o mmu.o emulate.o reset.o
-obj-y += coproc.o coproc_a15.o coproc_a7.o mmio.o psci.o perf.o
+obj-y += coproc.o coproc_a15.o coproc_a7.o mmio.o psci.o perf.o fpsimd_switch.o
 obj-y += $(KVM)/arm/vgic.o
 obj-y += $(KVM)/arm/vgic-v2.o
 obj-y += $(KVM)/arm/vgic-v2-emul.o
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index dc017ad..1de07ab 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -291,10 +291,23 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
vcpu->arch.host_cpu_context = this_cpu_ptr(kvm_host_cpu_state);
 
kvm_arm_set_running_vcpu(vcpu);
+
+   /*  Save and enable FPEXC before we load guest context */
+   kvm_enable_vcpu_fpexc(vcpu);
+
+   /* reset hyp cptr register to trap on tracing and vfp/simd access*/
+   vcpu_reset_cptr(vcpu);
 }
 
 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 {
+   /* If the fp/simd registers are dirty save guest, restore host. */
+   if (kvm_vcpu_vfp_isdirty(vcpu))
+   kvm_restore_host_vfp_state(vcpu);
+
+   /* Restore host FPEXC trashed in vcpu_load */
+   kvm_restore_host_fpexc(vcpu);
+
/*
 * The arch-generic KVM code expects the cpu fie

[PATCH v5 1/3] KVM/arm: add hooks for armv7 fp/simd lazy switch support

2015-12-06 Thread Mario Smarduch
This patch adds vcpu fields to configure the hcptr trap register, which is also used
to determine if fp/simd registers are dirty. It adds a field to save host FPEXC,
and associated offsets.

Signed-off-by: Mario Smarduch <m.smard...@samsung.com>
---
 arch/arm/include/asm/kvm_host.h | 6 ++
 arch/arm/kernel/asm-offsets.c   | 2 ++
 2 files changed, 8 insertions(+)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 3df1e97..09bb1f2 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -104,6 +104,12 @@ struct kvm_vcpu_arch {
/* HYP trapping configuration */
u32 hcr;
 
+   /* HYP Co-processor fp/simd and trace trapping configuration */
+   u32 hcptr;
+
+   /* Save host FPEXC register to later restore on vcpu put */
+   u32 host_fpexc;
+
/* Interrupt related fields */
u32 irq_lines;  /* IRQ and FIQ levels */
 
diff --git a/arch/arm/kernel/asm-offsets.c b/arch/arm/kernel/asm-offsets.c
index 871b826..28ebd4c 100644
--- a/arch/arm/kernel/asm-offsets.c
+++ b/arch/arm/kernel/asm-offsets.c
@@ -185,6 +185,8 @@ int main(void)
   DEFINE(VCPU_PC,  offsetof(struct kvm_vcpu, 
arch.regs.usr_regs.ARM_pc));
   DEFINE(VCPU_CPSR,offsetof(struct kvm_vcpu, 
arch.regs.usr_regs.ARM_cpsr));
   DEFINE(VCPU_HCR, offsetof(struct kvm_vcpu, arch.hcr));
+  DEFINE(VCPU_HCPTR,   offsetof(struct kvm_vcpu, arch.hcptr));
+  DEFINE(VCPU_VFP_HOST_FPEXC,  offsetof(struct kvm_vcpu, arch.host_fpexc));
   DEFINE(VCPU_IRQ_LINES,   offsetof(struct kvm_vcpu, arch.irq_lines));
   DEFINE(VCPU_HSR, offsetof(struct kvm_vcpu, arch.fault.hsr));
   DEFINE(VCPU_HxFAR,   offsetof(struct kvm_vcpu, arch.fault.hxfar));
-- 
1.9.1



[PATCH v5 3/3] KVM/arm/arm64: enable enhanced armv8 fp/simd lazy switch

2015-12-06 Thread Mario Smarduch
This patch tracks armv7 and armv8 fp/simd hardware state with the cptr_el2 register.
On vcpu_load for 32-bit guests, enable FP access, and enable fp/simd
trapping for 32- and 64-bit guests. On the first fp/simd access, trap to a handler
to save host and restore guest context, and clear the trapping bits to enable vcpu
lazy mode. On vcpu_put, if the trap bits are clear, save guest and restore host
context, and also save the 32-bit guest fpexc register.

Signed-off-by: Mario Smarduch <m.smard...@samsung.com>
---
 arch/arm/include/asm/kvm_emulate.h   |   5 ++
 arch/arm/include/asm/kvm_host.h  |   2 +
 arch/arm/kvm/arm.c   |  20 +--
 arch/arm64/include/asm/kvm_asm.h |   2 +
 arch/arm64/include/asm/kvm_emulate.h |  15 +++--
 arch/arm64/include/asm/kvm_host.h|  16 +-
 arch/arm64/kernel/asm-offsets.c  |   1 +
 arch/arm64/kvm/Makefile  |   3 +-
 arch/arm64/kvm/fpsimd_switch.S   |  38 
 arch/arm64/kvm/hyp.S | 108 +--
 arch/arm64/kvm/hyp_head.S|  48 
 11 files changed, 181 insertions(+), 77 deletions(-)
 create mode 100644 arch/arm64/kvm/fpsimd_switch.S
 create mode 100644 arch/arm64/kvm/hyp_head.S

diff --git a/arch/arm/include/asm/kvm_emulate.h 
b/arch/arm/include/asm/kvm_emulate.h
index 3de11a2..13feed5 100644
--- a/arch/arm/include/asm/kvm_emulate.h
+++ b/arch/arm/include/asm/kvm_emulate.h
@@ -243,6 +243,11 @@ static inline unsigned long vcpu_data_host_to_guest(struct 
kvm_vcpu *vcpu,
}
 }
 
+static inline bool kvm_guest_vcpu_is_32bit(struct kvm_vcpu *vcpu)
+{
+   return true;
+}
+
 #ifdef CONFIG_VFPv3
 /* Called from vcpu_load - save fpexc and enable guest access to fp/simd unit */
 static inline void kvm_enable_vcpu_fpexc(struct kvm_vcpu *vcpu)
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index ecc883a..720ae51 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -227,6 +227,8 @@ int kvm_perf_teardown(void);
 void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
 
 struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
+
+static inline void kvm_save_guest_vcpu_fpexc(struct kvm_vcpu *vcpu) {}
 void kvm_restore_host_vfp_state(struct kvm_vcpu *);
 
 static inline void kvm_arch_hardware_disable(void) {}
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 1de07ab..dd59f8a 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -292,8 +292,12 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 
kvm_arm_set_running_vcpu(vcpu);
 
-   /*  Save and enable FPEXC before we load guest context */
-   kvm_enable_vcpu_fpexc(vcpu);
+   /*
+* For 32bit guest executing on arm64, enable fp/simd access in
+* EL2. On arm32 save host fpexc and then enable fp/simd access.
+*/
+   if (kvm_guest_vcpu_is_32bit(vcpu))
+   kvm_enable_vcpu_fpexc(vcpu);
 
/* reset hyp cptr register to trap on tracing and vfp/simd access*/
vcpu_reset_cptr(vcpu);
@@ -302,10 +306,18 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 {
/* If the fp/simd registers are dirty save guest, restore host. */
-   if (kvm_vcpu_vfp_isdirty(vcpu))
+   if (kvm_vcpu_vfp_isdirty(vcpu)) {
kvm_restore_host_vfp_state(vcpu);
 
-   /* Restore host FPEXC trashed in vcpu_load */
+   /*
+* For 32bit guest on arm64 save the guest fpexc register
+* in EL2 mode.
+*/
+   if (kvm_guest_vcpu_is_32bit(vcpu))
+   kvm_save_guest_vcpu_fpexc(vcpu);
+   }
+
+   /* For arm32 restore host FPEXC trashed in vcpu_load. */
kvm_restore_host_fpexc(vcpu);
 
/*
diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 5e37710..d53d069 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -117,6 +117,8 @@ extern char __kvm_hyp_vector[];
 extern void __kvm_flush_vm_context(void);
 extern void __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa);
 extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
+extern void __kvm_vcpu_enable_fpexc32(void);
+extern void __kvm_vcpu_save_fpexc32(struct kvm_vcpu *vcpu);
 
 extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
 
diff --git a/arch/arm64/include/asm/kvm_emulate.h 
b/arch/arm64/include/asm/kvm_emulate.h
index 8dccbd7..bbbee9d 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -290,13 +290,20 @@ static inline unsigned long 
vcpu_data_host_to_guest(struct kvm_vcpu *vcpu,
return data;/* Leave LE untouched */
 }
 
-static inline void kvm_enable_vcpu_fpexc(struct kvm_vcpu *vcpu) {}
-static inline void kvm_restore_host_fpexc(struct kvm_vcpu *vcpu) {}
-static inline void vcpu_reset_cptr(struct kvm_vcpu *vcpu) {}
+

Re: [PATCH v4 1/3] KVM/arm/arm64: add hooks for armv7 fp/simd lazy switch support

2015-12-03 Thread Mario Smarduch


On 12/3/2015 7:46 AM, Marc Zyngier wrote:
> On 14/11/15 22:12, Mario Smarduch wrote:
>> This patch adds vcpu fields to track lazy state, save host FPEXC, and
>> offsets to fields.
>>
>> Signed-off-by: Mario Smarduch <m.smard...@samsung.com>
>> ---
>>  arch/arm/include/asm/kvm_host.h | 6 ++
>>  arch/arm/kernel/asm-offsets.c   | 2 ++
>>  2 files changed, 8 insertions(+)
>>
>> diff --git a/arch/arm/include/asm/kvm_host.h 
>> b/arch/arm/include/asm/kvm_host.h
>> index 3df1e97..f1bf551 100644
>> --- a/arch/arm/include/asm/kvm_host.h
>> +++ b/arch/arm/include/asm/kvm_host.h
>> @@ -107,6 +107,12 @@ struct kvm_vcpu_arch {
>>  /* Interrupt related fields */
>>  u32 irq_lines;  /* IRQ and FIQ levels */
>>  
>> +/* fp/simd dirty flag true if guest accessed register file */
>> +boolvfp_dirty;
> 
> I think we do not need this bool, because it is already represented by
> the state of the trapping bits.

The trapping bit state is lost on exit since they're cleared, no?

> 
>> +
>> +/* Save host FPEXC register to later restore on vcpu put */
>> +u32 host_fpexc;
>> +
>>  /* Exception Information */
>>  struct kvm_vcpu_fault_info fault;
>>  
>> diff --git a/arch/arm/kernel/asm-offsets.c b/arch/arm/kernel/asm-offsets.c
>> index 871b826..9f79712 100644
>> --- a/arch/arm/kernel/asm-offsets.c
>> +++ b/arch/arm/kernel/asm-offsets.c
>> @@ -186,6 +186,8 @@ int main(void)
>>DEFINE(VCPU_CPSR, offsetof(struct kvm_vcpu, 
>> arch.regs.usr_regs.ARM_cpsr));
>>DEFINE(VCPU_HCR,  offsetof(struct kvm_vcpu, arch.hcr));
>>DEFINE(VCPU_IRQ_LINES,offsetof(struct kvm_vcpu, arch.irq_lines));
>> +  DEFINE(VCPU_VFP_DIRTY,offsetof(struct kvm_vcpu, arch.vfp_dirty));
>> +  DEFINE(VCPU_VFP_HOST_FPEXC,   offsetof(struct kvm_vcpu, 
>> arch.host_fpexc));
>>DEFINE(VCPU_HSR,  offsetof(struct kvm_vcpu, arch.fault.hsr));
>>DEFINE(VCPU_HxFAR,offsetof(struct kvm_vcpu, 
>> arch.fault.hxfar));
>>DEFINE(VCPU_HPFAR,offsetof(struct kvm_vcpu, 
>> arch.fault.hpfar));
>>
> 
> Thanks,
> 
>   M.
> 


Re: [PATCH v4 1/3] KVM/arm/arm64: add hooks for armv7 fp/simd lazy switch support

2015-12-03 Thread Mario Smarduch


On 12/3/2015 11:24 AM, Marc Zyngier wrote:
> On 03/12/15 19:21, Mario Smarduch wrote:
>>
>>
>> On 12/3/2015 7:46 AM, Marc Zyngier wrote:
>>> On 14/11/15 22:12, Mario Smarduch wrote:
>>>> This patch adds vcpu fields to track lazy state, save host FPEXC, and
>>>> offsets to fields.
>>>>
>>>> Signed-off-by: Mario Smarduch <m.smard...@samsung.com>
>>>> ---
>>>>  arch/arm/include/asm/kvm_host.h | 6 ++
>>>>  arch/arm/kernel/asm-offsets.c   | 2 ++
>>>>  2 files changed, 8 insertions(+)
>>>>
>>>> diff --git a/arch/arm/include/asm/kvm_host.h 
>>>> b/arch/arm/include/asm/kvm_host.h
>>>> index 3df1e97..f1bf551 100644
>>>> --- a/arch/arm/include/asm/kvm_host.h
>>>> +++ b/arch/arm/include/asm/kvm_host.h
>>>> @@ -107,6 +107,12 @@ struct kvm_vcpu_arch {
>>>>/* Interrupt related fields */
>>>>u32 irq_lines;  /* IRQ and FIQ levels */
>>>>  
>>>> +  /* fp/simd dirty flag true if guest accessed register file */
>>>> +  boolvfp_dirty;
>>>
>>> I think we do not need this bool, because it is already represented by
>>> the state of the trapping bits.
>>
>> The trapping bit state is lost on exit since they're cleared, no?
> 
> But that's what should actually be preserved, no? At the moment, you
> maintain some side state to reflect what the trapping state is. You
> might as well keep it around all the time.

Ok I see, you should be able to preserve and use the trap registers. I'll rework
it.

> 
> Thanks,
> 
>   M.
> 


Re: [PATCH v4 3/3] KVM/arm/arm64: enable enhanced armv8 fp/simd lazy switch

2015-12-03 Thread Mario Smarduch


On 12/3/2015 8:13 AM, Marc Zyngier wrote:
> On 14/11/15 22:12, Mario Smarduch wrote:
>> This patch tracks armv7 and armv8 fp/simd hardware state with a vcpu lazy 
>> flag.
>> On vcpu_load for 32 bit guests enable FP access, and later enable fp/simd
>> trapping for 32 and 64 bit guests if lazy flag is not set. On first fp/simd 
>> access trap to handler to save host and restore guest context, disable 
>> trapping and set vcpu lazy flag. On vcpu_put if flag is set save guest and 
>> restore host context and also save guest fpexc register.
>>
>> Signed-off-by: Mario Smarduch <m.smard...@samsung.com>
>> ---
>>  arch/arm/include/asm/kvm_host.h   |  3 ++
>>  arch/arm/kvm/arm.c| 18 +++--
>>  arch/arm64/include/asm/kvm_asm.h  |  2 +
>>  arch/arm64/include/asm/kvm_host.h | 17 +++-
>>  arch/arm64/kernel/asm-offsets.c   |  1 +
>>  arch/arm64/kvm/hyp.S  | 83 
>> +--
>>  6 files changed, 89 insertions(+), 35 deletions(-)
>>
>> diff --git a/arch/arm/include/asm/kvm_host.h 
>> b/arch/arm/include/asm/kvm_host.h
>> index 8fc7a59..6960ff2 100644
>> --- a/arch/arm/include/asm/kvm_host.h
>> +++ b/arch/arm/include/asm/kvm_host.h
>> @@ -40,6 +40,8 @@
>>  
>>  #define KVM_MAX_VCPUS VGIC_V2_MAX_CPUS
>>  
>> +#define kvm_guest_is32bit(vcpu) true
> 
> This should be defined as an inline function, and placed in
> asm/kvm_emulate.h, probably renamed as kvm_guest_vcpu_is_32bit.

Will do, this header file should also resolve my problems in armv7.
> 
>> +
>>  /*
>>   * Reads the host FPEXC register, saves to vcpu context and enables the
>>   * FPEXC.
>> @@ -260,6 +262,7 @@ void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
>>  
>>  struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
>>  void kvm_restore_host_vfp_state(struct kvm_vcpu *);
>> +static inline void kvm_save_guest_fpexc(struct kvm_vcpu *vcpu) {}
>>  
>>  static inline void kvm_arch_hardware_disable(void) {}
>>  static inline void kvm_arch_hardware_unsetup(void) {}
>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
>> index cfc348a..7a20530 100644
>> --- a/arch/arm/kvm/arm.c
>> +++ b/arch/arm/kvm/arm.c
>> @@ -292,8 +292,12 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>>  
>>  kvm_arm_set_running_vcpu(vcpu);
>>  
>> -/* Save and enable FPEXC before we load guest context */
>> -kvm_enable_fpexc(vcpu);
>> +/*
>> + * For 32bit guest executing on arm64, enable fp/simd access in
>> + * EL2. On arm32 save host fpexc and then enable fp/simd access.
>> + */
>> +if (kvm_guest_is32bit(vcpu))
>> +kvm_enable_fpexc(vcpu);
>>  }
>>  
>>  void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>> @@ -301,10 +305,18 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>>  /* If the fp/simd registers are dirty save guest, restore host. */
>>  if (vcpu->arch.vfp_dirty) {
>>  kvm_restore_host_vfp_state(vcpu);
>> +
>> +/*
>> + * For 32bit guest on arm64 save the guest fpexc register
>> + * in EL2 mode.
>> + */
>> +if (kvm_guest_is32bit(vcpu))
>> +kvm_save_guest_fpexc(vcpu);
>> +
>>  vcpu->arch.vfp_dirty = 0;
>>  }
>>  
>> -/* Restore host FPEXC trashed in vcpu_load */
>> +/* For arm32 restore host FPEXC trashed in vcpu_load. */
>>  kvm_restore_host_fpexc(vcpu);
>>  
>>  /*
>> diff --git a/arch/arm64/include/asm/kvm_asm.h 
>> b/arch/arm64/include/asm/kvm_asm.h
>> index 5e37710..c589ca9 100644
>> --- a/arch/arm64/include/asm/kvm_asm.h
>> +++ b/arch/arm64/include/asm/kvm_asm.h
>> @@ -117,6 +117,8 @@ extern char __kvm_hyp_vector[];
>>  extern void __kvm_flush_vm_context(void);
>>  extern void __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa);
>>  extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
>> +extern void __kvm_enable_fpexc32(void);
>> +extern void __kvm_save_fpexc32(struct kvm_vcpu *vcpu);
>>  
>>  extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
>>  
>> diff --git a/arch/arm64/include/asm/kvm_host.h 
>> b/arch/arm64/include/asm/kvm_host.h
>> index 83e65dd..6e2d6b5 100644
>> --- a/arch/arm64/include/asm/kvm_host.h
>> +++ b/arch/arm64/include/asm/kvm_host.h
>> @@ -41,6 +41,8 @@
>>  
>>  #define KVM_VCPU_MAX_FEATURES 3
>>  
>> +#define kvm_guest_is32bit(vcpu)  (!(vcpu->arch.hcr_el2 & HCR_RW))

Re: [PATCH v4 2/3] KVM/arm/arm64: enable enhanced armv7 fp/simd lazy switch

2015-12-03 Thread Mario Smarduch


On 12/3/2015 7:58 AM, Marc Zyngier wrote:
> On 14/11/15 22:12, Mario Smarduch wrote:
>> This patch tracks armv7 fp/simd hardware state with a vcpu lazy flag.
>> On vcpu_load saves host fpexc and enables FP access, and later enables 
>> fp/simd
>> trapping if lazy flag is not set. On first fp/simd access trap to handler 
>> to save host and restore guest context, disable trapping and set vcpu lazy 
>> flag. On vcpu_put if flag is set save guest and restore host context and 
>> always restore host fpexc.
>>
>> Signed-off-by: Mario Smarduch <m.smard...@samsung.com>
>> ---
>>  arch/arm/include/asm/kvm_host.h   | 33 ++
>>  arch/arm/kvm/arm.c| 12 
>>  arch/arm/kvm/interrupts.S | 58 +++
>>  arch/arm/kvm/interrupts_head.S| 26 +-
>>  arch/arm64/include/asm/kvm_host.h |  6 
>>  5 files changed, 104 insertions(+), 31 deletions(-)
>>
>> diff --git a/arch/arm/include/asm/kvm_host.h 
>> b/arch/arm/include/asm/kvm_host.h
>> index f1bf551..8fc7a59 100644
>> --- a/arch/arm/include/asm/kvm_host.h
>> +++ b/arch/arm/include/asm/kvm_host.h
>> @@ -40,6 +40,38 @@
>>  
>>  #define KVM_MAX_VCPUS VGIC_V2_MAX_CPUS
>>  
>> +/*
>> + * Reads the host FPEXC register, saves it to vcpu context and enables the
>> + * FP/SIMD unit.
>> + */
>> +#ifdef CONFIG_VFPv3
>> +#define kvm_enable_fpexc(vcpu) {\
>> +u32 fpexc = 0;  \
>> +asm volatile(   \
>> +"mrc p10, 7, %0, cr8, cr0, 0\n" \
>> +"str %0, [%1]\n"\
>> +"orr %0, %0, #(1 << 30)\n"  \
>> +"mcr p10, 7, %0, cr8, cr0, 0\n" \
> 
> Don't you need an ISB here? 
Yes it does (ARM ARM B2.7.3) - I was thinking of something else, but the manual is clear here.

> Also, it would be a lot nicer if this was a
> real function (possibly inlined). I don't see any real reason to make
> this a #define.
Had some trouble reconciling the arm and arm64 builds when making this
a function in kvm_host.h. I'll work to resolve it.

> 
> Also, you're preserving a lot of the host's FPEXC bits. Is that safe?
No it may not be, should just set the enable bit.
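Putting the two review points together (write only the enable bit, add the barrier after the write), a host-side model of the revised vcpu_load-time handling could look like the sketch below. fmrx()/fmxr() here are simplified stubs over a fake register so the snippet runs anywhere; the kernel's fmrx/fmxr macros take the register name and expand to mrc/mcr on p10, and the write needs an isb (ARM ARM B2.7.3):

```c
#include <stdint.h>

#define FPEXC_EN (1U << 30)  /* FPEXC enable bit */

static uint32_t fake_fpexc;  /* stands in for the hardware register */

static uint32_t fmrx(void) { return fake_fpexc; }
static void fmxr(uint32_t v) { fake_fpexc = v; /* kernel: mcr + isb */ }

struct vcpu { uint32_t host_fpexc; };

static void vcpu_trap_vfp_enable(struct vcpu *vcpu)
{
	/* Save the host FPEXC for vcpu_put ... */
	vcpu->host_fpexc = fmrx();
	/* ... then write only the enable bit, rather than preserving
	 * and ORing the rest of the host's FPEXC bits */
	fmxr(FPEXC_EN);
}

static void vcpu_restore_host_fpexc(struct vcpu *vcpu)
{
	/* Put the saved host value back on vcpu_put */
	fmxr(vcpu->host_fpexc);
}
```

This is only a model of the control flow under the assumptions above, not the final kernel code; the real implementation stays as an inline function reading FPEXC through the coprocessor accessors.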
> 
>> +: "+r" (fpexc)  \
>> +: "r" (&vcpu->arch.host_fpexc)  \
>> +);  \
>> +}
>> +#else
>> +#define kvm_enable_fpexc(vcpu)
>> +#endif
>> +
>> +/* Restores host FPEXC register */
>> +#ifdef CONFIG_VFPv3
>> +#define kvm_restore_host_fpexc(vcpu) {  \
>> +asm volatile(   \
>> +"mcr p10, 7, %0, cr8, cr0, 0\n" \
>> +: : "r" (vcpu->arch.host_fpexc) \
>> +);  \
>> +}
>> +#else
>> +#define kvm_restore_host_fpexc(vcpu)
>> +#endif
>> +
>>  u32 *kvm_vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num, u32 mode);
>>  int __attribute_const__ kvm_target_cpu(void);
>>  int kvm_reset_vcpu(struct kvm_vcpu *vcpu);
>> @@ -227,6 +259,7 @@ int kvm_perf_teardown(void);
>>  void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
>>  
>>  struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
>> +void kvm_restore_host_vfp_state(struct kvm_vcpu *);
>>  
>>  static inline void kvm_arch_hardware_disable(void) {}
>>  static inline void kvm_arch_hardware_unsetup(void) {}
>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
>> index dc017ad..cfc348a 100644
>> --- a/arch/arm/kvm/arm.c
>> +++ b/arch/arm/kvm/arm.c
>> @@ -291,10 +291,22 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>>  vcpu->arch.host_cpu_context = this_cpu_ptr(kvm_host_cpu_state);
>>  
>>  kvm_arm_set_running_vcpu(vcpu);
>> +
>> +/* Save and enable FPEXC before we load guest context */
>> +kvm_enable_fpexc(vcpu);
>>  }
>>  
>>  void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>>  {
>> +/* If the fp/simd registers are dirty save guest, restore host. */
>> +if (vcpu->arch.vfp_dirty) {
> 
> See my previous comment about the dirty state.
Yes that change seems to be working out fine.
> 
>> +kvm_restore_host_vfp_state(vcpu);
>> +vcpu->arch.vfp_dirty = 0;
>> +}
>> +

Re: [PATCH v2 00/21] arm64: KVM: world switch in C

2015-11-30 Thread Mario Smarduch


On 11/30/2015 12:33 PM, Christoffer Dall wrote:
> On Fri, Nov 27, 2015 at 06:49:54PM +, Marc Zyngier wrote:
>> Once upon a time, the KVM/arm64 world switch was a nice, clean, lean
>> and mean piece of hand-crafted assembly code. Over time, features have
>> crept in, the code has become harder to maintain, and the smallest
>> change is a pain to introduce. The VHE patches are a prime example of
>> why this doesn't work anymore.
>>
>> This series rewrites most of the existing assembly code in C, but keeps
>> the existing code structure in place (most function names will look
>> familiar to the reader). The biggest change is that we don't have to
>> deal with a static register allocation (the compiler does it for us),
>> we can easily follow structure and pointers, and only the lowest level
>> is still in assembly code. Oh, and a negative diffstat.
>>
>> There is still a healthy dose of inline assembly (system register
>> accessors, runtime code patching), but I've tried not to make it too
>> invasive. The generated code, while not exactly brilliant, doesn't
> look too shabby. I do expect a small performance degradation, but I
>> believe this is something we can improve over time (my initial
>> measurements don't show any obvious regression though).
> 
> I ran this through my experimental setup on m400 and got this:
> 
> BM              v4.4-rc2    v4.4-rc2-wsinc  overhead
> --------------  ----------  --------------  --------
> Apache          5297.11     5243.77         101.02%
> fio rand read   4354.33     4294.50         101.39%
> fio rand write  2465.33     2231.33         110.49%
> hackbench       17.48       19.78           113.16%
> memcached       96442.69    101274.04       95.23%
> TCP_MAERTS      5966.89     6029.72         98.96%
> TCP_STREAM      6284.60     6351.74         98.94%
> TCP_RR          15044.71    14324.03        105.03%
> pbzip2 c        18.13       17.89           98.68%
> pbzip2 d        11.42       11.45           100.26%
> kernbench       50.13       50.28           100.30%
> mysql 1         152.84      154.01          100.77%
> mysql 2         98.12       98.94           100.84%
> mysql 4         51.32       51.17           99.71%
> mysql 8         27.31       27.70           101.42%
> mysql 20        16.80       17.21           102.47%
> mysql 100       13.71       14.11           102.92%
> mysql 200       15.20       15.20           100.00%
> mysql 400       17.16       17.16           100.00%
> 
> (you want to see this with a viewer that renders clear-text and tabs
> properly)
> 
> What this tells me is that we do take a noticable hit on the
> world-switch path, which shows up in the TCP_RR and hackbench workloads,
> which have a high precision in their output.
> 
> Note that the memcached number is well within its variability between
> individual benchmark runs, where it varies by up to 12% of its average in over
> 80% of the executions.
> 
> I don't think this is a showstopper though, but we could consider
> looking more closely at a breakdown of the world-switch path and verify
> if/where we are really taking a hit.
> 
> -Christoffer
> ___
> kvmarm mailing list
> kvm...@lists.cs.columbia.edu
> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
> 

I ran some of the lmbench 'micro benchmarks' - currently
the usleep one consistently stands out, by about 0.4% or an extra 300 ns
per sleep. A few other ones have some outliers; I will look at those
closer. Tests were run on Juno.

- Mario
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v4 3/3] KVM/arm/arm64: enable enhanced armv8 fp/simd lazy switch

2015-11-14 Thread Mario Smarduch
This patch tracks armv7 and armv8 fp/simd hardware state with a vcpu lazy flag.
On vcpu_load for 32 bit guests enable FP access, and later enable fp/simd
trapping for 32 and 64 bit guests if lazy flag is not set. On first fp/simd 
access trap to handler to save host and restore guest context, disable 
trapping and set vcpu lazy flag. On vcpu_put if flag is set save guest and 
restore host context and also save guest fpexc register.

Signed-off-by: Mario Smarduch <m.smard...@samsung.com>
---
 arch/arm/include/asm/kvm_host.h   |  3 ++
 arch/arm/kvm/arm.c| 18 +++--
 arch/arm64/include/asm/kvm_asm.h  |  2 +
 arch/arm64/include/asm/kvm_host.h | 17 +++-
 arch/arm64/kernel/asm-offsets.c   |  1 +
 arch/arm64/kvm/hyp.S  | 83 +--
 6 files changed, 89 insertions(+), 35 deletions(-)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 8fc7a59..6960ff2 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -40,6 +40,8 @@
 
 #define KVM_MAX_VCPUS VGIC_V2_MAX_CPUS
 
+#define kvm_guest_is32bit(vcpu)  true
+
 /*
  * Reads the host FPEXC register, saves to vcpu context and enables the
  * FPEXC.
@@ -260,6 +262,7 @@ void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
 
 struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
 void kvm_restore_host_vfp_state(struct kvm_vcpu *);
+static inline void kvm_save_guest_fpexc(struct kvm_vcpu *vcpu) {}
 
 static inline void kvm_arch_hardware_disable(void) {}
 static inline void kvm_arch_hardware_unsetup(void) {}
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index cfc348a..7a20530 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -292,8 +292,12 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 
kvm_arm_set_running_vcpu(vcpu);
 
-   /* Save and enable FPEXC before we load guest context */
-   kvm_enable_fpexc(vcpu);
+   /*
+* For 32bit guest executing on arm64, enable fp/simd access in
+* EL2. On arm32 save host fpexc and then enable fp/simd access.
+*/
+   if (kvm_guest_is32bit(vcpu))
+   kvm_enable_fpexc(vcpu);
 }
 
 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
@@ -301,10 +305,18 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
/* If the fp/simd registers are dirty save guest, restore host. */
if (vcpu->arch.vfp_dirty) {
kvm_restore_host_vfp_state(vcpu);
+
+   /*
+* For 32bit guest on arm64 save the guest fpexc register
+* in EL2 mode.
+*/
+   if (kvm_guest_is32bit(vcpu))
+   kvm_save_guest_fpexc(vcpu);
+
vcpu->arch.vfp_dirty = 0;
}
 
-   /* Restore host FPEXC trashed in vcpu_load */
+   /* For arm32 restore host FPEXC trashed in vcpu_load. */
kvm_restore_host_fpexc(vcpu);
 
/*
diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 5e37710..c589ca9 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -117,6 +117,8 @@ extern char __kvm_hyp_vector[];
 extern void __kvm_flush_vm_context(void);
 extern void __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa);
 extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
+extern void __kvm_enable_fpexc32(void);
+extern void __kvm_save_fpexc32(struct kvm_vcpu *vcpu);
 
 extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
 
diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index 83e65dd..6e2d6b5 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -41,6 +41,8 @@
 
 #define KVM_VCPU_MAX_FEATURES 3
 
+#define kvm_guest_is32bit(vcpu)  (!(vcpu->arch.hcr_el2 & HCR_RW))
+
 int __attribute_const__ kvm_target_cpu(void);
 int kvm_reset_vcpu(struct kvm_vcpu *vcpu);
 int kvm_arch_dev_ioctl_check_extension(long ext);
@@ -251,9 +253,20 @@ static inline void kvm_arch_hardware_unsetup(void) {}
 static inline void kvm_arch_sync_events(struct kvm *kvm) {}
 static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
 static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
-static inline void kvm_enable_fpexc(struct kvm_vcpu *vcpu) {}
-static inline void kvm_restore_host_vfp_state(struct kvm_vcpu *vcpu) {}
+
+static inline void kvm_enable_fpexc(struct kvm_vcpu *vcpu)
+{
+   /* Enable FP/SIMD access from EL2 mode */
+   kvm_call_hyp(__kvm_enable_fpexc32);
+}
+
+static inline void kvm_save_guest_fpexc(struct kvm_vcpu *vcpu)
+{
+   /* Save FPEXC32_EL2 in EL2 mode */
+   kvm_call_hyp(__kvm_save_fpexc32, vcpu);
+}
 static inline void kvm_restore_host_fpexc(struct kvm_vcpu *vcpu) {}
+void kvm_restore_host_vfp_state(struct kvm_vcpu *vcpu);
 
 void kvm_arm_init_debug(void);
 void kvm_arm_setup_debug(struct kvm_vcpu *vcpu);
diff --git a/arch/arm64/kernel/a

[PATCH v4 0/3] KVM/arm/arm64: enhance armv7/8 fp/simd lazy switch

2015-11-14 Thread Mario Smarduch
This patch series combines the previous armv7 and armv8 versions.
For an FP and lmbench load it reduces fp/simd context switching from 30-50% down
to 2%. Results will vary with load but it is no worse than the current
approach.

In summary current lazy vfp/simd implementation switches hardware context only
on guest access and again on exit to host, otherwise hardware context is
skipped. This patch set builds on that functionality and executes a hardware
context switch only when the vCPU is scheduled out or returns to user space.
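The switching policy described above can be modeled as a small state machine. This is an illustrative host-side sketch only (the function names are hypothetical stand-ins for the real vcpu_load/vcpu_put hooks and the hyp trap handler in the patches below):

```c
#include <stdbool.h>

struct vcpu {
	bool vfp_dirty;    /* guest fp/simd state is live in hardware */
	bool traps_armed;  /* fp/simd access will trap on next guest entry */
	int hw_switches;   /* full fp/simd hardware context switches */
};

/* vcpu_load: arm traps only if the guest state isn't already loaded */
static void vcpu_load(struct vcpu *v)
{
	v->traps_armed = !v->vfp_dirty;
}

/* first guest fp/simd access traps: one host->guest switch,
 * then the guest runs trap-free until it is scheduled out */
static void fp_simd_trap(struct vcpu *v)
{
	v->hw_switches++;
	v->vfp_dirty = true;
	v->traps_armed = false;
}

/* vcpu_put: switch back guest->host only if the guest used fp/simd */
static void vcpu_put(struct vcpu *v)
{
	if (v->vfp_dirty) {
		v->hw_switches++;
		v->vfp_dirty = false;
	}
}
```

A guest that never touches fp/simd costs zero hardware switches per scheduling period; one that does costs exactly two, regardless of how many exits occur in between — which is where the 30-50% to 2% reduction quoted above comes from.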

Patches were tested on FVP and Foundation Model sw platforms running floating-
point applications and comparing outcomes against known results. A bad FP/SIMD
context switch should result in FP errors. Artificially skipping an fp/simd context
switch (1 in 1000) causes the applications to report errors.

The test can be found here, https://github.com/mjsmar/arm-arm64-fpsimd-test

Tests Ran:
armv7:
- On host executed 12 fp applications, evenly pinned to cpus
- Two guests, each with 12 fp-crunching processes, also pinned to vCPUs
- Half ran with 1 ms sleep, the remaining with no sleep

armv8:
- same as above except used mix of armv7 and armv8 guests.

These patches are based on earlier arm64 fp/simd optimization work -
https://lists.cs.columbia.edu/pipermail/kvmarm/2015-July/015748.html

And subsequent fixes by Marc and Christoffer at KVM Forum hackathon to handle
32-bit guest on 64 bit host - 
https://lists.cs.columbia.edu/pipermail/kvmarm/2015-August/016128.html

Changes since v3->v4:
- Follow-up on Christoffer's comments:
  - Move fpexc handling to vcpu_load and vcpu_put
  - Enable and restore fpexc in EL2 mode when running a 32 bit guest on 
64bit EL2
  - rework hcptr handling

Changes since v2->v3:
- combined arm v7 and v8 into one short patch series
- moved access to fpexec_el2 back to EL2
- Move host restore to EL1 from EL2 and call directly from host
- optimize trap enable code 
- renamed some variables to match usage

Changes since v1->v2:
- Fixed vfp/simd trap configuration to enable trace trapping
- Removed set_hcptr branch label
- Fixed handling of FPEXC to restore guest and host versions on vcpu_put
- Tested arm32/arm64
- rebased to 4.3-rc2
- changed a couple register accesses from 64 to 32 bit


Mario Smarduch (3):
  add hooks for armv7 fp/simd lazy switch support
  enable enhanced armv7 fp/simd lazy switch
  enable enhanced armv8 fp/simd lazy switch

 arch/arm/include/asm/kvm_host.h   | 42 
 arch/arm/kernel/asm-offsets.c |  2 +
 arch/arm/kvm/arm.c| 24 +++
 arch/arm/kvm/interrupts.S | 58 ---
 arch/arm/kvm/interrupts_head.S| 26 
 arch/arm64/include/asm/kvm_asm.h  |  2 +
 arch/arm64/include/asm/kvm_host.h | 19 +
 arch/arm64/kernel/asm-offsets.c   |  1 +
 arch/arm64/kvm/hyp.S  | 83 +--
 9 files changed, 196 insertions(+), 61 deletions(-)

-- 
1.9.1



[PATCH v4 2/3] KVM/arm/arm64: enable enhanced armv7 fp/simd lazy switch

2015-11-14 Thread Mario Smarduch
This patch tracks armv7 fp/simd hardware state with a vcpu lazy flag.
On vcpu_load saves host fpexc and enables FP access, and later enables fp/simd
trapping if lazy flag is not set. On first fp/simd access trap to handler 
to save host and restore guest context, disable trapping and set vcpu lazy 
flag. On vcpu_put if flag is set save guest and restore host context and 
always restore host fpexc.

Signed-off-by: Mario Smarduch <m.smard...@samsung.com>
---
 arch/arm/include/asm/kvm_host.h   | 33 ++
 arch/arm/kvm/arm.c| 12 
 arch/arm/kvm/interrupts.S | 58 +++
 arch/arm/kvm/interrupts_head.S| 26 +-
 arch/arm64/include/asm/kvm_host.h |  6 
 5 files changed, 104 insertions(+), 31 deletions(-)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index f1bf551..8fc7a59 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -40,6 +40,38 @@
 
 #define KVM_MAX_VCPUS VGIC_V2_MAX_CPUS
 
+/*
+ * Reads the host FPEXC register, saves it to vcpu context and enables the
+ * FP/SIMD unit.
+ */
+#ifdef CONFIG_VFPv3
+#define kvm_enable_fpexc(vcpu) {   \
+   u32 fpexc = 0;  \
+   asm volatile(   \
+   "mrc p10, 7, %0, cr8, cr0, 0\n" \
+   "str %0, [%1]\n"\
+   "orr %0, %0, #(1 << 30)\n"  \
+   "mcr p10, 7, %0, cr8, cr0, 0\n" \
+   : "+r" (fpexc)  \
   : "r" (&vcpu->arch.host_fpexc)  \
+   );  \
+}
+#else
+#define kvm_enable_fpexc(vcpu)
+#endif
+
+/* Restores host FPEXC register */
+#ifdef CONFIG_VFPv3
+#define kvm_restore_host_fpexc(vcpu) { \
+   asm volatile(   \
+   "mcr p10, 7, %0, cr8, cr0, 0\n" \
+   : : "r" (vcpu->arch.host_fpexc) \
+   );  \
+}
+#else
+#define kvm_restore_host_fpexc(vcpu)
+#endif
+
 u32 *kvm_vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num, u32 mode);
 int __attribute_const__ kvm_target_cpu(void);
 int kvm_reset_vcpu(struct kvm_vcpu *vcpu);
@@ -227,6 +259,7 @@ int kvm_perf_teardown(void);
 void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
 
 struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
+void kvm_restore_host_vfp_state(struct kvm_vcpu *);
 
 static inline void kvm_arch_hardware_disable(void) {}
 static inline void kvm_arch_hardware_unsetup(void) {}
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index dc017ad..cfc348a 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -291,10 +291,22 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
vcpu->arch.host_cpu_context = this_cpu_ptr(kvm_host_cpu_state);
 
kvm_arm_set_running_vcpu(vcpu);
+
+   /* Save and enable FPEXC before we load guest context */
+   kvm_enable_fpexc(vcpu);
 }
 
 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 {
+   /* If the fp/simd registers are dirty save guest, restore host. */
+   if (vcpu->arch.vfp_dirty) {
+   kvm_restore_host_vfp_state(vcpu);
+   vcpu->arch.vfp_dirty = 0;
+   }
+
+   /* Restore host FPEXC trashed in vcpu_load */
+   kvm_restore_host_fpexc(vcpu);
+
/*
 * The arch-generic KVM code expects the cpu field of a vcpu to be -1
 * if the vcpu is no longer assigned to a cpu.  This is used for the
diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
index 900ef6d..1ddaa89 100644
--- a/arch/arm/kvm/interrupts.S
+++ b/arch/arm/kvm/interrupts.S
@@ -28,6 +28,26 @@
 #include "interrupts_head.S"
 
.text
+/**
+ * void kvm_restore_host_vfp_state(struct vcpu *vcpu) -
+ * This function is called from host to save the guest, and restore host
+ * fp/simd hardware context. It's placed outside of hyp start/end region.
+ */
+ENTRY(kvm_restore_host_vfp_state)
+#ifdef CONFIG_VFPv3
+   push    {r4-r7}
+
+   add r7, vcpu, #VCPU_VFP_GUEST
+   store_vfp_state r7
+
+   add r7, vcpu, #VCPU_VFP_HOST
+   ldr r7, [r7]
+   restore_vfp_state r7
+
+   pop {r4-r7}
+#endif
+   bx  lr
+ENDPROC(kvm_restore_host_vfp_state)
 
 __kvm_hyp_code_start:
.globl __kvm_hyp_code_start
@@ -116,22 +136,22 @@ ENTRY(__kvm_vcpu_run)
read_cp15_state store_to_vcpu = 0
write_cp15_state read_from_vcpu = 1
 
+   set_hcptr_bits set, r4, (HCPTR_TTA)
@ If the host kernel has not been configured with VFPv3 support,
@ then it is safer if we deny guests from using it as well.
 #ifdef CONFIG_VFPv3
-   @ Set FPEXC

[PATCH v4 1/3] KVM/arm/arm64: add hooks for armv7 fp/simd lazy switch support

2015-11-14 Thread Mario Smarduch
This patch adds vcpu fields to track lazy state, save host FPEXC, and
offsets to fields.

Signed-off-by: Mario Smarduch <m.smard...@samsung.com>
---
 arch/arm/include/asm/kvm_host.h | 6 ++
 arch/arm/kernel/asm-offsets.c   | 2 ++
 2 files changed, 8 insertions(+)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 3df1e97..f1bf551 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -107,6 +107,12 @@ struct kvm_vcpu_arch {
/* Interrupt related fields */
u32 irq_lines;  /* IRQ and FIQ levels */
 
+   /* fp/simd dirty flag true if guest accessed register file */
+   boolvfp_dirty;
+
+   /* Save host FPEXC register to later restore on vcpu put */
+   u32 host_fpexc;
+
/* Exception Information */
struct kvm_vcpu_fault_info fault;
 
diff --git a/arch/arm/kernel/asm-offsets.c b/arch/arm/kernel/asm-offsets.c
index 871b826..9f79712 100644
--- a/arch/arm/kernel/asm-offsets.c
+++ b/arch/arm/kernel/asm-offsets.c
@@ -186,6 +186,8 @@ int main(void)
   DEFINE(VCPU_CPSR,offsetof(struct kvm_vcpu, arch.regs.usr_regs.ARM_cpsr));
   DEFINE(VCPU_HCR, offsetof(struct kvm_vcpu, arch.hcr));
   DEFINE(VCPU_IRQ_LINES,   offsetof(struct kvm_vcpu, arch.irq_lines));
+  DEFINE(VCPU_VFP_DIRTY,   offsetof(struct kvm_vcpu, arch.vfp_dirty));
+  DEFINE(VCPU_VFP_HOST_FPEXC,  offsetof(struct kvm_vcpu, arch.host_fpexc));
   DEFINE(VCPU_HSR, offsetof(struct kvm_vcpu, arch.fault.hsr));
   DEFINE(VCPU_HxFAR,   offsetof(struct kvm_vcpu, arch.fault.hxfar));
   DEFINE(VCPU_HPFAR,   offsetof(struct kvm_vcpu, arch.fault.hpfar));
-- 
1.9.1



Re: [PATCH 3/3] KVM/arm64: enable enhanced armv8 fp/simd lazy switch

2015-11-14 Thread Mario Smarduch


On 11/10/2015 3:18 AM, Christoffer Dall wrote:
> On Mon, Nov 09, 2015 at 03:13:15PM -0800, Mario Smarduch wrote:
>>
>>
>> On 11/5/2015 7:02 AM, Christoffer Dall wrote:
>>> On Fri, Oct 30, 2015 at 02:56:33PM -0700, Mario Smarduch wrote:
[...]
>> kern_hyp_va x0
>> add x2, x0, #VCPU_CONTEXT
>> mrs x1, fpexc32_el2
>> str x1, [x2, #CPU_SYSREG_OFFSET(FPEXC32_EL2)]
>> ret
>>
>> Of course each hyp call has additional overhead; at a high exit to
>> vcpu_put ratio the hyp call appears better. But all this is very
>> highly dependent on exit rate and fp/simd usage. IMO the hyp call
>> works better under extreme loads and should be pretty close
>> for general loads.
>>
>> Any thoughts?
>>
> I think the typical case will be lots of exits and few
> vcpu_load/vcpu_put, and I think it's reasonable to write the code that
> way.

Yes, especially for RT guests where vCPU is pinned.

Thanks.
> 
> That should also be much better for VHE.
> 
> So I would go that direction.
> 
> Thanks,
> -Christoffer
> 


Re: [PATCH 3/3] KVM/arm64: enable enhanced armv8 fp/simd lazy switch

2015-11-09 Thread Mario Smarduch


On 11/5/2015 7:02 AM, Christoffer Dall wrote:
> On Fri, Oct 30, 2015 at 02:56:33PM -0700, Mario Smarduch wrote:
>> This patch enables arm64 lazy fp/simd switch, similar to arm described in
>> second patch. Change from previous version - restore function is moved to
>> host. 
>>
>> Signed-off-by: Mario Smarduch <m.smard...@samsung.com>
>> ---
>>  arch/arm64/include/asm/kvm_host.h |  2 +-
>>  arch/arm64/kernel/asm-offsets.c   |  1 +
>>  arch/arm64/kvm/hyp.S  | 37 +++--
>>  3 files changed, 33 insertions(+), 7 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/kvm_host.h 
>> b/arch/arm64/include/asm/kvm_host.h
>> index 26a2347..dcecf92 100644
>> --- a/arch/arm64/include/asm/kvm_host.h
>> +++ b/arch/arm64/include/asm/kvm_host.h
>> @@ -251,11 +251,11 @@ static inline void kvm_arch_hardware_unsetup(void) {}
>>  static inline void kvm_arch_sync_events(struct kvm *kvm) {}
>>  static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
>>  static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
>> -static inline void kvm_restore_host_vfp_state(struct kvm_vcpu *vcpu) {}
>>  
>>  void kvm_arm_init_debug(void);
>>  void kvm_arm_setup_debug(struct kvm_vcpu *vcpu);
>>  void kvm_arm_clear_debug(struct kvm_vcpu *vcpu);
>>  void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu);
>> +void kvm_restore_host_vfp_state(struct kvm_vcpu *vcpu);
>>  
>>  #endif /* __ARM64_KVM_HOST_H__ */
>> diff --git a/arch/arm64/kernel/asm-offsets.c 
>> b/arch/arm64/kernel/asm-offsets.c
>> index 8d89cf8..c9c5242 100644
>> --- a/arch/arm64/kernel/asm-offsets.c
>> +++ b/arch/arm64/kernel/asm-offsets.c
>> @@ -124,6 +124,7 @@ int main(void)
>>DEFINE(VCPU_HCR_EL2,  offsetof(struct kvm_vcpu, 
>> arch.hcr_el2));
>>DEFINE(VCPU_MDCR_EL2, offsetof(struct kvm_vcpu, arch.mdcr_el2));
>>DEFINE(VCPU_IRQ_LINES,offsetof(struct kvm_vcpu, arch.irq_lines));
>> +  DEFINE(VCPU_VFP_DIRTY,offsetof(struct kvm_vcpu, arch.vfp_dirty));
>>DEFINE(VCPU_HOST_CONTEXT, offsetof(struct kvm_vcpu, 
>> arch.host_cpu_context));
>>DEFINE(VCPU_HOST_DEBUG_STATE, offsetof(struct kvm_vcpu, 
>> arch.host_debug_state));
>>DEFINE(VCPU_TIMER_CNTV_CTL,   offsetof(struct kvm_vcpu, 
>> arch.timer_cpu.cntv_ctl));
>> diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S
>> index e583613..ed2c4cf 100644
>> --- a/arch/arm64/kvm/hyp.S
>> +++ b/arch/arm64/kvm/hyp.S
>> @@ -36,6 +36,28 @@
>>  #define CPU_SYSREG_OFFSET(x)(CPU_SYSREGS + 8*x)
>>  
>>  .text
>> +
>> +/**
>> + * void kvm_restore_host_vfp_state(struct vcpu *vcpu) - Executes lazy
>> + *  fp/simd switch, saves the guest, restores host. Called from host
>> + *  mode, placed outside of hyp section.
> 
> same comments on style as previous patch
> 
>> + */
>> +ENTRY(kvm_restore_host_vfp_state)
>> +push    xzr, lr
>> +
>> +add x2, x0, #VCPU_CONTEXT
>> +mov w3, #0
>> +strb    w3, [x0, #VCPU_VFP_DIRTY]
> 
> I've been discussing with myself if it would make more sense to clear
> the dirty flag in the C-code...
> 
>> +
>> +bl __save_fpsimd
>> +
>> +ldr x2, [x0, #VCPU_HOST_CONTEXT]
>> +bl __restore_fpsimd
>> +
>> +pop xzr, lr
>> +ret
>> +ENDPROC(kvm_restore_host_vfp_state)
>> +
>>  .pushsection.hyp.text, "ax"
>>  .align  PAGE_SHIFT
>>  
>> @@ -482,7 +504,11 @@
>>  99:
>>  msr hcr_el2, x2
>>  mov x2, #CPTR_EL2_TTA
>> +
>> +ldrb    w3, [x0, #VCPU_VFP_DIRTY]
>> +tbnz    w3, #0, 98f
>>  orr x2, x2, #CPTR_EL2_TFP
>> +98:
> 
> mmm, don't you need to only set the fpexc32 when you're actually going
> to trap the guest accesses?
> 
> also, you can consider only setting this in vcpu_load (jumping quickly
> to EL2 to do so) if we're running a 32-bit guest.  Probably worth
> measuring the difference between the extra EL2 jump on vcpu_load
> compared to hitting this register on every entry/exit.
> 
> Code-wise, it will be nicer to do it on vcpu_load.
Hi Christoffer, Marc -
  just want to run this by you: I ran a test with a typical number of
fp threads and a couple of lmbench benchmarks, the stride and bandwidth ones. The
ratio of exits to vcpu_puts is high, 50:1 or so. But of course that's subject
to the loads you run.

I substituted:
tbnz x2, #HCR_RW_SHIFT, 99f
mov x3, #(1 << 30)
msr fpexc32_el2, x3
isb

with vcpu_load hyp call and check for 32 bit gu

Re: [PATCH 3/3] KVM/arm64: enable enhanced armv8 fp/simd lazy switch

2015-11-06 Thread Mario Smarduch


On 11/6/2015 3:29 AM, Christoffer Dall wrote:
> On Thu, Nov 05, 2015 at 04:57:12PM -0800, Mario Smarduch wrote:
>>
>>
>> On 11/5/2015 7:02 AM, Christoffer Dall wrote:
>>> On Fri, Oct 30, 2015 at 02:56:33PM -0700, Mario Smarduch wrote:
>>>> This patch enables arm64 lazy fp/simd switch, similar to arm described in
>>>> second patch. Change from previous version - restore function is moved to
>>>> host. 
>>>>
>>>> Signed-off-by: Mario Smarduch <m.smard...@samsung.com>
>>>> ---
>>>>  arch/arm64/include/asm/kvm_host.h |  2 +-
>>>>  arch/arm64/kernel/asm-offsets.c   |  1 +
>>>>  arch/arm64/kvm/hyp.S  | 37 
>>>> +++--
>>>>  3 files changed, 33 insertions(+), 7 deletions(-)
>>>>
>>>> diff --git a/arch/arm64/include/asm/kvm_host.h 
>>>> b/arch/arm64/include/asm/kvm_host.h
>>>> index 26a2347..dcecf92 100644
>>>> --- a/arch/arm64/include/asm/kvm_host.h
>>>> +++ b/arch/arm64/include/asm/kvm_host.h
>>>> @@ -251,11 +251,11 @@ static inline void kvm_arch_hardware_unsetup(void) {}
>>>>  static inline void kvm_arch_sync_events(struct kvm *kvm) {}
>>>>  static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
>>>>  static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
>>>> -static inline void kvm_restore_host_vfp_state(struct kvm_vcpu *vcpu) {}
>>>>  
>>>>  void kvm_arm_init_debug(void);
>>>>  void kvm_arm_setup_debug(struct kvm_vcpu *vcpu);
>>>>  void kvm_arm_clear_debug(struct kvm_vcpu *vcpu);
>>>>  void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu);
>>>> +void kvm_restore_host_vfp_state(struct kvm_vcpu *vcpu);
>>>>  
>>>>  #endif /* __ARM64_KVM_HOST_H__ */
>>>> diff --git a/arch/arm64/kernel/asm-offsets.c 
>>>> b/arch/arm64/kernel/asm-offsets.c
>>>> index 8d89cf8..c9c5242 100644
>>>> --- a/arch/arm64/kernel/asm-offsets.c
>>>> +++ b/arch/arm64/kernel/asm-offsets.c
>>>> @@ -124,6 +124,7 @@ int main(void)
>>>>DEFINE(VCPU_HCR_EL2,offsetof(struct kvm_vcpu, 
>>>> arch.hcr_el2));
>>>>DEFINE(VCPU_MDCR_EL2,   offsetof(struct kvm_vcpu, arch.mdcr_el2));
>>>>DEFINE(VCPU_IRQ_LINES,  offsetof(struct kvm_vcpu, arch.irq_lines));
>>>> +  DEFINE(VCPU_VFP_DIRTY,  offsetof(struct kvm_vcpu, arch.vfp_dirty));
>>>>DEFINE(VCPU_HOST_CONTEXT,   offsetof(struct kvm_vcpu, 
>>>> arch.host_cpu_context));
>>>>DEFINE(VCPU_HOST_DEBUG_STATE, offsetof(struct kvm_vcpu, 
>>>> arch.host_debug_state));
>>>>DEFINE(VCPU_TIMER_CNTV_CTL, offsetof(struct kvm_vcpu, 
>>>> arch.timer_cpu.cntv_ctl));
>>>> diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S
>>>> index e583613..ed2c4cf 100644
>>>> --- a/arch/arm64/kvm/hyp.S
>>>> +++ b/arch/arm64/kvm/hyp.S
>>>> @@ -36,6 +36,28 @@
>>>>  #define CPU_SYSREG_OFFSET(x)  (CPU_SYSREGS + 8*x)
>>>>  
>>>>.text
>>>> +
>>>> +/**
>>>> + * void kvm_restore_host_vfp_state(struct vcpu *vcpu) - Executes lazy
>>>> + *fp/simd switch, saves the guest, restores host. Called from host
>>>> + *mode, placed outside of hyp section.
>>>
>>> same comments on style as previous patch
>> Got it.
>>>
>>>> + */
>>>> +ENTRY(kvm_restore_host_vfp_state)
>>>> +  push    xzr, lr
>>>> +
>>>> +  add x2, x0, #VCPU_CONTEXT
>>>> +  mov w3, #0
>>>> +  strb    w3, [x0, #VCPU_VFP_DIRTY]
>>>
>>> I've been discussing with myself if it would make more sense to clear
>>> the dirty flag in the C-code...
>> Since all the work is done here I placed it here.
> 
> yeah, that's what I thought first, but then I thought it's typically
> easier to understand the logic in the C-code and this is technically a
> side-effect from the name of the function "kvm_restore_host_vfp_state"
> which should then be "kvm_restore_host_vfp_state_and_clear_dirty_flag"
> :)
> 

Ok I'll set in C.
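Moving the dirty-flag clear into C, as agreed above, might look roughly like the sketch below. The types and names are stand-ins for illustration, not the actual kernel structures or the real kvm_restore_host_vfp_state():

```c
/* Minimal sketch: clear the vfp_dirty flag in the C caller rather than
 * inside the assembly restore routine, so the side effect is visible at
 * the call site. Stand-in types; not kernel code. */
#include <stdbool.h>

struct vcpu_arch_sketch {
	bool vfp_dirty;		/* set on first guest fp/simd access */
};

struct vcpu_sketch {
	struct vcpu_arch_sketch arch;
};

/* Stand-in for the assembly routine that saves the guest and restores
 * the host fp/simd state. */
static void restore_host_vfp_state(struct vcpu_sketch *vcpu)
{
	(void)vcpu;		/* hardware save/restore elided */
}

/* vcpu_put path: restore host state if dirty, then clear the flag in C. */
void sketch_vcpu_put(struct vcpu_sketch *vcpu)
{
	if (vcpu->arch.vfp_dirty) {
		restore_host_vfp_state(vcpu);
		vcpu->arch.vfp_dirty = false;
	}
}
```

The point of the shape above is that the function's name then matches what it does: the flag manipulation is no longer a hidden side effect of the assembly.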
>>>
>>>> +
>>>> +  bl __save_fpsimd
>>>> +
>>>> +  ldr x2, [x0, #VCPU_HOST_CONTEXT]
>>>> +  bl __restore_fpsimd
>>>> +
>>>> +  pop xzr, lr
>>>> +  ret
>>>> +ENDPROC(kvm_restore_host_vfp_state)

Re: [PATCH v3 2/3] KVM/arm/arm64: enable enhanced armv7 fp/simd lazy switch

2015-11-06 Thread Mario Smarduch


On 11/6/2015 3:37 AM, Christoffer Dall wrote:
> On Thu, Nov 05, 2015 at 04:23:41PM -0800, Mario Smarduch wrote:
>>
>>
>> On 11/5/2015 6:48 AM, Christoffer Dall wrote:
>>> On Fri, Oct 30, 2015 at 02:56:32PM -0700, Mario Smarduch wrote:
>>>> This patch tracks vfp/simd hardware state with a vcpu lazy flag. vCPU lazy 
>>>> flag is set on guest access and traps to vfp/simd hardware switch handler. 
>>>> On 
>>>> vm-enter if lazy flag is set skip trap enable and save host fpexc. On 
>>>> vm-exit if flag is set skip hardware context switch and return to host 
>>>> with 
>>>> guest context. In vcpu_put check if vcpu lazy flag is set, and execute a 
>>>> hardware context switch to restore host.
>>>>
>>>> Also some arm64 field and empty function are added to compile for arm64.
>>>>
>>>> Signed-off-by: Mario Smarduch <m.smard...@samsung.com>
>>>> ---
>>>>  arch/arm/include/asm/kvm_host.h   |  1 +
>>>>  arch/arm/kvm/arm.c|  6 
>>>>  arch/arm/kvm/interrupts.S | 60 
>>>> ---
>>>>  arch/arm/kvm/interrupts_head.S| 14 +
>>>>  arch/arm64/include/asm/kvm_host.h |  4 +++
>>>>  5 files changed, 63 insertions(+), 22 deletions(-)
>>>>
>>>> diff --git a/arch/arm/include/asm/kvm_host.h 
>>>> b/arch/arm/include/asm/kvm_host.h
>>>> index f1bf551..a9e86e0 100644
>>>> --- a/arch/arm/include/asm/kvm_host.h
>>>> +++ b/arch/arm/include/asm/kvm_host.h
>>>> @@ -227,6 +227,7 @@ int kvm_perf_teardown(void);
>>>>  void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
>>>>  
>>>>  struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
>>>> +void kvm_restore_host_vfp_state(struct kvm_vcpu *);
>>>>  
>>>>  static inline void kvm_arch_hardware_disable(void) {}
>>>>  static inline void kvm_arch_hardware_unsetup(void) {}
>>>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
>>>> index dc017ad..11a56fe 100644
>>>> --- a/arch/arm/kvm/arm.c
>>>> +++ b/arch/arm/kvm/arm.c
>>>> @@ -296,6 +296,12 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int 
>>>> cpu)
>>>>  void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>>>>  {
>>>>/*
>>>> +   * If fp/simd registers are dirty save guest, restore host before
>>>
>>> If the fp/simd registers are dirty, then restore the host state before
>> I'd drop 'releasing the cpu', the vcpu thread may be returning to
>> user mode.
>>>
>>>> +   * releasing the cpu.
>>>> +   */
>>>> +  if (vcpu->arch.vfp_dirty)
>>>> +  kvm_restore_host_vfp_state(vcpu);
>>>> +  /*
>>>> * The arch-generic KVM code expects the cpu field of a vcpu to be -1
>>>> * if the vcpu is no longer assigned to a cpu.  This is used for the
>>>> * optimized make_all_cpus_request path.
>>>> diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
>>>> index 900ef6d..ca25314 100644
>>>> --- a/arch/arm/kvm/interrupts.S
>>>> +++ b/arch/arm/kvm/interrupts.S
>>>> @@ -28,6 +28,32 @@
>>>>  #include "interrupts_head.S"
>>>>  
>>>>.text
>>>> +/**
>>>> + * void kvm_restore_host_vfp_state(struct vcpu *vcpu) - Executes lazy
>>>
>>> nit: Can you move the multi-line description of the function into a
>>> separate paragraph?
>> Sure.
>>>
>>>> + *fp/simd switch, saves the guest, restores host. Called from host
>>>> + *mode, placed outside of hyp region start/end.
>>>
>>> Put the description in a separate paragraph and get rid of the "executes
>>> lazy fp/simd switch" part, that doesn't help understanding.  Just say
>>> that this function restores the host state.
>> Sure.
>>>
>>>> + */
>>>> +ENTRY(kvm_restore_host_vfp_state)
>>>> +#ifdef CONFIG_VFPv3
>>>> +  push{r4-r7}
>>>> +
>>>> +  add r7, vcpu, #VCPU_VFP_GUEST
>>>> +  store_vfp_state r7
>>>> +
>>>> +  add r7, vcpu, #VCPU_VFP_HOST
>>>> +  ldr r7, [r7]
>>>> +  restore_vfp_state r7
>>>> +
>>>> +  ldr r3, [vcpu, #VCPU_VFP_HOST_FPEXC]
>>>> +  VFPF

Re: [PATCH v3 2/3] KVM/arm/arm64: enable enhanced armv7 fp/simd lazy switch

2015-11-05 Thread Mario Smarduch


On 11/5/2015 6:48 AM, Christoffer Dall wrote:
> On Fri, Oct 30, 2015 at 02:56:32PM -0700, Mario Smarduch wrote:
>> This patch tracks vfp/simd hardware state with a vcpu lazy flag. vCPU lazy 
>> flag is set on guest access and traps to vfp/simd hardware switch handler. 
>> On 
>> vm-enter if lazy flag is set skip trap enable and save host fpexc. On 
>> vm-exit if flag is set skip hardware context switch and return to host with 
>> guest context. In vcpu_put check if vcpu lazy flag is set, and execute a 
>> hardware context switch to restore host.
>>
>> Also some arm64 field and empty function are added to compile for arm64.
>>
>> Signed-off-by: Mario Smarduch <m.smard...@samsung.com>
>> ---
>>  arch/arm/include/asm/kvm_host.h   |  1 +
>>  arch/arm/kvm/arm.c|  6 
>>  arch/arm/kvm/interrupts.S | 60 
>> ---
>>  arch/arm/kvm/interrupts_head.S| 14 +
>>  arch/arm64/include/asm/kvm_host.h |  4 +++
>>  5 files changed, 63 insertions(+), 22 deletions(-)
>>
>> diff --git a/arch/arm/include/asm/kvm_host.h 
>> b/arch/arm/include/asm/kvm_host.h
>> index f1bf551..a9e86e0 100644
>> --- a/arch/arm/include/asm/kvm_host.h
>> +++ b/arch/arm/include/asm/kvm_host.h
>> @@ -227,6 +227,7 @@ int kvm_perf_teardown(void);
>>  void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
>>  
>>  struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
>> +void kvm_restore_host_vfp_state(struct kvm_vcpu *);
>>  
>>  static inline void kvm_arch_hardware_disable(void) {}
>>  static inline void kvm_arch_hardware_unsetup(void) {}
>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
>> index dc017ad..11a56fe 100644
>> --- a/arch/arm/kvm/arm.c
>> +++ b/arch/arm/kvm/arm.c
>> @@ -296,6 +296,12 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>>  void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>>  {
>>  /*
>> + * If fp/simd registers are dirty save guest, restore host before
> 
> If the fp/simd registers are dirty, then restore the host state before
I'd drop 'releasing the cpu', the vcpu thread may be returning to
user mode.
> 
>> + * releasing the cpu.
>> + */
>> +if (vcpu->arch.vfp_dirty)
>> +kvm_restore_host_vfp_state(vcpu);
>> +/*
>>   * The arch-generic KVM code expects the cpu field of a vcpu to be -1
>>   * if the vcpu is no longer assigned to a cpu.  This is used for the
>>   * optimized make_all_cpus_request path.
>> diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
>> index 900ef6d..ca25314 100644
>> --- a/arch/arm/kvm/interrupts.S
>> +++ b/arch/arm/kvm/interrupts.S
>> @@ -28,6 +28,32 @@
>>  #include "interrupts_head.S"
>>  
>>  .text
>> +/**
>> + * void kvm_restore_host_vfp_state(struct vcpu *vcpu) - Executes lazy
> 
> nit: Can you move the multi-line description of the function into a
> separate paragraph?
Sure.
> 
>> + *  fp/simd switch, saves the guest, restores host. Called from host
>> + *  mode, placed outside of hyp region start/end.
> 
> Put the description in a separate paragraph and get rid of the "executes
> lazy fp/simd switch" part, that doesn't help understanding.  Just say
> that this function restores the host state.
Sure.
> 
>> + */
>> +ENTRY(kvm_restore_host_vfp_state)
>> +#ifdef CONFIG_VFPv3
>> +push{r4-r7}
>> +
>> +add r7, vcpu, #VCPU_VFP_GUEST
>> +store_vfp_state r7
>> +
>> +add r7, vcpu, #VCPU_VFP_HOST
>> +ldr r7, [r7]
>> +restore_vfp_state r7
>> +
>> +ldr r3, [vcpu, #VCPU_VFP_HOST_FPEXC]
>> +VFPFMXR FPEXC, r3
>> +
>> +mov r3, #0
>> +strbr3, [vcpu, #VCPU_VFP_DIRTY]
>> +
>> +pop {r4-r7}
>> +#endif
>> +bx  lr
>> +ENDPROC(kvm_restore_host_vfp_state)
>>  
>>  __kvm_hyp_code_start:
>>  .globl __kvm_hyp_code_start
>> @@ -119,11 +145,16 @@ ENTRY(__kvm_vcpu_run)
>>  @ If the host kernel has not been configured with VFPv3 support,
>>  @ then it is safer if we deny guests from using it as well.
>>  #ifdef CONFIG_VFPv3
>> -@ Set FPEXC_EN so the guest doesn't trap floating point instructions
>> +@ fp/simd register file has already been accessed, so skip host fpexc
>> +@ save and access trap enable.
>> +vfp_inlazy_mode r7, skip_guest_vfp_trap
> 
> So, why do we need to touch this register at all on every CPU ex

Re: [PATCH 3/3] KVM/arm64: enable enhanced armv8 fp/simd lazy switch

2015-11-05 Thread Mario Smarduch


On 11/5/2015 7:02 AM, Christoffer Dall wrote:
> On Fri, Oct 30, 2015 at 02:56:33PM -0700, Mario Smarduch wrote:
>> This patch enables arm64 lazy fp/simd switch, similar to arm described in
>> second patch. Change from previous version - restore function is moved to
>> host. 
>>
>> Signed-off-by: Mario Smarduch <m.smard...@samsung.com>
>> ---
>>  arch/arm64/include/asm/kvm_host.h |  2 +-
>>  arch/arm64/kernel/asm-offsets.c   |  1 +
>>  arch/arm64/kvm/hyp.S  | 37 +++--
>>  3 files changed, 33 insertions(+), 7 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/kvm_host.h 
>> b/arch/arm64/include/asm/kvm_host.h
>> index 26a2347..dcecf92 100644
>> --- a/arch/arm64/include/asm/kvm_host.h
>> +++ b/arch/arm64/include/asm/kvm_host.h
>> @@ -251,11 +251,11 @@ static inline void kvm_arch_hardware_unsetup(void) {}
>>  static inline void kvm_arch_sync_events(struct kvm *kvm) {}
>>  static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
>>  static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
>> -static inline void kvm_restore_host_vfp_state(struct kvm_vcpu *vcpu) {}
>>  
>>  void kvm_arm_init_debug(void);
>>  void kvm_arm_setup_debug(struct kvm_vcpu *vcpu);
>>  void kvm_arm_clear_debug(struct kvm_vcpu *vcpu);
>>  void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu);
>> +void kvm_restore_host_vfp_state(struct kvm_vcpu *vcpu);
>>  
>>  #endif /* __ARM64_KVM_HOST_H__ */
>> diff --git a/arch/arm64/kernel/asm-offsets.c 
>> b/arch/arm64/kernel/asm-offsets.c
>> index 8d89cf8..c9c5242 100644
>> --- a/arch/arm64/kernel/asm-offsets.c
>> +++ b/arch/arm64/kernel/asm-offsets.c
>> @@ -124,6 +124,7 @@ int main(void)
>>DEFINE(VCPU_HCR_EL2,  offsetof(struct kvm_vcpu, 
>> arch.hcr_el2));
>>DEFINE(VCPU_MDCR_EL2, offsetof(struct kvm_vcpu, arch.mdcr_el2));
>>DEFINE(VCPU_IRQ_LINES,offsetof(struct kvm_vcpu, arch.irq_lines));
>> +  DEFINE(VCPU_VFP_DIRTY,offsetof(struct kvm_vcpu, arch.vfp_dirty));
>>DEFINE(VCPU_HOST_CONTEXT, offsetof(struct kvm_vcpu, 
>> arch.host_cpu_context));
>>DEFINE(VCPU_HOST_DEBUG_STATE, offsetof(struct kvm_vcpu, 
>> arch.host_debug_state));
>>DEFINE(VCPU_TIMER_CNTV_CTL,   offsetof(struct kvm_vcpu, 
>> arch.timer_cpu.cntv_ctl));
>> diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S
>> index e583613..ed2c4cf 100644
>> --- a/arch/arm64/kvm/hyp.S
>> +++ b/arch/arm64/kvm/hyp.S
>> @@ -36,6 +36,28 @@
>>  #define CPU_SYSREG_OFFSET(x)(CPU_SYSREGS + 8*x)
>>  
>>  .text
>> +
>> +/**
>> + * void kvm_restore_host_vfp_state(struct vcpu *vcpu) - Executes lazy
>> + *  fp/simd switch, saves the guest, restores host. Called from host
>> + *  mode, placed outside of hyp section.
> 
> same comments on style as previous patch
Got it.
> 
>> + */
>> +ENTRY(kvm_restore_host_vfp_state)
>> +pushxzr, lr
>> +
>> +add x2, x0, #VCPU_CONTEXT
>> +mov w3, #0
>> +strbw3, [x0, #VCPU_VFP_DIRTY]
> 
> I've been discussing with myself if it would make more sense to clear
> the dirty flag in the C-code...
Since all the work is done here I placed it here.
> 
>> +
>> +bl __save_fpsimd
>> +
>> +ldr x2, [x0, #VCPU_HOST_CONTEXT]
>> +bl __restore_fpsimd
>> +
>> +pop xzr, lr
>> +ret
>> +ENDPROC(kvm_restore_host_vfp_state)
>> +
>>  .pushsection.hyp.text, "ax"
>>  .align  PAGE_SHIFT
>>  
>> @@ -482,7 +504,11 @@
>>  99:
>>  msr hcr_el2, x2
>>  mov x2, #CPTR_EL2_TTA
>> +
>> +ldrbw3, [x0, #VCPU_VFP_DIRTY]
>> +tbnzw3, #0, 98f
>>  orr x2, x2, #CPTR_EL2_TFP
>> +98:
> 
> mmm, don't you need to only set the fpexc32 when you're actually going
> to trap the guest accesses?

My understanding is you always need to set the enable bit in FPEXC32 for 32-bit
guests, otherwise EL1 would get the trap instead of EL2. Not sure if that's the
point you're making.

> 
> also, you can consider only setting this in vcpu_load (jumping quickly
> to EL2 to do so) if we're running a 32-bit guest.  Probably worth
> measuring the difference between the extra EL2 jump on vcpu_load
> compared to hitting this register on every entry/exit.

Sure, makes sense since this is a hot code path.
> 
> Code-wise, it will be nicer to do it on vcpu_load.
> 
>>  msr cptr_el2, x2
>>  
>>  mov x2, #(1 << 15)  // Trap CP15 Cr=15
>>
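The suggestion above, configuring the 32-bit guest's FPEXC32 enable once at vcpu_load rather than touching it on every entry/exit, can be modeled in plain C. This is purely an illustrative model (the struct, the EL2-call counter, and the function names are invented for the sketch; only the FPEXC.EN bit position comes from the architecture):

```c
/* Hypothetical model of doing the FPEXC32 setup once per vcpu_load
 * (one modeled EL2 round trip) instead of on every guest entry/exit. */
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_FPEXC_EN (1u << 30)	/* FPEXC.EN bit position */

struct vcpu_model {
	bool is_aarch32;	/* running a 32-bit guest? */
	uint32_t fpexc32;	/* modeled FPEXC32_EL2 */
	int el2_calls;		/* modeled hyp-call count */
};

/* Only 32-bit guests need FPEXC32 enabled so their fp/simd accesses
 * trap to EL2 rather than EL1; 64-bit guests skip the hyp call. */
void model_vcpu_load(struct vcpu_model *v)
{
	if (v->is_aarch32) {
		v->el2_calls++;			/* one jump to EL2 per load... */
		v->fpexc32 |= SKETCH_FPEXC_EN;	/* ...not per entry/exit */
	}
}
```

Whether this wins in practice depends on the measurement Christoffer suggests: one extra EL2 jump per vcpu_load versus a system-register write on every world switch.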

Re: [PATCH] KVM/arm: kernel low level debug support for ARM32 virtual platforms

2015-11-04 Thread Mario Smarduch


On 11/4/2015 10:51 AM, Ard Biesheuvel wrote:
> On 4 November 2015 at 19:49, Christopher Covington <c...@codeaurora.org> 
> wrote:
>> On 11/04/2015 08:31 AM, Christoffer Dall wrote:
>>> On Tue, Nov 03, 2015 at 01:39:44PM -0600, Rob Herring wrote:
>>>> On Tue, Nov 3, 2015 at 1:17 PM, Mario Smarduch <m.smard...@samsung.com> 
>>>> wrote:
>>>>> On 11/3/2015 9:55 AM, Will Deacon wrote:
>>>>>> On Tue, Nov 03, 2015 at 09:44:52AM -0800, Mario Smarduch wrote:
>>>>>>> On 11/3/2015 8:33 AM, Christopher Covington wrote:
>>>>>>>> On 11/02/2015 06:51 PM, Mario Smarduch wrote:
>>>>>>>>>this is a re-post from couple weeks ago, please take time to 
>>>>>>>>> review this
>>>>>>>>> simple patch which simplifies DEBUG_LL and prevents kernel crash on 
>>>>>>>>> virtual
>>>>>>>>> platforms.
>>>>>>>>>
>>>>>>>>> Before this patch DEBUG_LL for 'dummy virtual machine':
>>>>>>>>>
>>>>>>>>> ( ) Kernel low-level debugging via EmbeddedICE DCC channel
>>>>>>>>> ( ) Kernel low-level debug output via semihosting I/O
>>>>>>>>> ( ) Kernel low-level debugging via 8250 UART
>>>>>>>>> ( ) Kernel low-level debugging via ARM Ltd PL01x Primecell
>>>>>>>>>
>>>>>>>>> In summary if debug uart is not emulated kernel crashes.
>>>>>>>>> And once you pass that hurdle, uart physical/virtual addresses are 
>>>>>>>>> unknown.
>>>>>>>>> DEBUG_LL comes in handy on many occasions and should be somewhat
>>>>>>>>> intuitive to use like it is for physical platforms. For virtual 
>>>>>>>>> platforms
>>>>>>>>> user may start doubting the host and get into a bigger mess.
>>>>>>>>>
>>>>>>>>> After this patch is applied user gets:
>>>>>>>>>
>>>>>>>>> (X) Kernel low-level debugging on QEMU Virtual Platform
>>>>>>>>> ( ) Kernel low-level debugging on Kvmtool Virtual Platform
>>>>>>>>>. above repeated 
>>>>>>>>>
>>>>>>>>> The virtual addresses selected follow arm reference models, high in 
>>>>>>>>> vmalloc
>>>>>>>>> section with high mem enabled and guest running with >= 1GB of 
>>>>>>>>> memory. The
>>>>>>>>> offset is leftover from arm reference models.
>>>>>>>>
>>>>>>>> Which model? It doesn't appear to match the vexpress 
>>>>>>>> AEM/RTSM/FVP/whatever
>>>>>>>> which used 0x1c09 for UART0.
>>>>>>>
>>>>>>> I recall QEMU virt model had its own physical address map, for sure I 
>>>>>>> saw the
>>>>>>> virtio-mmio regions assigned in some ARM document. Peter would you know?
>>>>>>>
>>>>>>> As far as kvmtool I'm not sure, currently PC1 COM1 port is used? Andre 
>>>>>>> will that
>>>>>>> stay fixed?
>>>>>>
>>>>>> We make absolutely no guarantees about the memory map provided by 
>>>>>> kvmtool.
>>>>>
>>>>> If that's also the case for qemu, then I guess the best you can do is 
>>>>> find a way
>>>>> to dump the device tree. Find the uart, physical address and try figure 
>>>>> out the
>>>>> virtual address.
>>>>>
>>>>> Pretty involved, hoped for something more automated since that's a handy 
>>>>> feature.
>>>>
>>>> You really only need LL_DEBUG now if you are debugging very early code
>>>> before memory is setup and/or bad memory. Use earlycon instead which
>>>> should already be supported both via the pl011 or semihosting. I used
>>>> it with QEMU semihosting support.
>>>>
>>> Then we should really document how to use that with qemu's virt platform
>>> and kvmtool's platform on both 32-bit and 64-bit so that users can
>>> easily figure out what they're doing wrong when they get no output.
>>>
>>> In practice, the address for the pl011 is quite unlikely to change, I
>>> dare speculate, so that documentation shouldn't need frequent updating.
>>
>> Is it not on by default since the following change?
>>
>> http://git.qemu.org/?p=qemu.git;a=commitdiff;h=f022b8e95379b0433d13509706b66f38fc15dde8
>>
> 
> Yes, but it still requires the plain 'earlycon' argument (i.e, without
> '=pl011,...') to be passed on the kernel command line if you want
> early output.
> 

I spent time debugging 'earlycon' for pl011, ironically using DEBUG_LL; from the
looks of it no MMIO uart will work for armv7. It appears earlycon_map()
requires a fixed mapping similar to arm64.

Comparing both options, DEBUG_LL takes you from kernel decompress code to early
FDT parsing. A lot of early_print() calls won't work if DEBUG_LL is not enabled,
including dump_machine_table(), which ends in an endless loop. IMO it's worth
turning this option on for that and other reasons.

'earlycon' is enabled some ways up in setup_arch().

As for the patch, providing a hint to the user with probable uart addresses
would help, and in the worst case "see the latest device tree for the virtual
platform".

- Mario
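For reference, the plain 'earlycon' argument discussed in this thread is passed on the kernel command line. An illustrative QEMU invocation is shown below; the kernel image path, memory size, and root device are examples only, and the kernel must be built with the matching earlycon driver support:

```shell
# Illustrative only: boot a guest on the QEMU 'virt' machine with early
# console output. The bare 'earlycon' argument lets the kernel pick up
# the UART described by the device tree's stdout-path, rather than
# hard-coding an address as DEBUG_LL does.
qemu-system-arm -M virt -m 1024 \
    -kernel zImage \
    -append "earlycon console=ttyAMA0 root=/dev/vda" \
    -nographic
```

This is a configuration fragment, not something the thread guarantees; as noted above, the virt machine's memory map is only promised via the device tree.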




--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM/arm: kernel low level debug support for ARM32 virtual platforms

2015-11-03 Thread Mario Smarduch


On 11/3/2015 8:33 AM, Christopher Covington wrote:
> Hi Mario,
> 
> On 11/02/2015 06:51 PM, Mario Smarduch wrote:
>> Hello,
>>this is a re-post from couple weeks ago, please take time to review this 
>> simple patch which simplifies DEBUG_LL and prevents kernel crash on virtual 
>> platforms.
>>
>> Before this patch DEBUG_LL for 'dummy virtual machine':
>>
>> ( ) Kernel low-level debugging via EmbeddedICE DCC channel
>> ( ) Kernel low-level debug output via semihosting I/O
>> ( ) Kernel low-level debugging via 8250 UART
>> ( ) Kernel low-level debugging via ARM Ltd PL01x Primecell
>>
>> In summary if debug uart is not emulated kernel crashes.
>> And once you pass that hurdle, uart physical/virtual addresses are unknown.
>> DEBUG_LL comes in handy on many occasions and should be somewhat 
>> intuitive to use like it is for physical platforms. For virtual platforms
>> user may start doubting the host and get into a bigger mess.
>>
>> After this patch is applied user gets:
>>
>> (X) Kernel low-level debugging on QEMU Virtual Platform
>> ( ) Kernel low-level debugging on Kvmtool Virtual Platform
>>  . above repeated 
>>
>> The virtual addresses selected follow arm reference models, high in vmalloc 
>> section with high mem enabled and guest running with >= 1GB of memory. The 
>> offset is leftover from arm reference models.
> 
> Which model? It doesn't appear to match the vexpress AEM/RTSM/FVP/whatever
> which used 0x1c09 for UART0.

I recall the QEMU virt model had its own physical address map; I'm sure I saw the
virtio-mmio regions assigned in some ARM document. Peter would you know?

As far as kvmtool I'm not sure, currently PC1 COM1 port is used? Andre will that
stay fixed?

> 
>> The patch is against 4.2.0-rc2 commit 43297dda0a51
>>
>> Original Description
>> 
>> When booting a VM using QEMU or Kvmtool there are no clear ways to 
>> enable low level debugging for these virtual platforms. some menu port 
>> choices are not supported by the virtual platforms at all. And there is no
>> help on the location of physical and virtual addresses for the ports.
>> This may lead to wrong debug port and a frozen VM with a blank screen.
>>
>> This patch adds menu selections for QEMU and Kvmtool virtual platforms for 
>> low 
>> level kernel print debugging. Help section displays port physical and
>> virtual addresses.
>>
>> ARM reference models use the MIDR register to run-time select UART port 
>> address 
>> (for ARCH_VEXPRESS) based on A9 or A15 part numbers. Looked for a same 
>> approach
>> but couldn't find a way to differentiate between virtual platforms, something
>> like a platform register.
>>
>> Acked-by: Christoffer Dall <christoffer.d...@linaro.org>
>> Signed-off-by: Mario Smarduch <m.smard...@samsung.com>
>> ---
>>  arch/arm/Kconfig.debug | 22 ++
>>  1 file changed, 22 insertions(+)
>>
>> diff --git a/arch/arm/Kconfig.debug b/arch/arm/Kconfig.debug
>> index a2e16f9..d126bd4 100644
>> --- a/arch/arm/Kconfig.debug
>> +++ b/arch/arm/Kconfig.debug
>> @@ -1155,6 +1155,28 @@ choice
>>This option selects UART0 on VIA/Wondermedia System-on-a-chip
>>devices, including VT8500, WM8505, WM8650 and WM8850.
>>  
>> +config DEBUG_VIRT_UART_QEMU
>> +bool "Kernel low-level debugging on QEMU Virtual Platform"
>> +depends on ARCH_VIRT
>> +select DEBUG_UART_PL01X
>> +help
>> +  Say Y here if you want the debug print routines to direct
>> +  their output to PL011 UART port on QEMU Virtual Platform.
>> +  Appropriate address values are:
>> +PHYS        VIRT
>> +0x900   0xf809
> 
> I thought the only guarantee the virt machine had about the memory map was
> that it would be described in the device tree.
> 
>> +config DEBUG_VIRT_UART_KVMTOOL
>> +bool "Kernel low-level debugging on Kvmtool Virtual Platform"
>> +depends on ARCH_VIRT
>> +select DEBUG_UART_8250
>> +help
>> +  Say Y here if you want the debug print routines to direct
>> +  their output to 8250 UART port on Kvmtool Virtual
>> +  Platform. Appropriate address values are:
>> +PHYS        VIRT
>> +0x3f8   0xf80903f8
>> +
>>  config DEBUG_ICEDCC
>>  bool "Kernel low-level debugging via EmbeddedICE DCC channel"
>>  help
>>
> 
> Regards,
> Christopher Covington
> 


Re: [PATCH] KVM/arm: kernel low level debug support for ARM32 virtual platforms

2015-11-03 Thread Mario Smarduch


On 11/3/2015 9:55 AM, Will Deacon wrote:
> On Tue, Nov 03, 2015 at 09:44:52AM -0800, Mario Smarduch wrote:
>> On 11/3/2015 8:33 AM, Christopher Covington wrote:
>>> On 11/02/2015 06:51 PM, Mario Smarduch wrote:
>>>>this is a re-post from couple weeks ago, please take time to review 
>>>> this 
>>>> simple patch which simplifies DEBUG_LL and prevents kernel crash on 
>>>> virtual 
>>>> platforms.
>>>>
>>>> Before this patch DEBUG_LL for 'dummy virtual machine':
>>>>
>>>> ( ) Kernel low-level debugging via EmbeddedICE DCC channel
>>>> ( ) Kernel low-level debug output via semihosting I/O
>>>> ( ) Kernel low-level debugging via 8250 UART
>>>> ( ) Kernel low-level debugging via ARM Ltd PL01x Primecell
>>>>
>>>> In summary if debug uart is not emulated kernel crashes.
>>>> And once you pass that hurdle, uart physical/virtual addresses are unknown.
>>>> DEBUG_LL comes in handy on many occasions and should be somewhat 
>>>> intuitive to use like it is for physical platforms. For virtual platforms
>>>> user may start doubting the host and get into a bigger mess.
>>>>
>>>> After this patch is applied user gets:
>>>>
>>>> (X) Kernel low-level debugging on QEMU Virtual Platform
>>>> ( ) Kernel low-level debugging on Kvmtool Virtual Platform
>>>>. above repeated 
>>>>
>>>> The virtual addresses selected follow arm reference models, high in 
>>>> vmalloc 
>>>> section with high mem enabled and guest running with >= 1GB of memory. The 
>>>> offset is leftover from arm reference models.
>>>
>>> Which model? It doesn't appear to match the vexpress AEM/RTSM/FVP/whatever
>>> which used 0x1c09 for UART0.
>>
>> I recall QEMU virt model had its own physical address map, for sure I saw 
>> the
>> virtio-mmio regions assigned in some ARM document. Peter would you know?
>>
>> As far as kvmtool I'm not sure, currently PC1 COM1 port is used? Andre will 
>> that
>> stay fixed?
> 
> We make absolutely no guarantees about the memory map provided by kvmtool.
> 
> Will
> 

If that's also the case for qemu, then I guess the best you can do is find a way
to dump the device tree, find the uart's physical address, and try to figure out
the virtual address.

Pretty involved; I hoped for something more automated since that's a handy
feature.

Thanks,
- Mario.


[PATCH] KVM/arm: kernel low level debug support for ARM32 virtual platforms

2015-11-02 Thread Mario Smarduch
Hello,
   this is a re-post from a couple of weeks ago; please take time to review this
simple patch, which simplifies DEBUG_LL and prevents a kernel crash on virtual
platforms.

Before this patch DEBUG_LL for 'dummy virtual machine':

( ) Kernel low-level debugging via EmbeddedICE DCC channel
( ) Kernel low-level debug output via semihosting I/O
( ) Kernel low-level debugging via 8250 UART
( ) Kernel low-level debugging via ARM Ltd PL01x Primecell

In summary, if the debug uart is not emulated the kernel crashes.
And once you pass that hurdle, the uart physical/virtual addresses are unknown.
DEBUG_LL comes in handy on many occasions and should be somewhat
intuitive to use, as it is for physical platforms. For virtual platforms
the user may start doubting the host and get into a bigger mess.

After this patch is applied user gets:

(X) Kernel low-level debugging on QEMU Virtual Platform
( ) Kernel low-level debugging on Kvmtool Virtual Platform
. above repeated 

The virtual addresses selected follow the arm reference models: high in the
vmalloc section, with highmem enabled, and the guest running with >= 1GB of
memory. The offset is leftover from the arm reference models.

The patch is against 4.2.0-rc2 commit 43297dda0a51

Original Description

When booting a VM using QEMU or Kvmtool there are no clear ways to
enable low level debugging for these virtual platforms. Some menu port
choices are not supported by the virtual platforms at all, and there is no
help on the location of physical and virtual addresses for the ports.
This may lead to a wrong debug port and a frozen VM with a blank screen.

This patch adds menu selections for QEMU and Kvmtool virtual platforms for low
level kernel print debugging. The help section displays port physical and
virtual addresses.

ARM reference models use the MIDR register to run-time select the UART port
address (for ARCH_VEXPRESS) based on A9 or A15 part numbers. I looked for a
similar approach but couldn't find a way to differentiate between virtual
platforms, something like a platform register.

Acked-by: Christoffer Dall <christoffer.d...@linaro.org>
Signed-off-by: Mario Smarduch <m.smard...@samsung.com>
---
 arch/arm/Kconfig.debug | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/arch/arm/Kconfig.debug b/arch/arm/Kconfig.debug
index a2e16f9..d126bd4 100644
--- a/arch/arm/Kconfig.debug
+++ b/arch/arm/Kconfig.debug
@@ -1155,6 +1155,28 @@ choice
  This option selects UART0 on VIA/Wondermedia System-on-a-chip
  devices, including VT8500, WM8505, WM8650 and WM8850.
 
+   config DEBUG_VIRT_UART_QEMU
+   bool "Kernel low-level debugging on QEMU Virtual Platform"
+   depends on ARCH_VIRT
+   select DEBUG_UART_PL01X
+   help
+ Say Y here if you want the debug print routines to direct
+ their output to PL011 UART port on QEMU Virtual Platform.
+ Appropriate address values are:
+   PHYS        VIRT
+   0x900   0xf809
+
+   config DEBUG_VIRT_UART_KVMTOOL
+   bool "Kernel low-level debugging on Kvmtool Virtual Platform"
+   depends on ARCH_VIRT
+   select DEBUG_UART_8250
+   help
+ Say Y here if you want the debug print routines to direct
+ their output to 8250 UART port on Kvmtool Virtual
+ Platform. Appropriate address values are:
+   PHYS        VIRT
+   0x3f8   0xf80903f8
+
config DEBUG_ICEDCC
bool "Kernel low-level debugging via EmbeddedICE DCC channel"
help
-- 
1.9.1



[PATCH v3 0/3] KVM/arm64/arm: enhance armv7/8 fp/simd lazy switch

2015-10-30 Thread Mario Smarduch
This short patch series combines the previous armv7 and armv8 versions.
For an FP and lmbench load it reduces fp/simd context switch overhead from
30-50% down to 2%. Results will vary with load, but it is no worse than the
current approach.

In summary, the current lazy vfp/simd implementation switches hardware context
only on guest access and again on exit to host; otherwise the hardware context
switch is skipped. This patch set builds on that functionality and executes a
hardware context switch only when the vCPU is scheduled out or returns to user
space.
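The policy described in the cover letter can be modeled as a small state machine. The sketch below is not kernel code; it is a plain-C model (invented names, a switch counter instead of real register save/restore) of when hardware fp/simd context switches actually happen under the lazy scheme:

```c
/* Model of the lazy switch policy: guest state is loaded into hardware
 * only on the first guest fp/simd access (trap), and the host state is
 * restored only at vcpu_put (schedule-out or return to user space). */
#include <stdbool.h>

struct lazy_model {
	bool dirty;		/* guest fp/simd state is in the hardware regs */
	int hw_switches;	/* count of full hardware context switches */
};

/* Guest touches fp/simd: only the first access traps and switches. */
void model_guest_fp_access(struct lazy_model *m)
{
	if (!m->dirty) {
		m->hw_switches++;	/* save host, load guest */
		m->dirty = true;
	}
}

/* Ordinary guest exit/entry: no fp/simd work under the lazy scheme. */
void model_guest_exit(struct lazy_model *m)
{
	(void)m;
}

/* vcpu_put: restore host state only if the guest dirtied the regs. */
void model_vcpu_put(struct lazy_model *m)
{
	if (m->dirty) {
		m->hw_switches++;	/* save guest, restore host */
		m->dirty = false;
	}
}
```

Under this model, a guest that never touches fp/simd costs zero hardware switches, which is where the claimed reduction comes from.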

Patches were tested on the FVP software platform. FP crunching applications
summing up values, with the outcome compared to a known result, were executed
on several guests and the host.

The test can be found here, https://github.com/mjsmar/arm-arm64-fpsimd-test
Tests executed 24 hours.

armv7 test:
- On host executed 12 fp crunching applications - used taskset to bind 
- Two guests - with 12 fp crunching processes - used taskset to bind
- half ran with 1ms sleep, remaining with no sleep

armv8 test: 
- same as above except used mix of armv7 and armv8 guests.

Every so often a fault was injected (via a proc file entry) and a mismatch
between the expected and computed sum was reported. The FP crunch processes
could continue to run, but with bad results.
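The style of check described above, crunching floating-point values and comparing against a known result, can be sketched as follows. This is loosely modeled on the linked arm-arm64-fpsimd-test repository, not the actual test code:

```c
/* Sketch of an fp crunch check: sum 1..n in double precision and
 * compare against the closed form n*(n+1)/2. A mismatch would suggest
 * corrupted fp/simd register state, e.g. after a bad world switch. */
#include <math.h>

/* Accumulate 1..n in a double; exercises fp registers in a loop. */
double crunch_sum(int n)
{
	double s = 0.0;
	int i;

	for (i = 1; i <= n; i++)
		s += (double)i;
	return s;
}

/* Return 1 if the computed sum matches the expected closed form. */
int crunch_ok(int n)
{
	double expect = (double)n * (n + 1) / 2.0;

	return fabs(crunch_sum(n) - expect) < 1e-9;
}
```

In the described setup many such processes run concurrently on host and guests (bound with taskset, some sleeping), so that live fp state is in the registers across scheduling and world-switch events.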

I looked at 'paranoia.c' - it appears to be a comprehensive hardware FP
precision/behavior test. It tests various behaviors and may fail for reasons
that have nothing to do with the fp/simd world switch -
- Adequacy of guard digits for Mult., Div. and Subt.
- UnderflowThreshold = an underflow threshold.
- V = an overflow threshold, roughly.
...

With outcomes like -
- Smallest strictly positive number found is E0 = 4.94066e-324
- Searching for Overflow threshold: This may generate an error.
...

Personally I don't understand everything it's doing.

Opted to use the simple tst-float executable.

These patches are based on earlier arm64 fp/simd optimization work -
https://lists.cs.columbia.edu/pipermail/kvmarm/2015-July/015748.html

And subsequent fixes by Marc and Christoffer at the KVM Forum hackathon to
handle 32-bit guests on a 64-bit host -
https://lists.cs.columbia.edu/pipermail/kvmarm/2015-August/016128.html

Changes since v2->v3:
- combined arm v7 and v8 into one short patch series
- moved access to fpexec_el2 back to EL2
- Move host restore to EL1 from EL2 and call directly from host
- optimize trap enable code 
- renamed some variables to match usage

Changes since v1->v2:
- Fixed vfp/simd trap configuration to enable trace trapping
- Removed set_hcptr branch label
- Fixed handling of FPEXC to restore guest and host versions on vcpu_put
- Tested arm32/arm64
- rebased to 4.3-rc2
- changed a couple register accesses from 64 to 32 bit


Mario Smarduch (3):
  hooks for armv7 fp/simd lazy switch support
  enable enhanced armv7 fp/simd lazy switch
  enable enhanced armv8 fp/simd lazy switch

 arch/arm/include/asm/kvm_host.h   |  7 +
 arch/arm/kernel/asm-offsets.c |  2 ++
 arch/arm/kvm/arm.c|  6 
 arch/arm/kvm/interrupts.S | 60 ---
 arch/arm/kvm/interrupts_head.S| 14 +
 arch/arm64/include/asm/kvm_host.h |  4 +++
 arch/arm64/kernel/asm-offsets.c   |  1 +
 arch/arm64/kvm/hyp.S  | 37 
 8 files changed, 103 insertions(+), 28 deletions(-)

-- 
1.9.1



[PATCH v3 1/3] KVM/arm: add hooks for armv7 fp/simd lazy switch support

2015-10-30 Thread Mario Smarduch
This patch adds vcpu fields to track lazy state and save the host FPEXC, plus
asm-offsets entries for the new fields.

Signed-off-by: Mario Smarduch <m.smard...@samsung.com>
---
 arch/arm/include/asm/kvm_host.h | 6 ++
 arch/arm/kernel/asm-offsets.c   | 2 ++
 2 files changed, 8 insertions(+)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 3df1e97..f1bf551 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -107,6 +107,12 @@ struct kvm_vcpu_arch {
/* Interrupt related fields */
u32 irq_lines;  /* IRQ and FIQ levels */
 
+   /* fp/simd dirty flag true if guest accessed register file */
+   boolvfp_dirty;
+
+   /* Save host FPEXC register to later restore on vcpu put */
+   u32 host_fpexc;
+
/* Exception Information */
struct kvm_vcpu_fault_info fault;
 
diff --git a/arch/arm/kernel/asm-offsets.c b/arch/arm/kernel/asm-offsets.c
index 871b826..9f79712 100644
--- a/arch/arm/kernel/asm-offsets.c
+++ b/arch/arm/kernel/asm-offsets.c
@@ -186,6 +186,8 @@ int main(void)
  DEFINE(VCPU_CPSR,offsetof(struct kvm_vcpu, arch.regs.usr_regs.ARM_cpsr));
   DEFINE(VCPU_HCR, offsetof(struct kvm_vcpu, arch.hcr));
   DEFINE(VCPU_IRQ_LINES,   offsetof(struct kvm_vcpu, arch.irq_lines));
+  DEFINE(VCPU_VFP_DIRTY,   offsetof(struct kvm_vcpu, arch.vfp_dirty));
+  DEFINE(VCPU_VFP_HOST_FPEXC,  offsetof(struct kvm_vcpu, arch.host_fpexc));
   DEFINE(VCPU_HSR, offsetof(struct kvm_vcpu, arch.fault.hsr));
   DEFINE(VCPU_HxFAR,   offsetof(struct kvm_vcpu, arch.fault.hxfar));
   DEFINE(VCPU_HPFAR,   offsetof(struct kvm_vcpu, arch.fault.hpfar));
-- 
1.9.1



[PATCH v3 2/3] KVM/arm/arm64: enable enhanced armv7 fp/simd lazy switch

2015-10-30 Thread Mario Smarduch
This patch tracks vfp/simd hardware state with a vcpu lazy flag. The lazy
flag is set when the guest accesses fp/simd and traps into the vfp/simd
hardware switch handler. On vm-enter, if the lazy flag is set, skip the trap
enable and host fpexc save. On vm-exit, if the flag is set, skip the hardware
context switch and return to the host with guest context still loaded. In
vcpu_put, check if the vcpu lazy flag is set, and if so execute a hardware
context switch to restore the host state.

Also, some arm64 fields and an empty function are added so the series
compiles for arm64.

Signed-off-by: Mario Smarduch <m.smard...@samsung.com>
---
 arch/arm/include/asm/kvm_host.h   |  1 +
 arch/arm/kvm/arm.c|  6 
 arch/arm/kvm/interrupts.S | 60 ---
 arch/arm/kvm/interrupts_head.S| 14 +
 arch/arm64/include/asm/kvm_host.h |  4 +++
 5 files changed, 63 insertions(+), 22 deletions(-)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index f1bf551..a9e86e0 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -227,6 +227,7 @@ int kvm_perf_teardown(void);
 void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
 
 struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
+void kvm_restore_host_vfp_state(struct kvm_vcpu *);
 
 static inline void kvm_arch_hardware_disable(void) {}
 static inline void kvm_arch_hardware_unsetup(void) {}
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index dc017ad..11a56fe 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -296,6 +296,12 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 {
/*
+* If fp/simd registers are dirty save guest, restore host before
+* releasing the cpu.
+*/
+   if (vcpu->arch.vfp_dirty)
+   kvm_restore_host_vfp_state(vcpu);
+   /*
 * The arch-generic KVM code expects the cpu field of a vcpu to be -1
 * if the vcpu is no longer assigned to a cpu.  This is used for the
 * optimized make_all_cpus_request path.
diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
index 900ef6d..ca25314 100644
--- a/arch/arm/kvm/interrupts.S
+++ b/arch/arm/kvm/interrupts.S
@@ -28,6 +28,32 @@
 #include "interrupts_head.S"
 
.text
+/**
+ * void kvm_restore_host_vfp_state(struct vcpu *vcpu) - Executes lazy
+ * fp/simd switch, saves the guest, restores host. Called from host
+ * mode, placed outside of hyp region start/end.
+ */
+ENTRY(kvm_restore_host_vfp_state)
+#ifdef CONFIG_VFPv3
+   push{r4-r7}
+
+   add r7, vcpu, #VCPU_VFP_GUEST
+   store_vfp_state r7
+
+   add r7, vcpu, #VCPU_VFP_HOST
+   ldr r7, [r7]
+   restore_vfp_state r7
+
+   ldr r3, [vcpu, #VCPU_VFP_HOST_FPEXC]
+   VFPFMXR FPEXC, r3
+
+   mov r3, #0
+   strbr3, [vcpu, #VCPU_VFP_DIRTY]
+
+   pop {r4-r7}
+#endif
+   bx  lr
+ENDPROC(kvm_restore_host_vfp_state)
 
 __kvm_hyp_code_start:
.globl __kvm_hyp_code_start
@@ -119,11 +145,16 @@ ENTRY(__kvm_vcpu_run)
@ If the host kernel has not been configured with VFPv3 support,
@ then it is safer if we deny guests from using it as well.
 #ifdef CONFIG_VFPv3
-   @ Set FPEXC_EN so the guest doesn't trap floating point instructions
+   @ fp/simd register file has already been accessed, so skip host fpexc
+   @ save and access trap enable.
+   vfp_inlazy_mode r7, skip_guest_vfp_trap
+
VFPFMRX r2, FPEXC   @ VMRS
-   push{r2}
+   str r2, [vcpu, #VCPU_VFP_HOST_FPEXC]
orr r2, r2, #FPEXC_EN
VFPFMXR FPEXC, r2   @ VMSR
+   set_hcptr vmentry, (HCPTR_TCP(10) | HCPTR_TCP(11))
+skip_guest_vfp_trap:
 #endif
 
@ Configure Hyp-role
@@ -131,7 +162,7 @@ ENTRY(__kvm_vcpu_run)
 
@ Trap coprocessor CRx accesses
set_hstr vmentry
-   set_hcptr vmentry, (HCPTR_TTA | HCPTR_TCP(10) | HCPTR_TCP(11))
+   set_hcptr vmentry, (HCPTR_TTA)
set_hdcr vmentry
 
@ Write configured ID register into MIDR alias
@@ -170,22 +201,15 @@ __kvm_vcpu_return:
@ Don't trap coprocessor accesses for host kernel
set_hstr vmexit
set_hdcr vmexit
-   set_hcptr vmexit, (HCPTR_TTA | HCPTR_TCP(10) | HCPTR_TCP(11)), after_vfp_restore
+   set_hcptr vmexit, (HCPTR_TTA | HCPTR_TCP(10) | HCPTR_TCP(11))
 
 #ifdef CONFIG_VFPv3
-   @ Switch VFP/NEON hardware state to the host's
-   add r7, vcpu, #VCPU_VFP_GUEST
-   store_vfp_state r7
-   add r7, vcpu, #VCPU_VFP_HOST
-   ldr r7, [r7]
-   restore_vfp_state r7
-
-after_vfp_restore:
-   @ Restore FPEXC_EN which we clobbered on entry
-   pop {r2}
+   @ If fp/simd not dirty, restore FPEXC which we clobbered on entry.
+   @ Otherwise return with guest FPEXC, later saved in vcpu_put.
+   vfp_inlazy_mode r2, skip_restore_host_fpexc

[PATCH 3/3] KVM/arm64: enable enhanced armv8 fp/simd lazy switch

2015-10-30 Thread Mario Smarduch
This patch enables the arm64 lazy fp/simd switch, similar to the arm approach
described in the second patch. Change from the previous version: the restore
function is moved to the host.

Signed-off-by: Mario Smarduch <m.smard...@samsung.com>
---
 arch/arm64/include/asm/kvm_host.h |  2 +-
 arch/arm64/kernel/asm-offsets.c   |  1 +
 arch/arm64/kvm/hyp.S  | 37 +++--
 3 files changed, 33 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 26a2347..dcecf92 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -251,11 +251,11 @@ static inline void kvm_arch_hardware_unsetup(void) {}
 static inline void kvm_arch_sync_events(struct kvm *kvm) {}
 static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
 static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
-static inline void kvm_restore_host_vfp_state(struct kvm_vcpu *vcpu) {}
 
 void kvm_arm_init_debug(void);
 void kvm_arm_setup_debug(struct kvm_vcpu *vcpu);
 void kvm_arm_clear_debug(struct kvm_vcpu *vcpu);
 void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu);
+void kvm_restore_host_vfp_state(struct kvm_vcpu *vcpu);
 
 #endif /* __ARM64_KVM_HOST_H__ */
diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
index 8d89cf8..c9c5242 100644
--- a/arch/arm64/kernel/asm-offsets.c
+++ b/arch/arm64/kernel/asm-offsets.c
@@ -124,6 +124,7 @@ int main(void)
   DEFINE(VCPU_HCR_EL2, offsetof(struct kvm_vcpu, arch.hcr_el2));
   DEFINE(VCPU_MDCR_EL2,offsetof(struct kvm_vcpu, arch.mdcr_el2));
   DEFINE(VCPU_IRQ_LINES,   offsetof(struct kvm_vcpu, arch.irq_lines));
+  DEFINE(VCPU_VFP_DIRTY,   offsetof(struct kvm_vcpu, arch.vfp_dirty));
  DEFINE(VCPU_HOST_CONTEXT,offsetof(struct kvm_vcpu, arch.host_cpu_context));
  DEFINE(VCPU_HOST_DEBUG_STATE, offsetof(struct kvm_vcpu, arch.host_debug_state));
  DEFINE(VCPU_TIMER_CNTV_CTL,  offsetof(struct kvm_vcpu, arch.timer_cpu.cntv_ctl));
diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S
index e583613..ed2c4cf 100644
--- a/arch/arm64/kvm/hyp.S
+++ b/arch/arm64/kvm/hyp.S
@@ -36,6 +36,28 @@
 #define CPU_SYSREG_OFFSET(x)   (CPU_SYSREGS + 8*x)
 
.text
+
+/**
+ * void kvm_restore_host_vfp_state(struct vcpu *vcpu) - Executes lazy
+ * fp/simd switch, saves the guest, restores host. Called from host
+ * mode, placed outside of hyp section.
+ */
+ENTRY(kvm_restore_host_vfp_state)
+   pushxzr, lr
+
+   add x2, x0, #VCPU_CONTEXT
+   mov w3, #0
+   strbw3, [x0, #VCPU_VFP_DIRTY]
+
+   bl __save_fpsimd
+
+   ldr x2, [x0, #VCPU_HOST_CONTEXT]
+   bl __restore_fpsimd
+
+   pop xzr, lr
+   ret
+ENDPROC(kvm_restore_host_vfp_state)
+
.pushsection.hyp.text, "ax"
.align  PAGE_SHIFT
 
@@ -482,7 +504,11 @@
 99:
msr hcr_el2, x2
mov x2, #CPTR_EL2_TTA
+
+   ldrbw3, [x0, #VCPU_VFP_DIRTY]
+   tbnzw3, #0, 98f
orr x2, x2, #CPTR_EL2_TFP
+98:
msr cptr_el2, x2
 
mov x2, #(1 << 15)  // Trap CP15 Cr=15
@@ -669,14 +695,12 @@ __restore_debug:
ret
 
 __save_fpsimd:
-   skip_fpsimd_state x3, 1f
save_fpsimd
-1: ret
+   ret
 
 __restore_fpsimd:
-   skip_fpsimd_state x3, 1f
restore_fpsimd
-1: ret
+   ret
 
 switch_to_guest_fpsimd:
pushx4, lr
@@ -688,6 +712,9 @@ switch_to_guest_fpsimd:
 
mrs x0, tpidr_el2
 
+   mov w2, #1
+   strbw2, [x0, #VCPU_VFP_DIRTY]
+
ldr x2, [x0, #VCPU_HOST_CONTEXT]
kern_hyp_va x2
bl __save_fpsimd
@@ -763,7 +790,6 @@ __kvm_vcpu_return:
add x2, x0, #VCPU_CONTEXT
 
save_guest_regs
-   bl __save_fpsimd
bl __save_sysregs
 
skip_debug_state x3, 1f
@@ -784,7 +810,6 @@ __kvm_vcpu_return:
kern_hyp_va x2
 
bl __restore_sysregs
-   bl __restore_fpsimd
/* Clear FPSIMD and Trace trapping */
msr cptr_el2, xzr
 
-- 
1.9.1



Re: [PATCH v2 2/2] KVM/arm: enable enhanced armv7 fp/simd lazy switch

2015-10-20 Thread Mario Smarduch


On 10/20/2015 12:24 AM, Christoffer Dall wrote:
> On Mon, Oct 19, 2015 at 04:25:04PM -0700, Mario Smarduch wrote:
>>
>>
>> On 10/19/2015 3:14 AM, Christoffer Dall wrote:
>>> On Sat, Sep 26, 2015 at 04:43:29PM -0700, Mario Smarduch wrote:
>>>> This patch enhances current lazy vfp/simd hardware switch. In addition to
>>>> current lazy switch, it tracks vfp/simd hardware state with a vcpu 
>>>> lazy flag. 
>>>>
>>>> vcpu lazy flag is set on guest access and trap to vfp/simd hardware switch 
>>>> handler. On vm-enter if lazy flag is set skip trap enable and saving 
>>>> host fpexc. On vm-exit if flag is set skip hardware context switch
>>>> and return to host with guest context.
>>>>
>>>> On vcpu_put check if vcpu lazy flag is set, and execute a hardware context 
>>>> switch to restore host.
>>>>
>>>> Signed-off-by: Mario Smarduch <m.smard...@samsung.com>
>>>> ---
>>>>  arch/arm/include/asm/kvm_asm.h |  1 +
>>>>  arch/arm/kvm/arm.c | 17 
>>>>  arch/arm/kvm/interrupts.S  | 60 
>>>> +++---
>>>>  arch/arm/kvm/interrupts_head.S | 12 ++---
>>>>  4 files changed, 71 insertions(+), 19 deletions(-)
>>>>
>>>> diff --git a/arch/arm/include/asm/kvm_asm.h 
>>>> b/arch/arm/include/asm/kvm_asm.h
>>>> index 194c91b..4b45d79 100644
>>>> --- a/arch/arm/include/asm/kvm_asm.h
>>>> +++ b/arch/arm/include/asm/kvm_asm.h
>>>> @@ -97,6 +97,7 @@ extern char __kvm_hyp_code_end[];
>>>>  extern void __kvm_flush_vm_context(void);
>>>>  extern void __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa);
>>>>  extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
>>>> +extern void __kvm_restore_host_vfp_state(struct kvm_vcpu *vcpu);
>>>>  
>>>>  extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
>>>>  #endif
>>>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
>>>> index ce404a5..79f49c7 100644
>>>> --- a/arch/arm/kvm/arm.c
>>>> +++ b/arch/arm/kvm/arm.c
>>>> @@ -105,6 +105,20 @@ void kvm_arch_check_processor_compat(void *rtn)
>>>>*(int *)rtn = 0;
>>>>  }
>>>>  
>>>> +/**
>>>> + * kvm_switch_fp_regs() - switch guest/host VFP/SIMD registers
>>>> + * @vcpu: pointer to vcpu structure.
>>>> + *
>>>
>>> nit: stray blank line
>> ok
>>>
>>>> + */
>>>> +static void kvm_switch_fp_regs(struct kvm_vcpu *vcpu)
>>>> +{
>>>> +#ifdef CONFIG_ARM
>>>> +  if (vcpu->arch.vfp_lazy == 1) {
>>>> +  kvm_call_hyp(__kvm_restore_host_vfp_state, vcpu);
>>>
>>> why do you have to do this in HYP mode ?
>>  Calling it directly works fine. I moved the function outside hyp start/end
>> range in interrupts.S. Not thinking outside the box, just thought let them 
>> all
>> be hyp calls.
>>
>>>
>>>> +  vcpu->arch.vfp_lazy = 0;
>>>> +  }
>>>> +#endif
>>>
>>> we've tried to put stuff like this in header files to avoid the ifdefs
>>> so far.  Could that be done here as well?
>>
>> That was a to do, but didn't get around to it.
>>>
>>>> +}
>>>>  
>>>>  /**
>>>>   * kvm_arch_init_vm - initializes a VM data structure
>>>> @@ -295,6 +309,9 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>>>>  
>>>>  void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>>>>  {
>>>> +  /* Check if Guest accessed VFP registers */
>>>
>>> misleading comment: this function does more than checking
>> Yep sure does, will change.
>>>
>>>> +  kvm_switch_fp_regs(vcpu);
>>>> +
>>>>/*
>>>> * The arch-generic KVM code expects the cpu field of a vcpu to be -1
>>>> * if the vcpu is no longer assigned to a cpu.  This is used for the
>>>> diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
>>>> index 900ef6d..6d98232 100644
>>>> --- a/arch/arm/kvm/interrupts.S
>>>> +++ b/arch/arm/kvm/interrupts.S
>>>> @@ -96,6 +96,29 @@ ENTRY(__kvm_flush_vm_context)
>>>>bx  lr
>>>>  ENDPROC(__kvm_flush_vm_context)
>>>>  
>>>> +/**
>>>> + * void __kvm_re

Re: [RFT - PATCH v2 0/2] KVM/arm64: add fp/simd lazy switch support

2015-10-19 Thread Mario Smarduch


On 10/18/2015 2:07 PM, Christoffer Dall wrote:
> On Mon, Oct 12, 2015 at 09:29:23AM -0700, Mario Smarduch wrote:
>> Hi Christoffer, Marc -
>>   I just threw this test your way without any explanation.
> 
> I'm confused.  Did you send me something somewhere already?
Yes in the last patchset

https://lists.cs.columbia.edu/pipermail/kvmarm/2015-October/016698.html

I included a simple test I put together.

> 
>>
>> The test loops, does fp arithmetic and checks the truncated result.
>> It could be a little more dynamic have an initial run to
>> get the sum to compare against while looping, different fp
>> hardware may come up with a different sum, but truncation is
>> to 5'th decimal point.
>>
>> The rationale is that if there is any fp/simd corruption
>> one of these runs should fail. I think most likely scenario
>> for that is a world switch in midst of fp operation. I've
>> instrumented (basically add some tracing to vcpu_put()) and
>> validated vcpu_put gets called thousands of time (for v7,v8)
>> for an over night test running two guests/host crunching
>> fp operations.
>>
>> Other than that I'm not sure how to really catch any problems
>> with the patches applied. Obviously this is a huge issue if the patches
>> have any problems. If you or Marc have any other ideas I'd be happy
>> to enhance the test.
> 
> I think it's important to run two VMs at the same time, each with some
> floating-point work, and then run some floating point on the host at the
> same time.
> 
> You can make that even more interesting by doing 32-bit guests at the
> same time as well.

Yes that's the test combination I've been running.
> 
> I believe Marc was running Paranoia
> (http://www.netlib.org/paranoia/paranoia.c) to test the last lazy
> series.

I'll try this test and run it for several days, see if anything shows up.

Thanks.
> 
> Thanks,
> -Christoffer
> 


Re: [PATCH v2 1/2] KVM/arm: add hooks for armv7 fp/simd lazy switch support

2015-10-19 Thread Mario Smarduch


On 10/19/2015 1:53 AM, Christoffer Dall wrote:
> On Sat, Sep 26, 2015 at 04:43:28PM -0700, Mario Smarduch wrote:
>> This patch adds vcpu fields to track lazy state, save host FPEXC, and 
>> offsets to fields.
>>
>> Signed-off-by: Mario Smarduch <m.smard...@samsung.com>
>> ---
>>  arch/arm/include/asm/kvm_host.h | 6 ++
>>  arch/arm/kernel/asm-offsets.c   | 2 ++
>>  2 files changed, 8 insertions(+)
>>
>> diff --git a/arch/arm/include/asm/kvm_host.h 
>> b/arch/arm/include/asm/kvm_host.h
>> index dcba0fa..194a8ef 100644
>> --- a/arch/arm/include/asm/kvm_host.h
>> +++ b/arch/arm/include/asm/kvm_host.h
>> @@ -111,6 +111,12 @@ struct kvm_vcpu_arch {
>>  /* Interrupt related fields */
>>  u32 irq_lines;  /* IRQ and FIQ levels */
>>  
>> +/* Track fp/simd lazy switch state */
>> +u32 vfp_lazy;
> 
> so is this a flags field or basically a boolean?  If the latter, what is
> does it mean when the field is true vs. false?
Yes, it's a bool; will update and clarify comments.
> 
>> +
>> +/* Save host FPEXC register to restore on vcpu put */
>> +u32 saved_fpexc;
> 
> is this only the host's state?  If so, why not name it host_fpexc?
That's right, it is host state, will change.
> 
> Thanks,
> -Christoffer
> 
>> +
>>  /* Exception Information */
>>  struct kvm_vcpu_fault_info fault;
>>  
>> diff --git a/arch/arm/kernel/asm-offsets.c b/arch/arm/kernel/asm-offsets.c
>> index 871b826..e1c3a41 100644
>> --- a/arch/arm/kernel/asm-offsets.c
>> +++ b/arch/arm/kernel/asm-offsets.c
>> @@ -186,6 +186,8 @@ int main(void)
>>DEFINE(VCPU_CPSR, offsetof(struct kvm_vcpu, 
>> arch.regs.usr_regs.ARM_cpsr));
>>DEFINE(VCPU_HCR,  offsetof(struct kvm_vcpu, arch.hcr));
>>DEFINE(VCPU_IRQ_LINES,offsetof(struct kvm_vcpu, arch.irq_lines));
>> +  DEFINE(VCPU_VFP_LAZY, offsetof(struct kvm_vcpu, 
>> arch.vfp_lazy));
>> +  DEFINE(VCPU_VFP_FPEXC,offsetof(struct kvm_vcpu, arch.saved_fpexc));
>>DEFINE(VCPU_HSR,  offsetof(struct kvm_vcpu, arch.fault.hsr));
>>DEFINE(VCPU_HxFAR,offsetof(struct kvm_vcpu, 
>> arch.fault.hxfar));
>>DEFINE(VCPU_HPFAR,offsetof(struct kvm_vcpu, 
>> arch.fault.hpfar));
>> -- 
>> 1.9.1
>>


Re: [PATCH v2 2/2] KVM/arm: enable enhanced armv7 fp/simd lazy switch

2015-10-19 Thread Mario Smarduch


On 10/19/2015 3:14 AM, Christoffer Dall wrote:
> On Sat, Sep 26, 2015 at 04:43:29PM -0700, Mario Smarduch wrote:
>> This patch enhances current lazy vfp/simd hardware switch. In addition to
>> current lazy switch, it tracks vfp/simd hardware state with a vcpu 
>> lazy flag. 
>>
>> vcpu lazy flag is set on guest access and trap to vfp/simd hardware switch 
>> handler. On vm-enter if lazy flag is set skip trap enable and saving 
>> host fpexc. On vm-exit if flag is set skip hardware context switch
>> and return to host with guest context.
>>
>> On vcpu_put check if vcpu lazy flag is set, and execute a hardware context 
>> switch to restore host.
>>
>> Signed-off-by: Mario Smarduch <m.smard...@samsung.com>
>> ---
>>  arch/arm/include/asm/kvm_asm.h |  1 +
>>  arch/arm/kvm/arm.c | 17 
>>  arch/arm/kvm/interrupts.S  | 60 
>> +++---
>>  arch/arm/kvm/interrupts_head.S | 12 ++---
>>  4 files changed, 71 insertions(+), 19 deletions(-)
>>
>> diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
>> index 194c91b..4b45d79 100644
>> --- a/arch/arm/include/asm/kvm_asm.h
>> +++ b/arch/arm/include/asm/kvm_asm.h
>> @@ -97,6 +97,7 @@ extern char __kvm_hyp_code_end[];
>>  extern void __kvm_flush_vm_context(void);
>>  extern void __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa);
>>  extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
>> +extern void __kvm_restore_host_vfp_state(struct kvm_vcpu *vcpu);
>>  
>>  extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
>>  #endif
>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
>> index ce404a5..79f49c7 100644
>> --- a/arch/arm/kvm/arm.c
>> +++ b/arch/arm/kvm/arm.c
>> @@ -105,6 +105,20 @@ void kvm_arch_check_processor_compat(void *rtn)
>>  *(int *)rtn = 0;
>>  }
>>  
>> +/**
>> + * kvm_switch_fp_regs() - switch guest/host VFP/SIMD registers
>> + * @vcpu:   pointer to vcpu structure.
>> + *
> 
> nit: stray blank line
ok
> 
>> + */
>> +static void kvm_switch_fp_regs(struct kvm_vcpu *vcpu)
>> +{
>> +#ifdef CONFIG_ARM
>> +if (vcpu->arch.vfp_lazy == 1) {
>> +kvm_call_hyp(__kvm_restore_host_vfp_state, vcpu);
> 
> why do you have to do this in HYP mode ?
 Calling it directly works fine. I moved the function outside hyp start/end
range in interrupts.S. Not thinking outside the box, just thought let them all
be hyp calls.

> 
>> +vcpu->arch.vfp_lazy = 0;
>> +}
>> +#endif
> 
> we've tried to put stuff like this in header files to avoid the ifdefs
> so far.  Could that be done here as well?

That was a to do, but didn't get around to it.
> 
>> +}
>>  
>>  /**
>>   * kvm_arch_init_vm - initializes a VM data structure
>> @@ -295,6 +309,9 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>>  
>>  void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>>  {
>> +/* Check if Guest accessed VFP registers */
> 
> misleading comment: this function does more than checking
Yep sure does, will change.
> 
>> +kvm_switch_fp_regs(vcpu);
>> +
>>  /*
>>   * The arch-generic KVM code expects the cpu field of a vcpu to be -1
>>   * if the vcpu is no longer assigned to a cpu.  This is used for the
>> diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
>> index 900ef6d..6d98232 100644
>> --- a/arch/arm/kvm/interrupts.S
>> +++ b/arch/arm/kvm/interrupts.S
>> @@ -96,6 +96,29 @@ ENTRY(__kvm_flush_vm_context)
>>  bx  lr
>>  ENDPROC(__kvm_flush_vm_context)
>>  
>> +/**
>> + * void __kvm_restore_host_vfp_state(struct vcpu *vcpu) - Executes a lazy
>> + * fp/simd switch, saves the guest, restores host.
>> + *
> 
> nit: stray blank line
ok.
> 
>> + */
>> +ENTRY(__kvm_restore_host_vfp_state)
>> +#ifdef CONFIG_VFPv3
>> +push{r3-r7}
>> +
>> +add r7, r0, #VCPU_VFP_GUEST
>> +store_vfp_state r7
>> +
>> +add r7, r0, #VCPU_VFP_HOST
>> +ldr r7, [r7]
>> +restore_vfp_state r7
>> +
>> +ldr r3, [vcpu, #VCPU_VFP_FPEXC]
> 
> either use r0 or vcpu throughout this function please
Yeah that's bad - in the same function too.
> 
>> +VFPFMXR FPEXC, r3
>> +
>> +pop {r3-r7}
>> +#endif
>> +bx  lr
>> +ENDPROC(__kvm_restore_host_vfp_state)
>>  
>>  /
>>   *  Hy

[PATCH] KVM/arm: kernel low level debug support for ARM32 virtual platforms

2015-10-16 Thread Mario Smarduch
When booting a VM using QEMU or Kvmtool there are no clear ways to
enable low level debugging for these virtual platforms. Some menu port
choices are not supported by the virtual platforms at all, and there is no
help on the location of physical and virtual addresses for the ports.
This may lead to a wrong debug port and a frozen VM with a blank screen.

This patch adds menu selections for QEMU and Kvmtool virtual platforms for low
level kernel print debugging. The help section displays port physical and
virtual addresses.

ARM reference models use the MIDR register to select the UART port address at
run time (for ARCH_VEXPRESS) based on A9 or A15 part numbers. Looked for the
same approach but couldn't find a way to differentiate between virtual
platforms, something like a platform register.

Signed-off-by: Mario Smarduch <m.smard...@samsung.com>
---
 arch/arm/Kconfig.debug | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/arch/arm/Kconfig.debug b/arch/arm/Kconfig.debug
index a2e16f9..d126bd4 100644
--- a/arch/arm/Kconfig.debug
+++ b/arch/arm/Kconfig.debug
@@ -1155,6 +1155,28 @@ choice
  This option selects UART0 on VIA/Wondermedia System-on-a-chip
  devices, including VT8500, WM8505, WM8650 and WM8850.
 
+   config DEBUG_VIRT_UART_QEMU
+   bool "Kernel low-level debugging on QEMU Virtual Platform"
+   depends on ARCH_VIRT
+   select DEBUG_UART_PL01X
+   help
+ Say Y here if you want the debug print routines to direct
+ their output to PL011 UART port on QEMU Virtual Platform.
+ Appropriate address values are:
+   PHYS   VIRT
+   0x900   0xf809
+
+   config DEBUG_VIRT_UART_KVMTOOL
+   bool "Kernel low-level debugging on Kvmtool Virtual Platform"
+   depends on ARCH_VIRT
+   select DEBUG_UART_8250
+   help
+ Say Y here if you want the debug print routines to direct
+ their output to 8250 UART port on Kvmtool Virtual
+ Platform. Appropriate address values are:
+   PHYS   VIRT
+   0x3f8   0xf80903f8
+
config DEBUG_ICEDCC
bool "Kernel low-level debugging via EmbeddedICE DCC channel"
help
-- 
1.9.1



Re: [RFT - PATCH v2 0/2] KVM/arm64: add fp/simd lazy switch support

2015-10-12 Thread Mario Smarduch
Hi Christoffer, Marc -
  I just threw this test your way without any explanation.

The test loops, does fp arithmetic, and checks the truncated result.
It could be a little more dynamic: have an initial run to get the sum to
compare against while looping. Different fp hardware may come up with a
different sum, but truncation is to the 5th decimal place.

The rationale is that if there is any fp/simd corruption, one of these runs
should fail. I think the most likely scenario for that is a world switch in
the midst of an fp operation. I've instrumented (basically added some tracing
to vcpu_put()) and validated that vcpu_put gets called thousands of times
(for v7, v8) during an overnight test running two guests/host crunching
fp operations.

Other than that I'm not sure how to really catch any problems with the
patches applied. Obviously this is a huge issue if the patches have any
problems. If you or Marc have any other ideas I'd be happy
to enhance the test.

Thanks,
  Mario

On 10/5/2015 8:45 AM, Christoffer Dall wrote:
> On Tue, Sep 22, 2015 at 04:34:01PM -0700, Mario Smarduch wrote:
>> This is a 2nd iteration for arm64; the v1 patches were posted by mistake
>> from an older branch which included several bugs. Hopefully didn't waste
>> too much of anyone's time.
>>
>> This patch series is a followup to the armv7 fp/simd lazy switch
>> implementation, uses similar approach and depends on the series - see
>> https://lists.cs.columbia.edu/pipermail/kvmarm/2015-September/016516.html
>>
>> It's based on earlier arm64 fp/simd optimization work - see
>> https://lists.cs.columbia.edu/pipermail/kvmarm/2015-July/015748.html
>>
>> And subsequent fixes by Marc and Christoffer at KVM Forum hackathon to handle
>> 32-bit guest on 64 bit host (and may require more here) - see
>> https://lists.cs.columbia.edu/pipermail/kvmarm/2015-August/016128.html
>>
>> This series has been tested with arm64 on arm64 with several FP
>> applications running on host and guest, with a substantial decrease in the
>> number of fp/simd context switches. From about 30% down to 2% with one
>> guest running.
>>
>> At this time I don't have arm32/arm64 working and hoping Christoffer and/or
>> Marc (or anyone) can test 32-bit guest/64-bit host.
>>
> Did you already have some test infrastructure/applications that I can
> reuse for this purpose or do I have to write userspace software?
> 
> -Christoffer
> 


RE: [RFT - PATCH v2 0/2] KVM/arm64: add fp/simd lazy switch support

2015-10-05 Thread Mario Smarduch
Will do, I'll get them over to you.

-Original Message-
From: Christoffer Dall [mailto:christoffer.d...@linaro.org] 
Sent: Monday, October 05, 2015 10:26 AM
To: Mario Smarduch
Cc: kvm...@lists.cs.columbia.edu; marc.zyng...@arm.com; kvm@vger.kernel.org; 
linux-arm-ker...@lists.infradead.org
Subject: Re: [RFT - PATCH v2 0/2] KVM/arm64: add fp/simd lazy switch support

On Mon, Oct 05, 2015 at 09:14:57AM -0700, Mario Smarduch wrote:
> Hi Christoffer,
>I just managed to boot qemu arm32 up on arm64 (last Fri - thanks 
> for the tip
> - there were few other issue to clean up), so let me retest it again. 
> Also I noticed some refactoring would help both 32 and 64 bit patches.
> 
> Yes I could provide a the user space tests as well.
> 
I'd like those regardless as I generally test my queue before pushing it to 
next.

Thanks,
-Christoffer


[PATCH v2 0/2] KVM/arm64: add fp/simd lazy switch support

2015-10-05 Thread Mario Smarduch
This patch series is a followup to the armv7 fp/simd lazy switch
implementation, uses similar approach and depends on the series -
https://lists.cs.columbia.edu/pipermail/kvmarm/2015-September/016567.html
Patches are based on 4.3-rc2 commit 1f93e4a96c91093

Patches are based on earlier arm64 fp/simd optimization work -
https://lists.cs.columbia.edu/pipermail/kvmarm/2015-July/015748.html

And subsequent fixes by Marc and Christoffer at KVM Forum hackathon to handle
32-bit guest on 64 bit host - 
https://lists.cs.columbia.edu/pipermail/kvmarm/2015-August/016128.html

The patch series have been tested on Foundation Model arm64/arm64 and
arm32/arm64. The test program used can be found here 

https://github.com/mjsmar/arm-arm64-fpsimd-test

Launched up to 16 instances on a 4-way guest and another 16 on the host
(1 ms sleep in both cases), ran overnight.

Changes v1->v2:
- Tested arm32/arm64
- rebased to 4.3-rc2
- changed a couple register accesses from 64 to 32 bit 

Mario Smarduch (2):
  add hooks for armv8 fp/simd lazy switch
  enable armv8 fp/simd lazy switch

 arch/arm/kvm/arm.c|  2 --
 arch/arm64/include/asm/kvm_asm.h  |  1 +
 arch/arm64/include/asm/kvm_host.h |  3 ++
 arch/arm64/kernel/asm-offsets.c   |  1 +
 arch/arm64/kvm/hyp.S  | 59 ++-
 5 files changed, 45 insertions(+), 21 deletions(-)

-- 
1.9.1



[PATCH v2 1/2] add hooks for armv8 fp/simd lazy switch

2015-10-05 Thread Mario Smarduch
This patch adds hooks to support fp/simd lazy switch. A vcpu flag to track
fp/simd state, and flag offset in vcpu structure.

Signed-off-by: Mario Smarduch <m.smard...@samsung.com>
---
 arch/arm64/include/asm/kvm_host.h | 3 +++
 arch/arm64/kernel/asm-offsets.c   | 1 +
 2 files changed, 4 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 4562459..03f25d0 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -157,6 +157,9 @@ struct kvm_vcpu_arch {
/* Interrupt related fields */
u64 irq_lines;  /* IRQ and FIQ levels */
 
+   /* Track fp/simd lazy switch */
+   u32 vfp_lazy;
+
/* Cache some mmu pages needed inside spinlock regions */
struct kvm_mmu_memory_cache mmu_page_cache;
 
diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
index 8d89cf8..8311da4 100644
--- a/arch/arm64/kernel/asm-offsets.c
+++ b/arch/arm64/kernel/asm-offsets.c
@@ -124,6 +124,7 @@ int main(void)
   DEFINE(VCPU_HCR_EL2, offsetof(struct kvm_vcpu, arch.hcr_el2));
   DEFINE(VCPU_MDCR_EL2,offsetof(struct kvm_vcpu, arch.mdcr_el2));
   DEFINE(VCPU_IRQ_LINES,   offsetof(struct kvm_vcpu, arch.irq_lines));
+  DEFINE(VCPU_VFP_LAZY, offsetof(struct kvm_vcpu, arch.vfp_lazy));
  DEFINE(VCPU_HOST_CONTEXT,offsetof(struct kvm_vcpu, arch.host_cpu_context));
  DEFINE(VCPU_HOST_DEBUG_STATE, offsetof(struct kvm_vcpu, arch.host_debug_state));
  DEFINE(VCPU_TIMER_CNTV_CTL,  offsetof(struct kvm_vcpu, arch.timer_cpu.cntv_ctl));
-- 
1.9.1



[PATCH v2 2/2] enable armv8 fp/simd lazy switch

2015-10-05 Thread Mario Smarduch
This patch enables the arm64 lazy fp/simd switch. It removes the ARM-only
constraint and follows the same approach as the armv7 version, found here:

https://lists.cs.columbia.edu/pipermail/kvmarm/2015-September/016567.html

To summarize: provided the guest accesses the fp/simd unit, we limit the
number of fp/simd context switches to two per vCPU execution schedule.
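The two-switch bound described above can be sketched as a toy state
machine. This is a hypothetical host-side model (names like toy_vcpu,
toy_fp_trap and toy_vcpu_put are invented for illustration); in the real
series the logic is split between arm.c and the hyp-mode assembly:

```c
#include <assert.h>

struct toy_vcpu {
	int vfp_lazy;	/* mirrors vcpu->arch.vfp_lazy */
	int switches;	/* full fp/simd register-file switches performed */
};

/* First guest fp/simd access traps: switch to guest state, disarm trap. */
void toy_fp_trap(struct toy_vcpu *v)
{
	if (!v->vfp_lazy) {
		v->switches++;		/* save host regs, load guest regs */
		v->vfp_lazy = 1;	/* later accesses no longer trap */
	}
}

/* vcpu_put: restore host state only if the guest dirtied the hardware. */
void toy_vcpu_put(struct toy_vcpu *v)
{
	if (v->vfp_lazy) {
		v->switches++;		/* save guest regs, restore host regs */
		v->vfp_lazy = 0;
	}
}
```

A guest that touches fp/simd once and then exits many times still costs
only two switches per schedule; a guest that never touches it costs none.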

Signed-off-by: Mario Smarduch <m.smard...@samsung.com>
---
 arch/arm/kvm/arm.c   |  2 --
 arch/arm64/include/asm/kvm_asm.h |  1 +
 arch/arm64/kvm/hyp.S | 59 +++-
 3 files changed, 41 insertions(+), 21 deletions(-)

diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 1b1f9e9..fe609f1 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -112,12 +112,10 @@ void kvm_arch_check_processor_compat(void *rtn)
  */
 static void kvm_switch_fp_regs(struct kvm_vcpu *vcpu)
 {
-#ifdef CONFIG_ARM
if (vcpu->arch.vfp_lazy == 1) {
kvm_call_hyp(__kvm_restore_host_vfp_state, vcpu);
vcpu->arch.vfp_lazy = 0;
}
-#endif
 }
 
 /**
diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 5e37710..83dcac5 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -117,6 +117,7 @@ extern char __kvm_hyp_vector[];
 extern void __kvm_flush_vm_context(void);
 extern void __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa);
 extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
+extern void __kvm_restore_host_vfp_state(struct kvm_vcpu *vcpu);
 
 extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
 
diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S
index e583613..ea99f66 100644
--- a/arch/arm64/kvm/hyp.S
+++ b/arch/arm64/kvm/hyp.S
@@ -385,14 +385,6 @@
tbz \tmp, #KVM_ARM64_DEBUG_DIRTY_SHIFT, \target
 .endm
 
-/*
- * Branch to target if CPTR_EL2.TFP bit is set (VFP/SIMD trapping enabled)
- */
-.macro skip_fpsimd_state tmp, target
-   mrs \tmp, cptr_el2
-   tbnz\tmp, #CPTR_EL2_TFP_SHIFT, \target
-.endm
-
 .macro compute_debug_state target
// Compute debug state: If any of KDE, MDE or KVM_ARM64_DEBUG_DIRTY
// is set, we do a full save/restore cycle and disable trapping.
@@ -433,10 +425,6 @@
mrs x5, ifsr32_el2
stp x4, x5, [x3]
 
-   skip_fpsimd_state x8, 2f
-   mrs x6, fpexc32_el2
-   str x6, [x3, #16]
-2:
skip_debug_state x8, 1f
mrs x7, dbgvcr32_el2
str x7, [x3, #24]
@@ -481,8 +469,15 @@
isb
 99:
msr hcr_el2, x2
-   mov x2, #CPTR_EL2_TTA
+
+   mov x2, #0
+   ldr w3, [x0, #VCPU_VFP_LAZY]
+   tbnzw3, #0, 98f
+
orr x2, x2, #CPTR_EL2_TFP
+98:
+   orr x2, x2, #CPTR_EL2_TTA
+
msr cptr_el2, x2
 
mov x2, #(1 << 15)  // Trap CP15 Cr=15
@@ -669,14 +664,12 @@ __restore_debug:
ret
 
 __save_fpsimd:
-   skip_fpsimd_state x3, 1f
save_fpsimd
-1: ret
+   ret
 
 __restore_fpsimd:
-   skip_fpsimd_state x3, 1f
restore_fpsimd
-1: ret
+   ret
 
 switch_to_guest_fpsimd:
pushx4, lr
@@ -688,6 +681,9 @@ switch_to_guest_fpsimd:
 
mrs x0, tpidr_el2
 
+   mov w2, #1
+   str w2, [x0, #VCPU_VFP_LAZY]
+
ldr x2, [x0, #VCPU_HOST_CONTEXT]
kern_hyp_va x2
bl __save_fpsimd
@@ -763,7 +759,6 @@ __kvm_vcpu_return:
add x2, x0, #VCPU_CONTEXT
 
save_guest_regs
-   bl __save_fpsimd
bl __save_sysregs
 
skip_debug_state x3, 1f
@@ -784,7 +779,6 @@ __kvm_vcpu_return:
kern_hyp_va x2
 
bl __restore_sysregs
-   bl __restore_fpsimd
/* Clear FPSIMD and Trace trapping */
msr cptr_el2, xzr
 
@@ -863,6 +857,33 @@ ENTRY(__kvm_flush_vm_context)
ret
 ENDPROC(__kvm_flush_vm_context)
 
+/**
+ * kvm_switch_fp_regs() - switch guest/host VFP/SIMD registers
+ * @vcpu:  pointer to vcpu structure.
+ *
+ */
+ENTRY(__kvm_restore_host_vfp_state)
+   pushx4, lr
+
+   kern_hyp_va x0
+   add x2, x0, #VCPU_CONTEXT
+
+   // Load Guest HCR, determine if guest is 32 or 64 bit
+   ldr x3, [x0, #VCPU_HCR_EL2]
+   tbnzx3, #HCR_RW_SHIFT, 1f
+   mrs x4, fpexc32_el2
+   str x4, [x2, #CPU_SYSREG_OFFSET(FPEXC32_EL2)]
+1:
+   bl __save_fpsimd
+
+   ldr x2, [x0, #VCPU_HOST_CONTEXT]
+   kern_hyp_va x2
+   bl __restore_fpsimd
+
+   pop x4, lr
+   ret
+ENDPROC(__kvm_restore_host_vfp_state)
+
 __kvm_hyp_panic:
// Guess the context by looking at VTTBR:
// If zero, then we're already a host.
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFT - PATCH v2 0/2] KVM/arm64: add fp/simd lazy switch support

2015-10-05 Thread Mario Smarduch
Hi Christoffer,
   I just managed to boot qemu arm32 on arm64 (last Friday - thanks for the tip;
there were a few other issues to clean up), so let me retest it again. I also
noticed some refactoring would help both the 32- and 64-bit patches.

Yes, I could provide the user space tests as well.

Thanks-
- Mario

On 10/5/2015 8:45 AM, Christoffer Dall wrote:
> On Tue, Sep 22, 2015 at 04:34:01PM -0700, Mario Smarduch wrote:
>> This is a 2nd iteration for arm64; the v1 patches were posted by mistake from
>> an older branch which included several bugs. Hopefully this didn't waste too
>> much of anyone's time.
>>
>> This patch series is a followup to the armv7 fp/simd lazy switch
>> implementation, uses similar approach and depends on the series - see
>> https://lists.cs.columbia.edu/pipermail/kvmarm/2015-September/016516.html
>>
>> It's based on earlier arm64 fp/simd optimization work - see
>> https://lists.cs.columbia.edu/pipermail/kvmarm/2015-July/015748.html
>>
>> And subsequent fixes by Marc and Christoffer at KVM Forum hackathon to handle
>> 32-bit guest on 64 bit host (and may require more here) - see
>> https://lists.cs.columbia.edu/pipermail/kvmarm/2015-August/016128.html
>>
>> This series has been tested with arm64 on arm64, with several FP applications
>> running on host and guest, showing a substantial decrease in the number of
>> fp/simd context switches: from about 30% down to 2% with one guest running.
>>
>> At this time I don't have arm32/arm64 working, and I'm hoping Christoffer
>> and/or Marc (or anyone) can test a 32-bit guest on a 64-bit host.
>>
> Did you already have some test infrastructure/applications that I can
> reuse for this purpose or do I have to write userspace software?
> 
> -Christoffer
> 


[PATCH v2 0/2] KVM/arm: enhance armv7 vfp/simd lazy switch support

2015-09-26 Thread Mario Smarduch
The current lazy vfp/simd implementation switches the hardware context only
on guest access, and again on exit to the host; otherwise the hardware
context switch is skipped.

This patch set builds on that functionality and executes a hardware context
switch only when the vCPU is scheduled out or returns to user space.

Patches were tested on the FVP software platform. FP-crunching applications
summing up values, with the outcome compared to a known result, were
executed on several guests and the host.

Changes since v1->v2:
* Fixed vfp/simd trap configuration to enable trace trapping
* Removed set_hcptr branch label
* Fixed handling of FPEXC to restore guest and host versions on vcpu_put

Mario Smarduch (2):
  add hooks for armv7 fp/simd lazy switch support
  enable armv7 fp/simd lazy switch

 arch/arm/include/asm/kvm_asm.h  |  1 +
 arch/arm/include/asm/kvm_host.h |  6 +
 arch/arm/kernel/asm-offsets.c   |  2 ++
 arch/arm/kvm/arm.c  | 17 
 arch/arm/kvm/interrupts.S   | 60 ++---
 arch/arm/kvm/interrupts_head.S  | 12 ++---
 6 files changed, 79 insertions(+), 19 deletions(-)

-- 
1.9.1



[PATCH v2 1/2] KVM/arm: add hooks for armv7 fp/simd lazy switch support

2015-09-26 Thread Mario Smarduch
This patch adds vcpu fields to track lazy state and to save the host FPEXC,
plus offsets to those fields.

Signed-off-by: Mario Smarduch <m.smard...@samsung.com>
---
 arch/arm/include/asm/kvm_host.h | 6 ++
 arch/arm/kernel/asm-offsets.c   | 2 ++
 2 files changed, 8 insertions(+)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index dcba0fa..194a8ef 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -111,6 +111,12 @@ struct kvm_vcpu_arch {
/* Interrupt related fields */
u32 irq_lines;  /* IRQ and FIQ levels */
 
+   /* Track fp/simd lazy switch state */
+   u32 vfp_lazy;
+
+   /* Save host FPEXC register to restore on vcpu put */
+   u32 saved_fpexc;
+
/* Exception Information */
struct kvm_vcpu_fault_info fault;
 
diff --git a/arch/arm/kernel/asm-offsets.c b/arch/arm/kernel/asm-offsets.c
index 871b826..e1c3a41 100644
--- a/arch/arm/kernel/asm-offsets.c
+++ b/arch/arm/kernel/asm-offsets.c
@@ -186,6 +186,8 @@ int main(void)
   DEFINE(VCPU_CPSR,offsetof(struct kvm_vcpu, 
arch.regs.usr_regs.ARM_cpsr));
   DEFINE(VCPU_HCR, offsetof(struct kvm_vcpu, arch.hcr));
   DEFINE(VCPU_IRQ_LINES,   offsetof(struct kvm_vcpu, arch.irq_lines));
+  DEFINE(VCPU_VFP_LAZY,offsetof(struct kvm_vcpu, 
arch.vfp_lazy));
+  DEFINE(VCPU_VFP_FPEXC,   offsetof(struct kvm_vcpu, arch.saved_fpexc));
   DEFINE(VCPU_HSR, offsetof(struct kvm_vcpu, arch.fault.hsr));
   DEFINE(VCPU_HxFAR,   offsetof(struct kvm_vcpu, arch.fault.hxfar));
   DEFINE(VCPU_HPFAR,   offsetof(struct kvm_vcpu, arch.fault.hpfar));
-- 
1.9.1



[PATCH v2 2/2] KVM/arm: enable enhanced armv7 fp/simd lazy switch

2015-09-26 Thread Mario Smarduch
This patch enhances the current lazy vfp/simd hardware switch. In addition
to the current lazy switch, it tracks vfp/simd hardware state with a vcpu
lazy flag.

The vcpu lazy flag is set on guest access, which traps to the vfp/simd
hardware switch handler. On vm-enter, if the lazy flag is set, we skip
enabling the trap and saving the host fpexc. On vm-exit, if the flag is
set, we skip the hardware context switch and return to the host with the
guest context still loaded.

On vcpu_put, we check whether the vcpu lazy flag is set and, if so,
execute a hardware context switch to restore the host state.
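The vm-enter decisions described above can be modeled in a few lines of C.
This is a hypothetical sketch (the names enter_cfg and toy_vm_enter are
invented); the real code is the set_hcptr/FPEXC handling in hyp-mode
assembly in interrupts.S:

```c
#include <assert.h>
#include <stdbool.h>

struct enter_cfg {
	bool trap_fpsimd;	/* would set HCPTR_TCP(10) | HCPTR_TCP(11) */
	bool save_host_fpexc;	/* would stash FPEXC in the vcpu struct */
};

struct enter_cfg toy_vm_enter(int vfp_lazy)
{
	struct enter_cfg cfg;

	/* With vfp_lazy set, the guest's fp/simd state is already live in
	 * hardware and the host FPEXC was already saved on the earlier trap,
	 * so we neither arm the cp10/cp11 trap nor re-save FPEXC. */
	cfg.trap_fpsimd = (vfp_lazy == 0);
	cfg.save_host_fpexc = (vfp_lazy == 0);
	return cfg;
}
```

The trap is only armed while host fp/simd state is still loaded, which is
exactly what lets repeated guest entries skip both operations.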

Signed-off-by: Mario Smarduch <m.smard...@samsung.com>
---
 arch/arm/include/asm/kvm_asm.h |  1 +
 arch/arm/kvm/arm.c | 17 
 arch/arm/kvm/interrupts.S  | 60 +++---
 arch/arm/kvm/interrupts_head.S | 12 ++---
 4 files changed, 71 insertions(+), 19 deletions(-)

diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
index 194c91b..4b45d79 100644
--- a/arch/arm/include/asm/kvm_asm.h
+++ b/arch/arm/include/asm/kvm_asm.h
@@ -97,6 +97,7 @@ extern char __kvm_hyp_code_end[];
 extern void __kvm_flush_vm_context(void);
 extern void __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa);
 extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
+extern void __kvm_restore_host_vfp_state(struct kvm_vcpu *vcpu);
 
 extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
 #endif
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index ce404a5..79f49c7 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -105,6 +105,20 @@ void kvm_arch_check_processor_compat(void *rtn)
*(int *)rtn = 0;
 }
 
+/**
+ * kvm_switch_fp_regs() - switch guest/host VFP/SIMD registers
+ * @vcpu:  pointer to vcpu structure.
+ *
+ */
+static void kvm_switch_fp_regs(struct kvm_vcpu *vcpu)
+{
+#ifdef CONFIG_ARM
+   if (vcpu->arch.vfp_lazy == 1) {
+   kvm_call_hyp(__kvm_restore_host_vfp_state, vcpu);
+   vcpu->arch.vfp_lazy = 0;
+   }
+#endif
+}
 
 /**
  * kvm_arch_init_vm - initializes a VM data structure
@@ -295,6 +309,9 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 
 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 {
+   /* Check if Guest accessed VFP registers */
+   kvm_switch_fp_regs(vcpu);
+
/*
 * The arch-generic KVM code expects the cpu field of a vcpu to be -1
 * if the vcpu is no longer assigned to a cpu.  This is used for the
diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
index 900ef6d..6d98232 100644
--- a/arch/arm/kvm/interrupts.S
+++ b/arch/arm/kvm/interrupts.S
@@ -96,6 +96,29 @@ ENTRY(__kvm_flush_vm_context)
bx  lr
 ENDPROC(__kvm_flush_vm_context)
 
+/**
+ * void __kvm_restore_host_vfp_state(struct vcpu *vcpu) - Executes a lazy
+ * fp/simd switch, saves the guest, restores host.
+ *
+ */
+ENTRY(__kvm_restore_host_vfp_state)
+#ifdef CONFIG_VFPv3
+   push{r3-r7}
+
+   add r7, r0, #VCPU_VFP_GUEST
+   store_vfp_state r7
+
+   add r7, r0, #VCPU_VFP_HOST
+   ldr r7, [r7]
+   restore_vfp_state r7
+
+   ldr r3, [vcpu, #VCPU_VFP_FPEXC]
+   VFPFMXR FPEXC, r3
+
+   pop {r3-r7}
+#endif
+   bx  lr
+ENDPROC(__kvm_restore_host_vfp_state)
 
 /
  *  Hypervisor world-switch code
@@ -119,11 +142,15 @@ ENTRY(__kvm_vcpu_run)
@ If the host kernel has not been configured with VFPv3 support,
@ then it is safer if we deny guests from using it as well.
 #ifdef CONFIG_VFPv3
+   @ r7 must be preserved until next vfp lazy check
+   vfp_inlazy_mode r7, skip_save_host_fpexc
+
@ Set FPEXC_EN so the guest doesn't trap floating point instructions
VFPFMRX r2, FPEXC   @ VMRS
-   push{r2}
+   str r2, [vcpu, #VCPU_VFP_FPEXC]
orr r2, r2, #FPEXC_EN
VFPFMXR FPEXC, r2   @ VMSR
+skip_save_host_fpexc:
 #endif
 
@ Configure Hyp-role
@@ -131,7 +158,14 @@ ENTRY(__kvm_vcpu_run)
 
@ Trap coprocessor CRx accesses
set_hstr vmentry
-   set_hcptr vmentry, (HCPTR_TTA | HCPTR_TCP(10) | HCPTR_TCP(11))
+   set_hcptr vmentry, (HCPTR_TTA)
+
+   @ check if vfp_lazy flag set
+   cmp r7, #1
+   beq skip_guest_vfp_trap
+   set_hcptr vmentry, (HCPTR_TCP(10) | HCPTR_TCP(11))
+skip_guest_vfp_trap:
+
set_hdcr vmentry
 
@ Write configured ID register into MIDR alias
@@ -170,22 +204,14 @@ __kvm_vcpu_return:
@ Don't trap coprocessor accesses for host kernel
set_hstr vmexit
set_hdcr vmexit
-   set_hcptr vmexit, (HCPTR_TTA | HCPTR_TCP(10) | HCPTR_TCP(11)), 
after_vfp_restore
+   set_hcptr vmexit, (HCPTR_TTA | HCPTR_TCP(10) | HCPTR_TCP(11))
 
 #ifdef CONFIG_VFPv3
-   @ Switch VFP/NEON hardware state to the host's
-   add r7, vcpu, #VCPU_VFP_GUEST
-   store_vfp_state r7
-   add r7, vcpu, #VCPU_VFP_HOST
-   ldr

Re: [PATCH 2/2] KVM/arm: enable armv7 fp/simd lazy switch

2015-09-22 Thread Mario Smarduch
Hi Antonios,

On 9/22/2015 7:01 AM, Antonios Motakis wrote:
> Hello,
> 
> On 18-Sep-15 03:05, Mario Smarduch wrote:
>> Adds code to enable fp/simd lazy switch. On each entry, check if the fp/simd
>> registers have been switched to the guest; if not, set the trap flag. On
>> trap, switch the fp/simd registers, set vfp_lazy to true, and disable
>> trapping. When the vcpu is about to be put, context-switch the fp/simd
>> registers (save guest, restore host) and reset the vfp_lazy state to enable
>> trapping again.
>>
> 
> This description confused me a bit, since KVM on ARMv7 already exhibits
> lazy switching behavior for VFP. Should the description highlight the
> intended improvement in behavior?
> 
> If I understand correctly, instead of restoring the host state on every
> exit, you postpone it until the task actually gets rescheduled, right?

Yes that's right, I'll reword to highlight the changes.
> 
>> Signed-off-by: Mario Smarduch <m.smard...@samsung.com>
>> ---
>>  arch/arm/kvm/arm.c| 17 +
>>  arch/arm/kvm/interrupts.S | 40 +---
>>  2 files changed, 46 insertions(+), 11 deletions(-)
>>
>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
>> index ce404a5..0acbb69 100644
>> --- a/arch/arm/kvm/arm.c
>> +++ b/arch/arm/kvm/arm.c
>> @@ -105,6 +105,20 @@ void kvm_arch_check_processor_compat(void *rtn)
>>  *(int *)rtn = 0;
>>  }
>>  
>> +/**
>> + * kvm_switch_fp_regs() - switch guest/host VFP/SIMD registers
>> + * @vcpu:  pointer to vcpu structure.
>> + *
>> + */
>> +static void kvm_switch_fp_regs(struct kvm_vcpu *vcpu)
>> +{
>> +#ifdef CONFIG_ARM
>> +if (vcpu->arch.vfp_lazy == 1) {
>> +kvm_call_hyp(__kvm_restore_host_vfp_state, vcpu);
>> +vcpu->arch.vfp_lazy = 0;
>> +}
>> +#endif
>> +}
>>  
>>  /**
>>   * kvm_arch_init_vm - initializes a VM data structure
>> @@ -295,6 +309,9 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>>  
>>  void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>>  {
>> +/* Check if Guest accessed VFP registers */
>> +kvm_switch_fp_regs(vcpu);
>> +
>>  /*
>>   * The arch-generic KVM code expects the cpu field of a vcpu to be -1
>>   * if the vcpu is no longer assigned to a cpu.  This is used for the
>> diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
>> index 900ef6d..a47acc1 100644
>> --- a/arch/arm/kvm/interrupts.S
>> +++ b/arch/arm/kvm/interrupts.S
>> @@ -96,6 +96,24 @@ ENTRY(__kvm_flush_vm_context)
>>  bx  lr
>>  ENDPROC(__kvm_flush_vm_context)
>>  
>> +/**
>> + * void __kvm_restore_host_vfp_state(struct vcpu *vcpu) - Executes a lazy
>> + *  fp/simd switch, saves the guest, restores host.
>> + *
>> + */
>> +ENTRY(__kvm_restore_host_vfp_state)
>> +push{r3-r7}
>> +
>> +add r7, r0, #VCPU_VFP_GUEST
>> +store_vfp_state r7
>> +
>> +add r7, r0, #VCPU_VFP_HOST
>> +ldr r7, [r7]
>> +restore_vfp_state r7
>> +
>> +pop {r3-r7}
>> +bx  lr
>> +ENDPROC(__kvm_restore_host_vfp_state)
>>  
>>  /
>>   *  Hypervisor world-switch code
>> @@ -131,7 +149,14 @@ ENTRY(__kvm_vcpu_run)
>>  
>>  @ Trap coprocessor CRx accesses
>>  set_hstr vmentry
>> +
>> +ldr r1, [vcpu, #VCPU_VFP_LAZY]
>> +cmp r1, #1
>> +beq skip_guest_vfp_trap
>> +
>>  set_hcptr vmentry, (HCPTR_TTA | HCPTR_TCP(10) | HCPTR_TCP(11))
>> +skip_guest_vfp_trap:
> 
> I believe that HCPTR_TTA is not part of the floating point extensions.

Yes, you're right, trap on tracing is not enabled.

> 
>> +
>>  set_hdcr vmentry
>>  
>>  @ Write configured ID register into MIDR alias
>> @@ -170,22 +195,12 @@ __kvm_vcpu_return:
>>  @ Don't trap coprocessor accesses for host kernel
>>  set_hstr vmexit
>>  set_hdcr vmexit
>> -set_hcptr vmexit, (HCPTR_TTA | HCPTR_TCP(10) | HCPTR_TCP(11)), 
>> after_vfp_restore
>> +set_hcptr vmexit, (HCPTR_TTA | HCPTR_TCP(10) | HCPTR_TCP(11))
> 
> If you don't use the functionality of the macro to branch on change,
> then maybe the functionality should also be removed from the macro.

Yes, the macro needs to change.

> 
>>  
>>  #ifdef CONFIG_VFPv3
>> -@ Switch VFP/NEON hardware state to the host's
>> -add r7, vcpu, 

[RFT - PATCH v2 0/2] KVM/arm64: add fp/simd lazy switch support

2015-09-22 Thread Mario Smarduch
This is a 2nd iteration for arm64; the v1 patches were posted by mistake from
an older branch which included several bugs. Hopefully this didn't waste too
much of anyone's time.

This patch series is a follow-up to the armv7 fp/simd lazy switch
implementation; it uses a similar approach and depends on the series - see
https://lists.cs.columbia.edu/pipermail/kvmarm/2015-September/016516.html

It's based on earlier arm64 fp/simd optimization work - see
https://lists.cs.columbia.edu/pipermail/kvmarm/2015-July/015748.html

And subsequent fixes by Marc and Christoffer at KVM Forum hackathon to handle
32-bit guest on 64 bit host (and may require more here) - see
https://lists.cs.columbia.edu/pipermail/kvmarm/2015-August/016128.html

This series has been tested with arm64 on arm64, with several FP
applications running on host and guest, showing a substantial decrease in
the number of fp/simd context switches: from about 30% down to 2% with one
guest running.

At this time I don't have arm32/arm64 working, and I'm hoping Christoffer
and/or Marc (or anyone) can test a 32-bit guest on a 64-bit host.

Mario Smarduch (2):
  add hooks for armv8 fp/simd lazy switch
  enable armv8 fp/simd lazy switch

 arch/arm/kvm/arm.c|  2 --
 arch/arm64/include/asm/kvm_asm.h  |  1 +
 arch/arm64/include/asm/kvm_host.h |  3 ++
 arch/arm64/kernel/asm-offsets.c   |  1 +
 arch/arm64/kvm/hyp.S  | 58 ++-
 5 files changed, 44 insertions(+), 21 deletions(-)

-- 
1.9.1



[RFT - PATCH v2 1/2] add hooks for armv8 fp/simd lazy switch

2015-09-22 Thread Mario Smarduch
This patch adds hooks to support fp/simd lazy switch: a vcpu flag to track
fp/simd state, and the flag's offset in the vcpu structure.

Signed-off-by: Mario Smarduch <m.smard...@samsung.com>
---
 arch/arm64/include/asm/kvm_host.h | 3 +++
 arch/arm64/kernel/asm-offsets.c   | 1 +
 2 files changed, 4 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index 415938d..f4665e5 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -161,6 +161,9 @@ struct kvm_vcpu_arch {
/* Interrupt related fields */
u64 irq_lines;  /* IRQ and FIQ levels */
 
+   /* Track fp/simd lazy switch */
+   u32 vfp_lazy;
+
/* Cache some mmu pages needed inside spinlock regions */
struct kvm_mmu_memory_cache mmu_page_cache;
 
diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
index 8d89cf8..8311da4 100644
--- a/arch/arm64/kernel/asm-offsets.c
+++ b/arch/arm64/kernel/asm-offsets.c
@@ -124,6 +124,7 @@ int main(void)
   DEFINE(VCPU_HCR_EL2, offsetof(struct kvm_vcpu, arch.hcr_el2));
   DEFINE(VCPU_MDCR_EL2,offsetof(struct kvm_vcpu, arch.mdcr_el2));
   DEFINE(VCPU_IRQ_LINES,   offsetof(struct kvm_vcpu, arch.irq_lines));
+  DEFINE(VCPU_VFP_LAZY, offsetof(struct kvm_vcpu, arch.vfp_lazy));
   DEFINE(VCPU_HOST_CONTEXT,offsetof(struct kvm_vcpu, 
arch.host_cpu_context));
   DEFINE(VCPU_HOST_DEBUG_STATE, offsetof(struct kvm_vcpu, 
arch.host_debug_state));
   DEFINE(VCPU_TIMER_CNTV_CTL,  offsetof(struct kvm_vcpu, 
arch.timer_cpu.cntv_ctl));
-- 
1.9.1



[RFT - PATCH v2 2/2] enable armv8 fp/simd lazy switch

2015-09-22 Thread Mario Smarduch
This patch enables the arm64 lazy fp/simd switch. It removes the ARM-only
constraint and follows the same approach as the armv7 version, found here:

https://lists.cs.columbia.edu/pipermail/kvmarm/2015-September/016518.html

To summarize: provided the guest accesses the fp/simd unit, we limit the
number of fp/simd context switches to one per vCPU scheduled execution.

Signed-off-by: Mario Smarduch <m.smard...@samsung.com>
---
 arch/arm/kvm/arm.c   |  2 --
 arch/arm64/include/asm/kvm_asm.h |  1 +
 arch/arm64/kvm/hyp.S | 58 +++-
 3 files changed, 40 insertions(+), 21 deletions(-)

diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 0acbb69..7260853 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -112,12 +112,10 @@ void kvm_arch_check_processor_compat(void *rtn)
  */
 static void kvm_switch_fp_regs(struct kvm_vcpu *vcpu)
 {
-#ifdef CONFIG_ARM
if (vcpu->arch.vfp_lazy == 1) {
kvm_call_hyp(__kvm_restore_host_vfp_state, vcpu);
vcpu->arch.vfp_lazy = 0;
}
-#endif
 }
 
 /**
diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 67fa0de..7d9936b 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -119,6 +119,7 @@ extern char __kvm_hyp_vector[];
 extern void __kvm_flush_vm_context(void);
 extern void __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa);
 extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
+extern void __kvm_restore_host_vfp_state(struct kvm_vcpu *vcpu);
 
 extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
 
diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S
index 39aa322..2273916 100644
--- a/arch/arm64/kvm/hyp.S
+++ b/arch/arm64/kvm/hyp.S
@@ -385,14 +385,6 @@
tbz \tmp, #KVM_ARM64_DEBUG_DIRTY_SHIFT, \target
 .endm
 
-/*
- * Branch to target if CPTR_EL2.TFP bit is set (VFP/SIMD trapping enabled)
- */
-.macro skip_fpsimd_state tmp, target
-   mrs \tmp, cptr_el2
-   tbnz\tmp, #CPTR_EL2_TFP_SHIFT, \target
-.endm
-
 .macro compute_debug_state target
// Compute debug state: If any of KDE, MDE or KVM_ARM64_DEBUG_DIRTY
// is set, we do a full save/restore cycle and disable trapping.
@@ -433,10 +425,6 @@
mrs x5, ifsr32_el2
stp x4, x5, [x3]
 
-   skip_fpsimd_state x8, 3f
-   mrs x6, fpexc32_el2
-   str x6, [x3, #16]
-3:
skip_debug_state x8, 2f
mrs x7, dbgvcr32_el2
str x7, [x3, #24]
@@ -495,8 +483,14 @@
isb
 99:
msr hcr_el2, x2
-   mov x2, #CPTR_EL2_TTA
+
+   mov x2, #0
+   ldr x3, [x0, #VCPU_VFP_LAZY]
+   tbnzx3, #0, 98f
+
orr x2, x2, #CPTR_EL2_TFP
+98:
+   orr x2, x2, #CPTR_EL2_TTA
msr cptr_el2, x2
 
mov x2, #(1 << 15)  // Trap CP15 Cr=15
@@ -674,14 +668,12 @@ __restore_debug:
ret
 
 __save_fpsimd:
-   skip_fpsimd_state x3, 1f
save_fpsimd
-1: ret
+   ret
 
 __restore_fpsimd:
-   skip_fpsimd_state x3, 1f
restore_fpsimd
-1: ret
+   ret
 
 switch_to_guest_fpsimd:
pushx4, lr
@@ -693,6 +685,9 @@ switch_to_guest_fpsimd:
 
mrs x0, tpidr_el2
 
+   mov x2, #1
+   str x2, [x0, #VCPU_VFP_LAZY]
+
ldr x2, [x0, #VCPU_HOST_CONTEXT]
kern_hyp_va x2
bl __save_fpsimd
@@ -768,7 +763,6 @@ __kvm_vcpu_return:
add x2, x0, #VCPU_CONTEXT
 
save_guest_regs
-   bl __save_fpsimd
bl __save_sysregs
 
skip_debug_state x3, 1f
@@ -789,7 +783,6 @@ __kvm_vcpu_return:
kern_hyp_va x2
 
bl __restore_sysregs
-   bl __restore_fpsimd
/* Clear FPSIMD and Trace trapping */
msr cptr_el2, xzr
 
@@ -868,6 +861,33 @@ ENTRY(__kvm_flush_vm_context)
ret
 ENDPROC(__kvm_flush_vm_context)
 
+/**
+ * kvm_switch_fp_regs() - switch guest/host VFP/SIMD registers
+ * @vcpu:  pointer to vcpu structure.
+ *
+ */
+ENTRY(__kvm_restore_host_vfp_state)
+   pushx4, lr
+
+   kern_hyp_va x0
+   add x2, x0, #VCPU_CONTEXT
+
+   // Load Guest HCR, determine if guest is 32 or 64 bit
+   ldr x3, [x0, #VCPU_HCR_EL2]
+   tbnzx3, #HCR_RW_SHIFT, 1f
+   mrs x4, fpexc32_el2
+   str x4, [x2, #CPU_SYSREG_OFFSET(FPEXC32_EL2)]
+1:
+   bl __save_fpsimd
+
+   ldr x2, [x0, #VCPU_HOST_CONTEXT]
+   kern_hyp_va x2
+   bl __restore_fpsimd
+
+   pop x4, lr
+   ret
+ENDPROC(__kvm_restore_host_vfp_state)
+
 __kvm_hyp_panic:
// Guess the context by looking at VTTBR:
// If zero, then we're already a host.
-- 
1.9.1



[RFT - PATCH 2/2] KVM/arm64: enable armv8 fp/simd lazy switch

2015-09-21 Thread Mario Smarduch
This patch enables the arm64 lazy fp/simd switch. It removes the ARM-only
constraint and follows the same approach as the armv7 version, found here:

https://lists.cs.columbia.edu/pipermail/kvmarm/2015-September/016518.html  

Signed-off-by: Mario Smarduch <m.smard...@samsung.com>
---
 arch/arm/kvm/arm.c   |  2 --
 arch/arm64/kvm/hyp.S | 59 +++-
 2 files changed, 40 insertions(+), 21 deletions(-)

diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 0acbb69..7260853 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -112,12 +112,10 @@ void kvm_arch_check_processor_compat(void *rtn)
  */
 static void kvm_switch_fp_regs(struct kvm_vcpu *vcpu)
 {
-#ifdef CONFIG_ARM
if (vcpu->arch.vfp_lazy == 1) {
kvm_call_hyp(__kvm_restore_host_vfp_state, vcpu);
vcpu->arch.vfp_lazy = 0;
}
-#endif
 }
 
 /**
diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S
index 39aa322..e412251 100644
--- a/arch/arm64/kvm/hyp.S
+++ b/arch/arm64/kvm/hyp.S
@@ -385,14 +385,6 @@
tbz \tmp, #KVM_ARM64_DEBUG_DIRTY_SHIFT, \target
 .endm
 
-/*
- * Branch to target if CPTR_EL2.TFP bit is set (VFP/SIMD trapping enabled)
- */
-.macro skip_fpsimd_state tmp, target
-   mrs \tmp, cptr_el2
-   tbnz\tmp, #CPTR_EL2_TFP_SHIFT, \target
-.endm
-
 .macro compute_debug_state target
// Compute debug state: If any of KDE, MDE or KVM_ARM64_DEBUG_DIRTY
// is set, we do a full save/restore cycle and disable trapping.
@@ -433,10 +425,6 @@
mrs x5, ifsr32_el2
stp x4, x5, [x3]
 
-   skip_fpsimd_state x8, 3f
-   mrs x6, fpexc32_el2
-   str x6, [x3, #16]
-3:
skip_debug_state x8, 2f
mrs x7, dbgvcr32_el2
str x7, [x3, #24]
@@ -495,8 +483,14 @@
isb
 99:
msr hcr_el2, x2
-   mov x2, #CPTR_EL2_TTA
+
+   mov x2, #0
+   ldr x3, [x0, #VCPU_VFP_LAZY]
+   tbnzx3, #0, 98f
+
orr x2, x2, #CPTR_EL2_TFP
+98:
+   mov x2, #CPTR_EL2_TTA
msr cptr_el2, x2
 
mov x2, #(1 << 15)  // Trap CP15 Cr=15
@@ -674,14 +668,10 @@ __restore_debug:
ret
 
 __save_fpsimd:
-   skip_fpsimd_state x3, 1f
save_fpsimd
-1: ret
 
 __restore_fpsimd:
-   skip_fpsimd_state x3, 1f
restore_fpsimd
-1: ret
 
 switch_to_guest_fpsimd:
pushx4, lr
@@ -693,6 +683,9 @@ switch_to_guest_fpsimd:
 
mrs x0, tpidr_el2
 
+   mov x2, #1
+   str x2, [x0, #VCPU_VFP_LAZY]
+
ldr x2, [x0, #VCPU_HOST_CONTEXT]
kern_hyp_va x2
bl __save_fpsimd
@@ -768,7 +761,6 @@ __kvm_vcpu_return:
add x2, x0, #VCPU_CONTEXT
 
save_guest_regs
-   bl __save_fpsimd
bl __save_sysregs
 
skip_debug_state x3, 1f
@@ -789,7 +781,6 @@ __kvm_vcpu_return:
kern_hyp_va x2
 
bl __restore_sysregs
-   bl __restore_fpsimd
/* Clear FPSIMD and Trace trapping */
msr cptr_el2, xzr
 
@@ -868,6 +859,36 @@ ENTRY(__kvm_flush_vm_context)
ret
 ENDPROC(__kvm_flush_vm_context)
 
+/**
+ * kvm_switch_fp_regs() - switch guest/host VFP/SIMD registers
+ * @vcpu:  pointer to vcpu structure.
+ *
+ */
+ENTRY(__kvm_restore_host_vfp_state)
+   pushx4, lr
+
+   kern_hyp_va x0
+
+   // Load Guest HCR, determine if guest is 32 or 64 bit
+   ldr x2, [x0, #VCPU_HCR_EL2]
+   msr hcr_el2, x2
+
+   add x2, x0, #VCPU_CONTEXT
+
+   skip_32bit_state x3, 1f
+   mrs x4, fpexc32_el2
+   str x4, [x2, #CPU_SYSREG_OFFSET(FPEXC32_EL2)]
+1:
+   bl __save_fpsimd
+
+   ldr x2, [x0, #VCPU_HOST_CONTEXT]
+   kern_hyp_va x2
+   bl __restore_fpsimd
+
+   pop x4, lr
+   ret
+ENDPROC(__kvm_restore_host_vfp_state)
+
 __kvm_hyp_panic:
// Guess the context by looking at VTTBR:
// If zero, then we're already a host.
-- 
1.9.1



[RFT - PATCH 1/2] KVM/arm64: add hooks for armv8 fp/simd lazy switch support

2015-09-21 Thread Mario Smarduch
This patch adds hooks to support fp/simd lazy switch: a vcpu flag to track
fp/simd state, its offset into the vcpu structure, and the switch function
prototype.

Signed-off-by: Mario Smarduch <m.smard...@samsung.com>
---
 arch/arm64/include/asm/kvm_asm.h  | 1 +
 arch/arm64/include/asm/kvm_host.h | 3 +++
 arch/arm64/kernel/asm-offsets.c   | 1 +
 3 files changed, 5 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 67fa0de..7d9936b 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -119,6 +119,7 @@ extern char __kvm_hyp_vector[];
 extern void __kvm_flush_vm_context(void);
 extern void __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa);
 extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
+extern void __kvm_restore_host_vfp_state(struct kvm_vcpu *vcpu);
 
 extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
 
diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index 415938d..f4665e5 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -161,6 +161,9 @@ struct kvm_vcpu_arch {
/* Interrupt related fields */
u64 irq_lines;  /* IRQ and FIQ levels */
 
+   /* Track fp/simd lazy switch */
+   u32 vfp_lazy;
+
/* Cache some mmu pages needed inside spinlock regions */
struct kvm_mmu_memory_cache mmu_page_cache;
 
diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
index 8d89cf8..d6edd65 100644
--- a/arch/arm64/kernel/asm-offsets.c
+++ b/arch/arm64/kernel/asm-offsets.c
@@ -124,6 +124,7 @@ int main(void)
   DEFINE(VCPU_HCR_EL2, offsetof(struct kvm_vcpu, arch.hcr_el2));
   DEFINE(VCPU_MDCR_EL2,offsetof(struct kvm_vcpu, arch.mdcr_el2));
   DEFINE(VCPU_IRQ_LINES,   offsetof(struct kvm_vcpu, arch.irq_lines));
+  DEFINE(VCPU_VFP_LAZY, offsetof(struct kvm_vcpu, arch.vfp_lazy));
   DEFINE(VCPU_HOST_CONTEXT, offsetof(struct kvm_vcpu, arch.host_cpu_context));
   DEFINE(VCPU_HOST_DEBUG_STATE, offsetof(struct kvm_vcpu, arch.host_debug_state));
   DEFINE(VCPU_TIMER_CNTV_CTL, offsetof(struct kvm_vcpu, arch.timer_cpu.cntv_ctl));
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFT - PATCH 0/2] KVM/arm64: add fp/simd lazy switch support

2015-09-21 Thread Mario Smarduch
This patch series is a followup to the armv7 fp/simd lazy switch 
implementation, uses similar approach and depends on the series - see
https://lists.cs.columbia.edu/pipermail/kvmarm/2015-September/016516.html

It's based on earlier arm64 fp/simd optimization work - see
https://lists.cs.columbia.edu/pipermail/kvmarm/2015-July/015748.html

And subsequent fixes by Marc and Christoffer at KVM Forum hackathon to handle
32-bit guest on 64 bit host - see
https://lists.cs.columbia.edu/pipermail/kvmarm/2015-August/016128.html

This series has been tested on arm64/arm64 but not arm32/arm64, which still
needs validation (hence the RFT tag). The results substantially decrease the
number of fp/simd context switches for an FP load.

At this time I don't have arm32/arm64 working and am requesting Christoffer
and/or Marc to test a 32-bit guest on a 64-bit host.

Mario Smarduch (2):
  add hooks for armv8 fp/simd lazy switch support
  enable armv8 fp/simd lazy switch

 arch/arm/kvm/arm.c|  2 --
 arch/arm64/include/asm/kvm_asm.h  |  1 +
 arch/arm64/include/asm/kvm_host.h |  3 ++
 arch/arm64/kernel/asm-offsets.c   |  1 +
 arch/arm64/kvm/hyp.S  | 59 ++-
 5 files changed, 45 insertions(+), 21 deletions(-)

-- 
1.9.1



[PATCH 2/2] KVM/arm: enable armv7 fp/simd lazy switch

2015-09-17 Thread Mario Smarduch
Adds code to enable the fp/simd lazy switch. On each entry, check whether the
fp/simd registers have already been switched to the guest; if not, set the
trap flag. On trap, switch the fp/simd registers, set vfp_lazy to true, and
disable trapping. When the vcpu is about to be put, context switch the fp/simd
registers (save guest, restore host) and reset the vfp_lazy state to enable
trapping again.
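The lifecycle above can be sketched as a small state machine. This is an
illustrative model only, not kernel code: the struct and helper names are
invented here, with vfp_lazy mirroring the vcpu field introduced in patch 1/2.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the lazy-switch flow; vfp_lazy mirrors the vcpu field. */
struct vcpu_model {
	bool vfp_lazy;         /* fp/simd registers currently hold guest state */
	bool trap_enabled;     /* fp/simd trap armed for this guest entry */
	int context_switches;  /* number of full fp/simd register switches */
};

/* Guest entry: arm the trap only if the guest doesn't own the registers. */
static void guest_entry(struct vcpu_model *v)
{
	v->trap_enabled = !v->vfp_lazy;
}

/* First fp/simd access traps: save host, restore guest, stop trapping. */
static void fp_simd_trap(struct vcpu_model *v)
{
	v->context_switches++;
	v->vfp_lazy = true;
	v->trap_enabled = false;
}

/* vcpu_put: if the guest owns the registers, restore the host state. */
static void vcpu_put(struct vcpu_model *v)
{
	if (v->vfp_lazy) {
		v->context_switches++;
		v->vfp_lazy = false;
	}
}
```

Repeated guest entries with no FP use cost no register switches at all; only
the first access in each vcpu_load/vcpu_put window pays, which is where the
reported reduction comes from.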

Signed-off-by: Mario Smarduch <m.smard...@samsung.com>
---
 arch/arm/kvm/arm.c| 17 +
 arch/arm/kvm/interrupts.S | 40 +---
 2 files changed, 46 insertions(+), 11 deletions(-)

diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index ce404a5..0acbb69 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -105,6 +105,20 @@ void kvm_arch_check_processor_compat(void *rtn)
*(int *)rtn = 0;
 }
 
+/**
+ * kvm_switch_fp_regs() - switch guest/host VFP/SIMD registers
+ * @vcpu:  pointer to vcpu structure.
+ *
+ */
+static void kvm_switch_fp_regs(struct kvm_vcpu *vcpu)
+{
+#ifdef CONFIG_ARM
+   if (vcpu->arch.vfp_lazy == 1) {
+   kvm_call_hyp(__kvm_restore_host_vfp_state, vcpu);
+   vcpu->arch.vfp_lazy = 0;
+   }
+#endif
+}
 
 /**
  * kvm_arch_init_vm - initializes a VM data structure
@@ -295,6 +309,9 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 
 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 {
+   /* Check if Guest accessed VFP registers */
+   kvm_switch_fp_regs(vcpu);
+
/*
 * The arch-generic KVM code expects the cpu field of a vcpu to be -1
 * if the vcpu is no longer assigned to a cpu.  This is used for the
diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
index 900ef6d..a47acc1 100644
--- a/arch/arm/kvm/interrupts.S
+++ b/arch/arm/kvm/interrupts.S
@@ -96,6 +96,24 @@ ENTRY(__kvm_flush_vm_context)
bx  lr
 ENDPROC(__kvm_flush_vm_context)
 
+/**
+ * void __kvm_restore_host_vfp_state(struct kvm_vcpu *vcpu) - Executes a lazy
+ * fp/simd switch: saves the guest state, restores the host state.
+ *
+ */
+ENTRY(__kvm_restore_host_vfp_state)
+   push{r3-r7}
+
+   add r7, r0, #VCPU_VFP_GUEST
+   store_vfp_state r7
+
+   add r7, r0, #VCPU_VFP_HOST
+   ldr r7, [r7]
+   restore_vfp_state r7
+
+   pop {r3-r7}
+   bx  lr
+ENDPROC(__kvm_restore_host_vfp_state)
 
/********************************************************************
  *  Hypervisor world-switch code
@@ -131,7 +149,14 @@ ENTRY(__kvm_vcpu_run)
 
@ Trap coprocessor CRx accesses
set_hstr vmentry
+
+   ldr r1, [vcpu, #VCPU_VFP_LAZY]
+   cmp r1, #1
+   beq skip_guest_vfp_trap
+
set_hcptr vmentry, (HCPTR_TTA | HCPTR_TCP(10) | HCPTR_TCP(11))
+skip_guest_vfp_trap:
+
set_hdcr vmentry
 
@ Write configured ID register into MIDR alias
@@ -170,22 +195,12 @@ __kvm_vcpu_return:
@ Don't trap coprocessor accesses for host kernel
set_hstr vmexit
set_hdcr vmexit
-   set_hcptr vmexit, (HCPTR_TTA | HCPTR_TCP(10) | HCPTR_TCP(11)), after_vfp_restore
+   set_hcptr vmexit, (HCPTR_TTA | HCPTR_TCP(10) | HCPTR_TCP(11))
 
 #ifdef CONFIG_VFPv3
-   @ Switch VFP/NEON hardware state to the host's
-   add r7, vcpu, #VCPU_VFP_GUEST
-   store_vfp_state r7
-   add r7, vcpu, #VCPU_VFP_HOST
-   ldr r7, [r7]
-   restore_vfp_state r7
-
-after_vfp_restore:
@ Restore FPEXC_EN which we clobbered on entry
pop {r2}
VFPFMXR FPEXC, r2
-#else
-after_vfp_restore:
 #endif
 
@ Reset Hyp-role
@@ -485,6 +500,9 @@ switch_to_guest_vfp:
@ NEON/VFP used.  Turn on VFP access.
set_hcptr vmtrap, (HCPTR_TCP(10) | HCPTR_TCP(11))
 
+   mov r1, #1
+   str r1, [vcpu, #VCPU_VFP_LAZY]
+
@ Switch VFP/NEON hardware state to the guest's
add r7, r0, #VCPU_VFP_HOST
ldr r7, [r7]
-- 
1.9.1



[PATCH 0/2] KVM/arm: add fp/simd lazy switch support

2015-09-17 Thread Mario Smarduch
These patches enable the armv7 fp/simd lazy switch. On guest entry the fp/simd
access trap is set, and on the guest's first access the fp/simd registers are
switched: host saved, guest restored. The CPU continues with the guest's
fp/simd content until vcpu_put, where the guest state is saved and the host
state is restored.

Running a floating point workload illustrates the reduction in fp/simd context
switches; the amount depends on the load. For a light load with an FP
application running, only 2% of all exits result in calls to the lazy switch.

The arm64 version is in test and appears to work fine; the remaining work is
to boot an arm32 guest on arm64 and verify operation. The initial intent was
to post all patches at once, but the arm64 version will be posted soon.

Mario Smarduch (2):
  add hooks for armv7 vfp/simd lazy switch support
  enable armv7 vfp/simd lazy switch

 arch/arm/include/asm/kvm_asm.h  |  1 +
 arch/arm/include/asm/kvm_host.h |  3 +++
 arch/arm/kernel/asm-offsets.c   |  1 +
 arch/arm/kvm/arm.c  | 17 +
 arch/arm/kvm/interrupts.S   | 40 +---
 5 files changed, 51 insertions(+), 11 deletions(-)

-- 
1.9.1



[PATCH 1/2] KVM/arm: add hooks for armv7 fp/simd lazy switch support

2015-09-17 Thread Mario Smarduch
Basic hooks are added to support the fp/simd lazy switch: a vcpu flag to track
fp/simd state, an offset into the vcpu structure, and a switch function
prototype.

Signed-off-by: Mario Smarduch <m.smard...@samsung.com>
---
 arch/arm/include/asm/kvm_asm.h  | 1 +
 arch/arm/include/asm/kvm_host.h | 3 +++
 arch/arm/kernel/asm-offsets.c   | 1 +
 3 files changed, 5 insertions(+)

diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
index 194c91b..4b45d79 100644
--- a/arch/arm/include/asm/kvm_asm.h
+++ b/arch/arm/include/asm/kvm_asm.h
@@ -97,6 +97,7 @@ extern char __kvm_hyp_code_end[];
 extern void __kvm_flush_vm_context(void);
 extern void __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa);
 extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
+extern void __kvm_restore_host_vfp_state(struct kvm_vcpu *vcpu);
 
 extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
 #endif
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index dcba0fa..4858f6c 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -111,6 +111,9 @@ struct kvm_vcpu_arch {
/* Interrupt related fields */
u32 irq_lines;  /* IRQ and FIQ levels */
 
+   /* Track fp/simd lazy switch */
+   u32 vfp_lazy;
+
/* Exception Information */
struct kvm_vcpu_fault_info fault;
 
diff --git a/arch/arm/kernel/asm-offsets.c b/arch/arm/kernel/asm-offsets.c
index 871b826..4a80802f 100644
--- a/arch/arm/kernel/asm-offsets.c
+++ b/arch/arm/kernel/asm-offsets.c
@@ -191,6 +191,7 @@ int main(void)
   DEFINE(VCPU_HPFAR,   offsetof(struct kvm_vcpu, arch.fault.hpfar));
   DEFINE(VCPU_HYP_PC,  offsetof(struct kvm_vcpu, arch.fault.hyp_pc));
   DEFINE(VCPU_VGIC_CPU, offsetof(struct kvm_vcpu, arch.vgic_cpu));
+  DEFINE(VCPU_VFP_LAZY, offsetof(struct kvm_vcpu, arch.vfp_lazy));
   DEFINE(VGIC_V2_CPU_HCR,  offsetof(struct vgic_cpu, vgic_v2.vgic_hcr));
   DEFINE(VGIC_V2_CPU_VMCR, offsetof(struct vgic_cpu, vgic_v2.vgic_vmcr));
   DEFINE(VGIC_V2_CPU_MISR, offsetof(struct vgic_cpu, vgic_v2.vgic_misr));
-- 
1.9.1



Re: [PATCH v4 1/2] arm64: KVM: Optimize arm64 skip 30-50% vfp/simd save/restore on exits

2015-08-19 Thread Mario Smarduch
Great, that's even better.

On 8/19/2015 3:28 PM, Marc Zyngier wrote:
 On Wed, 19 Aug 2015 14:52:08 -0700
 Mario Smarduch m.smard...@samsung.com wrote:
 
 Hi Christoffer,
I'll test it and work with it.
 
 FWIW, I've added these patches to both -queue and -next, and from the
 tests Christoffer has run, it looks pretty good.
 
 Thanks,
 
   M.
 


Re: [PATCH v4 1/2] arm64: KVM: Optimize arm64 skip 30-50% vfp/simd save/restore on exits

2015-08-19 Thread Mario Smarduch
Hi Christoffer,
   I'll test it and work with it.

Thanks,
  Mario

On 8/19/2015 10:49 AM, Christoffer Dall wrote:
 Hi Mario,
 
 On Wed, Aug 05, 2015 at 05:11:37PM +0100, Marc Zyngier wrote:
 On 16/07/15 22:29, Mario Smarduch wrote:
 This patch only saves and restores the FP/SIMD registers on guest access. To
 do this, the cptr_el2 FP/SIMD trap is set on guest entry and later checked on
 exit. lmbench and hackbench show significant improvements; for 30-50% of
 exits the FP/SIMD context is not saved/restored.

 Signed-off-by: Mario Smarduch m.smard...@samsung.com

 So this patch seems to break 32bit guests on arm64.  I've had a look,
 squashed a few bugs that I dangerously overlooked during the review, but
 it still doesn't work (it doesn't crash anymore, but I get random
 illegal VFP instructions in 32bit guests).

 I'd be glad if someone could eyeball the following patch and tell me
 what's going wrong. If we don't find the root cause quickly enough, I'll
 have to drop the series from -next, and that'd be a real shame.

 Thanks,

  M.

 commit 5777dc55fbc170426a85e00c26002dd5a795cfa5
 Author: Marc Zyngier marc.zyng...@arm.com
 Date:   Wed Aug 5 16:53:01 2015 +0100

 KVM: arm64: NOTAFIX: Prevent crash when 32bit guest uses VFP

 Since we switch FPSIMD in a lazy way, access to FPEXC32_EL2
 must be guarded by skip_fpsimd_state. Otherwise, all hell
 breaks loose.

 Also, FPEXC32_EL2 must be restored when we trap to EL2 to
 enable floating point.

 Note that while it prevents the host from catching fire, the
 guest still doesn't work properly, and I don't understand why just
 yet.

 Not-really-signed-off-by: Marc Zyngier marc.zyng...@arm.com

 diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S
 index c8e0c70..b53ec5d 100644
 --- a/arch/arm64/kvm/hyp.S
 +++ b/arch/arm64/kvm/hyp.S
 @@ -431,10 +431,12 @@
  add x3, x2, #CPU_SYSREG_OFFSET(DACR32_EL2)
  mrs x4, dacr32_el2
  mrs x5, ifsr32_el2
 -mrs x6, fpexc32_el2
  stp x4, x5, [x3]
 -str x6, [x3, #16]

 +skip_fpsimd_state x8, 3f
 +mrs x6, fpexc32_el2
 +str x6, [x3, #16]
 +3:
  skip_debug_state x8, 2f
  mrs x7, dbgvcr32_el2
  str x7, [x3, #24]
 @@ -461,10 +463,8 @@

  add x3, x2, #CPU_SYSREG_OFFSET(DACR32_EL2)
  ldp x4, x5, [x3]
 -ldr x6, [x3, #16]
  msr dacr32_el2, x4
  msr ifsr32_el2, x5
 -msr fpexc32_el2, x6

  skip_debug_state x8, 2f
  ldr x7, [x3, #24]
 @@ -669,12 +669,14 @@ __restore_debug:
  ret

  __save_fpsimd:
 +skip_fpsimd_state x3, 1f
  save_fpsimd
 -ret
 +1:  ret

  __restore_fpsimd:
 +skip_fpsimd_state x3, 1f
  restore_fpsimd
 -ret
 +1:  ret

  switch_to_guest_fpsimd:
   push x4, lr
 @@ -682,6 +684,7 @@ switch_to_guest_fpsimd:
  mrs x2, cptr_el2
  bic x2, x2, #CPTR_EL2_TFP
  msr cptr_el2, x2
 +isb

  mrs x0, tpidr_el2

 @@ -692,6 +695,10 @@ switch_to_guest_fpsimd:
  add x2, x0, #VCPU_CONTEXT
  bl __restore_fpsimd

 +skip_32bit_state x3, 1f
 +ldr x4, [x2, #CPU_SYSREG_OFFSET(FPEXC32_EL2)]
 +msr fpexc32_el2, x4
 +1:
  pop x4, lr
  pop x2, x3
  pop x0, x1
 @@ -754,9 +761,7 @@ __kvm_vcpu_return:
  add x2, x0, #VCPU_CONTEXT

  save_guest_regs
 -skip_fpsimd_state x3, 1f
  bl __save_fpsimd
 -1:
  bl __save_sysregs

  skip_debug_state x3, 1f
 @@ -777,9 +782,7 @@ __kvm_vcpu_return:
  kern_hyp_va x2

  bl __restore_sysregs
 -skip_fpsimd_state x3, 1f
  bl __restore_fpsimd
 -1:
  /* Clear FPSIMD and Trace trapping */
  msr cptr_el2, xzr


 
 Marc and I have hunted down the issue at KVM Forum and we believe we've
 found the issue.  Please have a look at the following follow-up patch to
 Marc's patch above:
 
 diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S
 index 8b2a73b4..842e727 100644
 --- a/arch/arm64/kvm/hyp.S
 +++ b/arch/arm64/kvm/hyp.S
 @@ -769,11 +769,26 @@
  
  .macro activate_traps
   ldr x2, [x0, #VCPU_HCR_EL2]
 +
 + /*
 +  * We are about to set CPTR_EL2.TFP to trap all floating point
 +  * register accesses to EL2, however, the ARM ARM clearly states that
 +  * traps are only taken to EL2 if the operation would not otherwise
 +  * trap to EL1.  Therefore, always make sure that for 32-bit guests,
 +  * we set FPEXC.EN to prevent traps to EL1, when setting the TFP bit.
 +  */
 + tbnz x2, #HCR_RW_SHIFT, 99f // open code skip_32bit_state
 + mov x3, #(1 << 30)
 + msr fpexc32_el2, x3
 + isb
 +99:
 +
   msr hcr_el2, x2
   mov x2, #CPTR_EL2_TTA
   orr x2, x2, #CPTR_EL2_TFP
   msr cptr_el2, x2
  
 +
   mov x2, #(1 << 15)  // Trap CP15 Cr=15
   msr hstr_el2, x2
  
 
 
 Thanks,
 -Christoffer
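The routing rule in the comment above can be modeled in C. This is a sketch
under stated assumptions: HCR_EL2.RW is bit 31 and FPEXC.EN is bit 30 (both
per the ARMv8 ARM), and the helper names are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define HCR_RW   (1ULL << 31)  /* EL1 runs AArch64 when set */
#define FPEXC_EN (1U << 30)    /* AArch32 VFP enable bit */

/* A 32-bit guest (HCR_EL2.RW clear) must have FPEXC.EN forced on before
 * CPTR_EL2.TFP is set; otherwise the access traps to EL1, not EL2. */
static bool needs_fpexc_fixup(uint64_t hcr_el2)
{
	return !(hcr_el2 & HCR_RW);
}

static uint32_t fixup_fpexc32(uint64_t hcr_el2, uint32_t fpexc32)
{
	if (needs_fpexc_fixup(hcr_el2))
		fpexc32 |= FPEXC_EN;
	return fpexc32;
}
```

A 64-bit guest is left untouched, which matches the open-coded
skip_32bit_state test in the hunk above.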
 

Re: [PATCH v4 0/2] arm/arm64: KVM: Optimize arm64 fp/simd, saves 30-50% on exits for non-VHE

2015-07-16 Thread Mario Smarduch
On 07/16/2015 12:05 PM, Christoffer Dall wrote:
 On Thu, Jul 16, 2015 at 11:23:08AM -0700, Mario Smarduch wrote:
 On 07/16/2015 08:52 AM, Christoffer Dall wrote:
 On Fri, Jul 10, 2015 at 06:19:05PM -0700, Mario Smarduch wrote:
 This is a followup to the previous iteration but implemented on top of the
 VHE patches. Only the non-VHE path is addressed by this patch. In the second
 patch the 32-bit handler is updated to keep exit handling consistent with
 the 64-bit code, and nothing else has changed.

 Why not simply preserve this the way it was in v3 and have it merged
 first - after all we have reviewed it and I thought it was more or less
 ready to be merged - I suspect the VHE patches may have a way to go
 still ?

 Definitely, that's a better path. After looking at VHE patches,
 I would probably leave V3 the way it is (keeping deactivate_:
 symmetric). Marc has Reviewed V3 and you commented either way was
 fine with you, so V3 should be ok.
 
 Yes, but there was a comment in the assembly file to fix up IIRC.

That's right, the 1/2 and 0/2 headers need a little bit of editing.

 
 Can you do a quick respin with that commentary changed and then Marc
 can queue that if he agrees?
 
 Thanks,
 -Christoffer
 



Re: [PATCH v4 0/2] arm/arm64: KVM: Optimize arm64 fp/simd, saves 30-50% on exits for non-VHE

2015-07-16 Thread Mario Smarduch
On 07/16/2015 08:52 AM, Christoffer Dall wrote:
 On Fri, Jul 10, 2015 at 06:19:05PM -0700, Mario Smarduch wrote:
 This is a followup to the previous iteration but implemented on top of the
 VHE patches. Only the non-VHE path is addressed by this patch. In the second
 patch the 32-bit handler is updated to keep exit handling consistent with
 the 64-bit code, and nothing else has changed.

 Why not simply preserve this the way it was in v3 and have it merged
 first - after all we have reviewed it and I thought it was more or less
 ready to be merged - I suspect the VHE patches may have a way to go
 still ?

Definitely, that's a better path. After looking at VHE patches,
I would probably leave V3 the way it is (keeping deactivate_:
symmetric). Marc has Reviewed V3 and you commented either way was
fine with you, so V3 should be ok.

Jumping on VHE is a little too much at this time; thanks for the alternative,
I kind of got myself into a jam here.

- Mario

 
 -Christoffer
 


[PATCH v4 2/2] arm: KVM: keep arm vfp/simd exit handling consistent with arm64

2015-07-16 Thread Mario Smarduch
After enhancing the arm64 FP/SIMD exit handling, the ARMv7 VFP exit branch is
moved to guest trap handling. This allows us to keep the exit handling flow
consistent between both architectures.

Signed-off-by: Mario Smarduch m.smard...@samsung.com
---
 arch/arm/kvm/interrupts.S | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
index 79caf79..b245b4e 100644
--- a/arch/arm/kvm/interrupts.S
+++ b/arch/arm/kvm/interrupts.S
@@ -363,10 +363,6 @@ hyp_hvc:
@ Check syndrome register
mrc p15, 4, r1, c5, c2, 0   @ HSR
lsr r0, r1, #HSR_EC_SHIFT
-#ifdef CONFIG_VFPv3
-   cmp r0, #HSR_EC_CP_0_13
-   beq switch_to_guest_vfp
-#endif
cmp r0, #HSR_EC_HVC
bne guest_trap  @ Not HVC instr.
 
@@ -380,7 +376,10 @@ hyp_hvc:
cmp r2, #0
bne guest_trap  @ Guest called HVC
 
-host_switch_to_hyp:
+   /*
+* Getting here means host called HVC, we shift parameters and branch
+* to Hyp function.
+*/
pop {r0, r1, r2}
 
/* Check for __hyp_get_vectors */
@@ -411,6 +410,10 @@ guest_trap:
 
@ Check if we need the fault information
lsr r1, r1, #HSR_EC_SHIFT
+#ifdef CONFIG_VFPv3
+   cmp r1, #HSR_EC_CP_0_13
+   beq switch_to_guest_vfp
+#endif
cmp r1, #HSR_EC_IABT
mrceq   p15, 4, r2, c6, c0, 2   @ HIFAR
beq 2f
@@ -479,7 +482,6 @@ guest_trap:
  */
 #ifdef CONFIG_VFPv3
 switch_to_guest_vfp:
-   load_vcpu   @ Load VCPU pointer to r0
push{r3-r7}
 
@ NEON/VFP used.  Turn on VFP access.
-- 
1.9.1



[PATCH v4 0/2] arm/arm64: KVM: Optimize arm64 fp/simd, saves 30-50% on exits

2015-07-16 Thread Mario Smarduch
Currently we save/restore fp/simd on each exit. The first patch optimizes the
arm64 save/restore so that we only do so on guest access. hackbench and several
lmbench tests show that anywhere from 30% to 50% of exits don't context switch
the vfp/simd registers.

For second patch 32-bit handler is updated to keep exit handling consistent
with 64-bit code.

Changes since v3:
- Per Christoffer's comment - changed comment for skip fp/simd in patch 1/2
- Changed cover text, clarify optimization in the context of this patch 

Changes since v2:
- Only for patch 2/2
  - Removed load_vcpu in switch_to_guest_vfp per Marc's comment
  - Got another chance to replace an unreferenced label with a comment

Changes since v1:
- only for patch 2/2
  - Reworked trapping to vfp access handler

Changes since initial version:
- Addressed Marc's comments
- Verified optimization improvements with lmbench and hackbench, updated
  commit message


Mario Smarduch (2):
  Optimize arm64 skip 30-50% vfp/simd save/restore on exits
  keep arm vfp/simd exit handling consistent with arm64

 arch/arm/kvm/interrupts.S| 14 +++--
 arch/arm64/include/asm/kvm_arm.h |  5 -
 arch/arm64/kvm/hyp.S | 45 +---
 3 files changed, 54 insertions(+), 10 deletions(-)

-- 
1.9.1



[PATCH v4 1/2] arm64: KVM: Optimize arm64 skip 30-50% vfp/simd save/restore on exits

2015-07-16 Thread Mario Smarduch
This patch only saves and restores the FP/SIMD registers on guest access. To
do this, the cptr_el2 FP/SIMD trap is set on guest entry and later checked on
exit. lmbench and hackbench show significant improvements; for 30-50% of exits
the FP/SIMD context is not saved/restored.
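As a rough model of the trap bookkeeping this patch performs (the constants
match the patch's definitions; the C helper names are invented for
illustration):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CPTR_EL2_TFP_SHIFT 10
#define CPTR_EL2_TTA (1U << 20)
#define CPTR_EL2_TFP (1U << CPTR_EL2_TFP_SHIFT)

/* activate_traps: on guest entry, trap trace and fp/simd accesses. */
static uint32_t activate_traps(void)
{
	return CPTR_EL2_TTA | CPTR_EL2_TFP;
}

/* skip_fpsimd_state: while TFP is still set the guest never touched
 * fp/simd this run, so the save/restore can be skipped on exit. */
static bool skip_fpsimd_state(uint32_t cptr_el2)
{
	return cptr_el2 & CPTR_EL2_TFP;
}

/* switch_to_guest_fpsimd: the first guest fp/simd access clears TFP. */
static uint32_t enable_guest_fpsimd(uint32_t cptr_el2)
{
	return cptr_el2 & ~CPTR_EL2_TFP;
}
```

The exit path consults the same bit, which is why the skip_fpsimd_state macro
only needs to read cptr_el2.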

Signed-off-by: Mario Smarduch m.smard...@samsung.com
---
 arch/arm64/include/asm/kvm_arm.h |  5 -
 arch/arm64/kvm/hyp.S | 45 +---
 2 files changed, 46 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index ac6fafb..7605e09 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -171,10 +171,13 @@
 #define HSTR_EL2_TTEE  (1 << 16)
 #define HSTR_EL2_T(x)  (1 << x)
 
+/* Hyp Coprocessor Trap Register Shifts */
+#define CPTR_EL2_TFP_SHIFT 10
+
 /* Hyp Coprocessor Trap Register */
 #define CPTR_EL2_TCPAC (1 << 31)
 #define CPTR_EL2_TTA   (1 << 20)
-#define CPTR_EL2_TFP   (1 << 10)
+#define CPTR_EL2_TFP   (1 << CPTR_EL2_TFP_SHIFT)
 
 /* Hyp Debug Configuration Register bits */
 #define MDCR_EL2_TDRA  (1 << 11)
diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S
index 5befd01..e708d5a 100644
--- a/arch/arm64/kvm/hyp.S
+++ b/arch/arm64/kvm/hyp.S
@@ -673,6 +673,14 @@
tbz \tmp, #KVM_ARM64_DEBUG_DIRTY_SHIFT, \target
 .endm
 
+/*
+ * Branch to target if CPTR_EL2.TFP bit is set (VFP/SIMD trapping enabled)
+ */
+.macro skip_fpsimd_state tmp, target
+   mrs \tmp, cptr_el2
+   tbnz\tmp, #CPTR_EL2_TFP_SHIFT, \target
+.endm
+
 .macro compute_debug_state target
// Compute debug state: If any of KDE, MDE or KVM_ARM64_DEBUG_DIRTY
// is set, we do a full save/restore cycle and disable trapping.
@@ -763,6 +771,7 @@
ldr x2, [x0, #VCPU_HCR_EL2]
msr hcr_el2, x2
mov x2, #CPTR_EL2_TTA
+   orr x2, x2, #CPTR_EL2_TFP
msr cptr_el2, x2
 
	mov x2, #(1 << 15)  // Trap CP15 Cr=15
@@ -785,7 +794,6 @@
 .macro deactivate_traps
mov x2, #HCR_RW
msr hcr_el2, x2
-   msr cptr_el2, xzr
msr hstr_el2, xzr
 
mrs x2, mdcr_el2
@@ -912,6 +920,28 @@ __restore_fpsimd:
restore_fpsimd
ret
 
+switch_to_guest_fpsimd:
+   push x4, lr
+
+   mrs x2, cptr_el2
+   bic x2, x2, #CPTR_EL2_TFP
+   msr cptr_el2, x2
+
+   mrs x0, tpidr_el2
+
+   ldr x2, [x0, #VCPU_HOST_CONTEXT]
+   kern_hyp_va x2
+   bl __save_fpsimd
+
+   add x2, x0, #VCPU_CONTEXT
+   bl __restore_fpsimd
+
+   pop x4, lr
+   pop x2, x3
+   pop x0, x1
+
+   eret
+
 /*
  * u64 __kvm_vcpu_run(struct kvm_vcpu *vcpu);
  *
@@ -932,7 +962,6 @@ ENTRY(__kvm_vcpu_run)
kern_hyp_va x2
 
save_host_regs
-   bl __save_fpsimd
bl __save_sysregs
 
compute_debug_state 1f
@@ -948,7 +977,6 @@ ENTRY(__kvm_vcpu_run)
add x2, x0, #VCPU_CONTEXT
 
bl __restore_sysregs
-   bl __restore_fpsimd
 
skip_debug_state x3, 1f
bl  __restore_debug
@@ -967,7 +995,9 @@ __kvm_vcpu_return:
add x2, x0, #VCPU_CONTEXT
 
save_guest_regs
+   skip_fpsimd_state x3, 1f
bl __save_fpsimd
+1:
bl __save_sysregs
 
skip_debug_state x3, 1f
@@ -986,7 +1016,11 @@ __kvm_vcpu_return:
kern_hyp_va x2
 
bl __restore_sysregs
+   skip_fpsimd_state x3, 1f
bl __restore_fpsimd
+1:
+   /* Clear FPSIMD and Trace trapping */
+   msr cptr_el2, xzr
 
skip_debug_state x3, 1f
// Clear the dirty flag for the next run, as all the state has
@@ -1201,6 +1235,11 @@ el1_trap:
 * x1: ESR
 * x2: ESR_EC
 */
+
+   /* Guest accessed VFP/SIMD registers, save host, restore Guest */
+   cmp x2, #ESR_ELx_EC_FP_ASIMD
+   b.eqswitch_to_guest_fpsimd
+
cmp x2, #ESR_ELx_EC_DABT_LOW
mov x0, #ESR_ELx_EC_IABT_LOW
ccmpx2, x0, #4, ne
-- 
1.9.1



[PATCH v4 1/2] arm64: KVM: Optimize arm64 non-VHE fpsimd skip 30-50% save/restore on exits

2015-07-11 Thread Mario Smarduch
This patch only saves and restores the FP/SIMD registers on guest access. To
do this, the cptr_el2 FP/SIMD trap is set on guest entry and later checked on
exit. The non-VHE path has been tested; future work would add VHE support.

Signed-off-by: Mario Smarduch m.smard...@samsung.com
---
 arch/arm64/include/asm/kvm_arm.h |  5 +++-
 arch/arm64/kvm/hyp.S | 58 +---
 2 files changed, 58 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index c8998c0..0a1d152 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -172,10 +172,13 @@
 #define HSTR_EL2_TTEE  (1 << 16)
 #define HSTR_EL2_T(x)  (1 << x)
 
+/* Hyp Coprocessor Trap Register Shifts */
+#define CPTR_EL2_TFP_SHIFT 10
+
 /* Hyp Coprocessor Trap Register */
 #define CPTR_EL2_TCPAC (1 << 31)
 #define CPTR_EL2_TTA   (1 << 20)
-#define CPTR_EL2_TFP   (1 << 10)
+#define CPTR_EL2_TFP   (1 << CPTR_EL2_TFP_SHIFT)
 
 /* Hyp Debug Configuration Register bits */
 #define MDCR_EL2_TDRA  (1 << 11)
diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S
index 64a5280..9d154ed 100644
--- a/arch/arm64/kvm/hyp.S
+++ b/arch/arm64/kvm/hyp.S
@@ -731,6 +731,15 @@ ifnvhe mrs\tmp, hcr_el2, _S_(ldr \tmp, [x0, #VCPU_HCR_EL2])
tbz \tmp, #KVM_ARM64_DEBUG_DIRTY_SHIFT, \target
 .endm
 
+/*
+ * For non-VHE - branch to target if CPTR_EL2.TFP bit is set (VFP/SIMD trapping
+ * enabled). For VHE do nothing.
+ */
+.macro skip_fpsimd_state tmp, target
+ifnvhe mrs \tmp, cptr_el2,   nop
+ifnvhe _S_(tbnz\tmp, #CPTR_EL2_TFP_SHIFT, \target),nop
+.endm
+
 .macro compute_debug_state target
// Compute debug state: If any of KDE, MDE or KVM_ARM64_DEBUG_DIRTY
// is set, we do a full save/restore cycle and disable trapping.
@@ -823,7 +832,7 @@ ifnvhe mrs \tmp, hcr_el2, _S_(ldr \tmp, [x0, #VCPU_HCR_EL2])
	adr x3, __kvm_hyp_vector
 ifnvhe nop, msr vbar_el1, x3
 ifnvhe nop, mrs x2, cpacr_el1
-ifnvhe _S_(ldr x2, =(CPTR_EL2_TTA)),   orr x2, x2, #(1 << 28)
+ifnvhe _S_(ldr x2, =(CPTR_EL2_TTA|CPTR_EL2_TFP)),  orr x2, x2, #(1 << 28)
 ifnvhe msr cptr_el2, x2,  msr cpacr_el1, x2

	mov x2, #(1 << 15)  // Trap CP15 Cr=15
@@ -851,7 +860,7 @@ ifnvhe  nop, _S_(orr x2, x2, #HCR_E2H)
 ifnvhe nop, mrs x2, cpacr_el1
 ifnvhe nop, movn x3, #(1 << 12), lsl #16
 ifnvhe nop, and x2, x2, x3
-ifnvhe msr cptr_el2, xzr, msr cpacr_el1, x2
+ifnvhe nop, msr cpacr_el1, x2
msr hstr_el2, xzr
 
mrs x2, mdcr_el2
@@ -988,6 +997,33 @@ __restore_fpsimd:
ret
 
 /*
+ * For non-VHE - on first FP/SIMD access, restore guest, save host registers
+ * and disable future trapping. For VHE this should never get called.
+ */
+switch_to_guest_fpsimd:
+   push x4, lr
+
+   mrs x2, cptr_el2
+   bic x2, x2, #CPTR_EL2_TFP
+   msr cptr_el2, x2
+
+   mrs x0, tpidr_el2
+
+   ldr x2, [x0, #VCPU_HOST_CONTEXT]
+   kern_hyp_va x2
+   bl __save_fpsimd
+
+   add x2, x0, #VCPU_CONTEXT
+   bl __restore_fpsimd
+
+   pop x4, lr
+   pop x2, x3
+   pop x0, x1
+
+   eret
+
+
+/*
  * u64 __kvm_vcpu_run(struct kvm_vcpu *vcpu);
  *
  * This is the world switch. The first half of the function
@@ -1007,7 +1043,7 @@ ENTRY(__kvm_vcpu_run)
kern_hyp_va x2
 
save_host_regs
-   bl __save_fpsimd
+ifnvhe nop,bl __save_fpsimd
 ifnvhe bl __save_sysregs,nop
bl  __save_shared_sysregs
 
@@ -1025,7 +1061,7 @@ ifnvhe bl__save_sysregs,nop
 
bl __restore_sysregs
bl __restore_shared_sysregs
-   bl __restore_fpsimd
+ifnvhe nop,  bl __restore_fpsimd
 
skip_debug_state x3, 1f
bl  __restore_debug
@@ -1044,7 +1080,9 @@ __kvm_vcpu_return:
add x2, x0, #VCPU_CONTEXT
 
save_guest_regs
+   skip_fpsimd_state x3, 1f
bl __save_fpsimd
+1:
bl __save_sysregs
bl __save_shared_sysregs
 
@@ -1072,7 +1110,11 @@ __kvm_vcpu_return_irq:
 
 ifnvhe bl __restore_sysregs, nop
bl  __restore_shared_sysregs
+   skip_fpsimd_state x3, 1f
bl __restore_fpsimd
+1:
+   /* For non-VHE - Clear FPSIMD and Trace trapping, do nothing for VHE */
+ifnvhe msr cptr_el2, xzr,nop
 
skip_debug_state x3, 1f
// Clear the dirty flag for the next run, as all the state has
@@ -1298,6 +1340,14 @@ el1_trap:
 * x1: ESR
 * x2: ESR_EC

[PATCH v4 0/2] arm/arm64: KVM: Optimize arm64 fp/simd, saves 30-50% on exits for non-VHE

2015-07-11 Thread Mario Smarduch
This is a followup to the previous iteration but implemented on top of the VHE
patches. Only the non-VHE path is addressed by this patch. In the second patch
the 32-bit handler is updated to keep exit handling consistent with the 64-bit
code, and nothing else has changed.

Currently we save/restore fp/simd on each exit. The first patch optimizes the
arm64 save/restore so that we only do so on guest access. hackbench and
several lmbench tests show that anywhere from 30% to 50% of exits don't
save/restore the fp/simd register set.

Tested on the Foundation Model; unfortunately not tested yet on a VHE-enabled
model.

Mario Smarduch (2):
  Optimize arm64 non-VHE fpsimd skip 30-50% save/restore on exits
  keep arm vfp/simd exit handling consistent with arm64

 arch/arm/kvm/interrupts.S| 14 +-
 arch/arm64/include/asm/kvm_arm.h |  5 +++-
 arch/arm64/kvm/hyp.S | 58 +---
 3 files changed, 66 insertions(+), 11 deletions(-)

-- 
1.9.1



[PATCH v4 2/2] keep arm vfp/simd exit handling consistent with arm64

2015-07-11 Thread Mario Smarduch
After enhancing the arm64 FP/SIMD exit handling, the ARMv7 VFP exit branch is
moved to guest trap handling. This allows us to keep the exit handling flow
consistent between both architectures.

Signed-off-by: Mario Smarduch m.smard...@samsung.com
---
 arch/arm/kvm/interrupts.S | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
index 568494d..900ef6d 100644
--- a/arch/arm/kvm/interrupts.S
+++ b/arch/arm/kvm/interrupts.S
@@ -361,10 +361,6 @@ hyp_hvc:
@ Check syndrome register
mrc p15, 4, r1, c5, c2, 0   @ HSR
lsr r0, r1, #HSR_EC_SHIFT
-#ifdef CONFIG_VFPv3
-   cmp r0, #HSR_EC_CP_0_13
-   beq switch_to_guest_vfp
-#endif
cmp r0, #HSR_EC_HVC
bne guest_trap  @ Not HVC instr.
 
@@ -378,7 +374,10 @@ hyp_hvc:
cmp r2, #0
bne guest_trap  @ Guest called HVC
 
-host_switch_to_hyp:
+   /*
+* Getting here means host called HVC, we shift parameters and branch
+* to Hyp function.
+*/
pop {r0, r1, r2}
 
/* Check for __hyp_get_vectors */
@@ -409,6 +408,10 @@ guest_trap:
 
@ Check if we need the fault information
lsr r1, r1, #HSR_EC_SHIFT
+#ifdef CONFIG_VFPv3
+   cmp r1, #HSR_EC_CP_0_13
+   beq switch_to_guest_vfp
+#endif
cmp r1, #HSR_EC_IABT
mrceq   p15, 4, r2, c6, c0, 2   @ HIFAR
beq 2f
@@ -477,7 +480,6 @@ guest_trap:
  */
 #ifdef CONFIG_VFPv3
 switch_to_guest_vfp:
-   load_vcpu   @ Load VCPU pointer to r0
push{r3-r7}
 
@ NEON/VFP used.  Turn on VFP access.
-- 
1.9.1



Re: [PATCH 09/13] arm64: KVM: VHE: Add alternatives for VHE-enabled world-switch

2015-07-09 Thread Mario Smarduch
On 07/09/2015 01:06 AM, Marc Zyngier wrote:
 Hi Mario,
 
 On 09/07/15 02:29, Mario Smarduch wrote:
 On 07/08/2015 09:19 AM, Marc Zyngier wrote:
 In order to switch between host and guest, a VHE-enabled kernel
 must use different accessors for certain system registers.

 This patch uses runtime patching to use the right instruction
 when required...

 Signed-off-by: Marc Zyngier marc.zyng...@arm.com
 ---
  arch/arm64/include/asm/kvm_asm.h |  40 ++--
  arch/arm64/kvm/hyp.S | 210 
 ++-
  arch/arm64/kvm/vhe-macros.h  |  18 
  3 files changed, 191 insertions(+), 77 deletions(-)

 []
   * Author: Marc Zyngier marc.zyng...@arm.com
   *
   * This program is free software; you can redistribute it and/or modify
 @@ -67,40 +67,52 @@
   stp x29, lr, [x3, #80]

   mrs x19, sp_el0
 - mrs x20, elr_el2// pc before entering el2
 - mrs x21, spsr_el2   // pstate before entering el2
 + str x19, [x3, #96]
 +.endm

 - stp x19, x20, [x3, #96]
 - str x21, [x3, #112]

 Hi Marc,

   trying to make a little sense out of this :)
 
 Don't even try, it hurts... ;-)
 
 In the case of VHE kernel the two 'mrs_hyp()' and 'mrs_el1()'
 calls would  be accessing same registers - namely EL1 variants?
 For non VHE EL2, EL1?

 The mrs_s and sysreg_EL12 are new, not sure what these mean.
 
 mrs_s and msr_s are just macros to that deal with system registers that
 the assembler doesn't know about (yet). They have been in (moderate) use
 for about a year, and have been introduced with the GICv3 support.
 
 See arch/arm64/include/asm/sysreg.h for the gory details.
 
 Now, on to sysreg_EL12: The main idea with VHE is that anything that
 used to run at EL1 (the kernel) can now run unmodified at EL2, and that
 it is the EL2 software that has to change to deal with it.
 
 So when the kernel uses VHE and runs at EL2, an access to sysreg_EL1
 really accesses sysreg_EL2, transparently. This is what makes it
 possible to run the kernel at EL2 without any change.
 
 But when the KVM world switch wants to access a guest register, it
 cannot use sysreg_EL1 anymore (that would hit on the EL2 register
 because of the above rule). For this, it must use sysreg_EL12 which
 effectively means access the EL1 register from EL2.
 
 As a consequence, we have the following rules:
 - non-VHE: msr_el1 uses EL1, msr_hyp uses EL2
 - VHE: msr_el1 uses EL12, msr_hyp uses EL1
 
 Does this help?

Yep it sure does, msr/mrs_hyp() and 12 naming had me confused.

Thanks!
 
   M.
 
 - Mario
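[Editorial note: the accessor rules Marc spells out above are easy to capture in a toy C model. This is purely illustrative; the enum and function names are invented and are not kernel code.]

```c
#include <assert.h>
#include <string.h>

/* Toy model of the two accessor families and the hardware register
 * encoding each one resolves to, following the rules above:
 *   non-VHE: msr_el1 uses EL1,  msr_hyp uses EL2
 *   VHE:     msr_el1 uses EL12, msr_hyp uses EL1
 */
enum accessor { ACC_EL1, ACC_HYP };

static const char *resolved_encoding(enum accessor acc, int vhe)
{
	if (!vhe)
		return (acc == ACC_EL1) ? "EL1" : "EL2";
	/* With VHE, plain _EL1 names transparently hit the EL2 registers,
	 * so guest EL1 state must be reached via the _EL12 encodings. */
	return (acc == ACC_EL1) ? "EL12" : "EL1";
}
```

In other words, only the world-switch code has to care; kernel code using the plain _EL1 names runs unmodified at EL2.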

 +.macro save_el1_state
 + mrs_hyp(x20, ELR)   // pc before entering el2
 + mrs_hyp(x21, SPSR)  // pstate before entering el2

   mrs x22, sp_el1
 - mrs x23, elr_el1
 - mrs x24, spsr_el1
 +
 + mrs_el1(x23, elr)
 + mrs_el1(x24, spsr)
 +
 + add x3, x2, #CPU_XREG_OFFSET(31)// SP_EL0
 + stp x20, x21, [x3, #8]  // HACK: Store to the regs after 
 SP_EL0

   str x22, [x2, #CPU_GP_REG_OFFSET(CPU_SP_EL1)]
   str x23, [x2, #CPU_GP_REG_OFFSET(CPU_ELR_EL1)]
   str x24, [x2, #CPU_SPSR_OFFSET(KVM_SPSR_EL1)]
  .endm

 -.macro restore_common_regs
 +.macro restore_el1_state
   // x2: base address for cpu context
   // x3: tmp register

 + add x3, x2, #CPU_XREG_OFFSET(31)// SP_EL0
 + ldp x20, x21, [x3, #8] // Same hack again, get guest PC and pstate
 +
   ldr x22, [x2, #CPU_GP_REG_OFFSET(CPU_SP_EL1)]
   ldr x23, [x2, #CPU_GP_REG_OFFSET(CPU_ELR_EL1)]
   ldr x24, [x2, #CPU_SPSR_OFFSET(KVM_SPSR_EL1)]

 + msr_hyp(ELR, x20)   // pc on return from el2
 + msr_hyp(SPSR, x21)  // pstate on return from el2
 +
   msr sp_el1, x22
 - msr elr_el1, x23
 - msr spsr_el1, x24

 - add x3, x2, #CPU_XREG_OFFSET(31)// SP_EL0
 - ldp x19, x20, [x3]
 - ldr x21, [x3, #16]
 + msr_el1(elr, x23)
 + msr_el1(spsr, x24)
 +.endm

 +.macro restore_common_regs
 + // x2: base address for cpu context
 + // x3: tmp register
 +
 + ldr x19, [x2, #CPU_XREG_OFFSET(31)] // SP_EL0
   msr sp_el0, x19
 - msr elr_el2, x20// pc on return from el2
 - msr spsr_el2, x21   // pstate on return from el2

   add x3, x2, #CPU_XREG_OFFSET(19)
   ldp x19, x20, [x3]
 @@ -113,9 +125,15 @@

  .macro save_host_regs
   save_common_regs
 +ifnvhe   nop,b  skip_el1_save
 + save_el1_state
 +skip_el1_save:
  .endm

  .macro restore_host_regs
 +ifnvhe   nop,b  skip_el1_restore
 + restore_el1_state
 +skip_el1_restore:
   restore_common_regs
  .endm

 @@ -159,6 +177,7 @@
   stp x6, x7, [x3, #16]

   save_common_regs
 + save_el1_state
  .endm

  .macro restore_guest_regs
 @@ -184,6 +203,7

Re: [PATCH 09/13] arm64: KVM: VHE: Add alternatives for VHE-enabled world-switch

2015-07-08 Thread Mario Smarduch
On 07/08/2015 09:19 AM, Marc Zyngier wrote:
 In order to switch between host and guest, a VHE-enabled kernel
 must use different accessors for certain system registers.
 
 This patch uses runtime patching to use the right instruction
 when required...
 
 Signed-off-by: Marc Zyngier marc.zyng...@arm.com
 ---
  arch/arm64/include/asm/kvm_asm.h |  40 ++--
  arch/arm64/kvm/hyp.S | 210 
 ++-
  arch/arm64/kvm/vhe-macros.h  |  18 
  3 files changed, 191 insertions(+), 77 deletions(-)
 
[]
   * Author: Marc Zyngier marc.zyng...@arm.com
   *
   * This program is free software; you can redistribute it and/or modify
 @@ -67,40 +67,52 @@
   stp x29, lr, [x3, #80]
  
   mrs x19, sp_el0
 - mrs x20, elr_el2// pc before entering el2
 - mrs x21, spsr_el2   // pstate before entering el2
 + str x19, [x3, #96]
 +.endm
  
 - stp x19, x20, [x3, #96]
 - str x21, [x3, #112]

Hi Marc,

  trying to make a little sense out of this :)

In the case of VHE kernel the two 'mrs_hyp()' and 'mrs_el1()'
calls would  be accessing same registers - namely EL1 variants?
For non VHE EL2, EL1?

The mrs_s and sysreg_EL12 are new, not sure what these mean.

- Mario

 +.macro save_el1_state
 + mrs_hyp(x20, ELR)   // pc before entering el2
 + mrs_hyp(x21, SPSR)  // pstate before entering el2
  
   mrs x22, sp_el1
 - mrs x23, elr_el1
 - mrs x24, spsr_el1
 +
 + mrs_el1(x23, elr)
 + mrs_el1(x24, spsr)
 +
 + add x3, x2, #CPU_XREG_OFFSET(31)// SP_EL0
 + stp x20, x21, [x3, #8]  // HACK: Store to the regs after SP_EL0
  
   str x22, [x2, #CPU_GP_REG_OFFSET(CPU_SP_EL1)]
   str x23, [x2, #CPU_GP_REG_OFFSET(CPU_ELR_EL1)]
   str x24, [x2, #CPU_SPSR_OFFSET(KVM_SPSR_EL1)]
  .endm
  
 -.macro restore_common_regs
 +.macro restore_el1_state
   // x2: base address for cpu context
   // x3: tmp register
  
 + add x3, x2, #CPU_XREG_OFFSET(31)// SP_EL0
 + ldp x20, x21, [x3, #8] // Same hack again, get guest PC and pstate
 +
   ldr x22, [x2, #CPU_GP_REG_OFFSET(CPU_SP_EL1)]
   ldr x23, [x2, #CPU_GP_REG_OFFSET(CPU_ELR_EL1)]
   ldr x24, [x2, #CPU_SPSR_OFFSET(KVM_SPSR_EL1)]
  
 + msr_hyp(ELR, x20)   // pc on return from el2
 + msr_hyp(SPSR, x21)  // pstate on return from el2
 +
   msr sp_el1, x22
 - msr elr_el1, x23
 - msr spsr_el1, x24
  
 - add x3, x2, #CPU_XREG_OFFSET(31)// SP_EL0
 - ldp x19, x20, [x3]
 - ldr x21, [x3, #16]
 + msr_el1(elr, x23)
 + msr_el1(spsr, x24)
 +.endm
  
 +.macro restore_common_regs
 + // x2: base address for cpu context
 + // x3: tmp register
 +
 + ldr x19, [x2, #CPU_XREG_OFFSET(31)] // SP_EL0
   msr sp_el0, x19
 - msr elr_el2, x20// pc on return from el2
 - msr spsr_el2, x21   // pstate on return from el2
  
   add x3, x2, #CPU_XREG_OFFSET(19)
   ldp x19, x20, [x3]
 @@ -113,9 +125,15 @@
  
  .macro save_host_regs
   save_common_regs
 +ifnvhe   nop,b  skip_el1_save
 + save_el1_state
 +skip_el1_save:
  .endm
  
  .macro restore_host_regs
 +ifnvhe   nop,b  skip_el1_restore
 + restore_el1_state
 +skip_el1_restore:
   restore_common_regs
  .endm
  
 @@ -159,6 +177,7 @@
   stp x6, x7, [x3, #16]
  
   save_common_regs
 + save_el1_state
  .endm
  
  .macro restore_guest_regs
 @@ -184,6 +203,7 @@
   ldr x18, [x3, #144]
  
   // x19-x29, lr, sp*, elr*, spsr*
 + restore_el1_state
   restore_common_regs
  
   // Last bits of the 64bit state
 @@ -203,6 +223,38 @@
   * In other words, don't touch any of these unless you know what
   * you are doing.
   */
 +
 +.macro save_shared_sysregs
 + // x2: base address for cpu context
 + // x3: tmp register
 +
 + add x3, x2, #CPU_SYSREG_OFFSET(TPIDR_EL0)
 +
 + mrs x4, tpidr_el0
 + mrs x5, tpidrro_el0
 + mrs x6, tpidr_el1
 + mrs x7, actlr_el1
 +
 + stp x4, x5, [x3]
 + stp x6, x7, [x3, #16]
 +.endm
 +
 +.macro restore_shared_sysregs
 + // x2: base address for cpu context
 + // x3: tmp register
 +
 + add x3, x2, #CPU_SYSREG_OFFSET(TPIDR_EL0)
 +
 + ldp x4, x5, [x3]
 + ldp x6, x7, [x3, #16]
 +
 + msr tpidr_el0,  x4
 + msr tpidrro_el0,x5
 + msr tpidr_el1,  x6
 + msr actlr_el1,  x7
 +.endm
 +
 +
  .macro save_sysregs
   // x2: base address for cpu context
   // x3: tmp register
 @@ -211,26 +263,27 @@
  
   mrs x4, vmpidr_el2
   mrs x5, csselr_el1
 - mrs x6, sctlr_el1
 - mrs x7, actlr_el1
 - mrs x8, cpacr_el1
 -   

Re: [PATCH 2/3] arm: KVM: Implement lazy VFP switching outside of Hyp Mode

2015-07-06 Thread Mario Smarduch
On 07/05/2015 12:34 PM, Christoffer Dall wrote:
 On Wed, Jun 24, 2015 at 08:30:27PM -0700, Mario Smarduch wrote:
 This patch implements the VFP context switch code called from vcpu_put in
 Host KVM. In addition it implements the logic to skip setting a VFP trap if 
 one
 is not needed. Also resets the flag if Host KVM switched registers to trap 
 new
 guest vfp accesses.


 Signed-off-by: Mario Smarduch m.smard...@samsung.com
 ---
  arch/arm/kvm/interrupts.S |   49 
 -
  1 file changed, 31 insertions(+), 18 deletions(-)

 diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
 index 79caf79..0912edd 100644
 --- a/arch/arm/kvm/interrupts.S
 +++ b/arch/arm/kvm/interrupts.S
 @@ -96,6 +96,21 @@ ENTRY(__kvm_flush_vm_context)
  bx  lr
  ENDPROC(__kvm_flush_vm_context)
  
 +ENTRY(__kvm_restore_host_vfp_state)
 +push{r3-r7}
 +
 +mov r1, #0
 +str r1, [r0, #VCPU_VFP_SAVED]
 +
 +add r7, r0, #VCPU_VFP_GUEST
 +store_vfp_state r7
 +add r7, r0, #VCPU_VFP_HOST
 +ldr r7, [r7]
 +restore_vfp_state r7
 +
 +pop {r3-r7}
 +bx  lr
 +ENDPROC(__kvm_restore_host_vfp_state)
 
 it feels a bit strange to introduce this function here when I cannot see
 how it's called.
 
 At the very least, could you provide the C equivalent prototype in a
 comment or specify what the input registers are?  E.g.

Yes again that's on a todo list.
 
 /*
  * void __kvm_restore_host_vfp_state(struct kvm_vcpu *vcpu);
  */
 
  
  /
   *  Hypervisor world-switch code
 @@ -131,7 +146,13 @@ ENTRY(__kvm_vcpu_run)
  
  @ Trap coprocessor CRx accesses
  set_hstr vmentry
 +
 +ldr r1, [vcpu, #VCPU_VFP_SAVED]
 +cmp r1, #1
 +beq skip_guest_vfp_trap
  set_hcptr vmentry, (HCPTR_TTA | HCPTR_TCP(10) | HCPTR_TCP(11))
 +skip_guest_vfp_trap:
 +
  set_hdcr vmentry
  
  @ Write configured ID register into MIDR alias
 @@ -173,18 +194,6 @@ __kvm_vcpu_return:
  set_hcptr vmexit, (HCPTR_TTA | HCPTR_TCP(10) | HCPTR_TCP(11))
  
  #ifdef CONFIG_VFPv3
 -@ Save floating point registers we if let guest use them.
 -tst r2, #(HCPTR_TCP(10) | HCPTR_TCP(11))
 -bne after_vfp_restore
 -
 -@ Switch VFP/NEON hardware state to the host's
 -add r7, vcpu, #VCPU_VFP_GUEST
 -store_vfp_state r7
 -add r7, vcpu, #VCPU_VFP_HOST
 -ldr r7, [r7]
 -restore_vfp_state r7
 -
 -after_vfp_restore:
  @ Restore FPEXC_EN which we clobbered on entry
  pop {r2}
  VFPFMXR FPEXC, r2
 @@ -363,10 +372,6 @@ hyp_hvc:
  @ Check syndrome register
  mrc p15, 4, r1, c5, c2, 0   @ HSR
  lsr r0, r1, #HSR_EC_SHIFT
 -#ifdef CONFIG_VFPv3
 -cmp r0, #HSR_EC_CP_0_13
 -beq switch_to_guest_vfp
 -#endif
  cmp r0, #HSR_EC_HVC
  bne guest_trap  @ Not HVC instr.
  
 @@ -380,7 +385,10 @@ hyp_hvc:
  cmp r2, #0
  bne guest_trap  @ Guest called HVC
  
 -host_switch_to_hyp:
 +/*
 + * Getting here means host called HVC, we shift parameters and branch
 + * to Hyp function.
 + */
 
 not sure this comment change belongs in this patch (but the comment is
 well-written).

I built this patch on top of previous one. But IMO this series
is not ready for upstream yet.

 
  pop {r0, r1, r2}
  
  /* Check for __hyp_get_vectors */
 @@ -411,6 +419,10 @@ guest_trap:
  
  @ Check if we need the fault information
  lsr r1, r1, #HSR_EC_SHIFT
 +#ifdef CONFIG_VFPv3
 +cmp r1, #HSR_EC_CP_0_13
 +beq switch_to_guest_vfp
 +#endif
  cmp r1, #HSR_EC_IABT
  mrceq   p15, 4, r2, c6, c0, 2   @ HIFAR
  beq 2f
 @@ -479,11 +491,12 @@ guest_trap:
   */
  #ifdef CONFIG_VFPv3
  switch_to_guest_vfp:
 -load_vcpu   @ Load VCPU pointer to r0
  push{r3-r7}
  
  @ NEON/VFP used.  Turn on VFP access.
  set_hcptr vmexit, (HCPTR_TCP(10) | HCPTR_TCP(11))
 +mov r1, #1
 +str r1, [vcpu, #VCPU_VFP_SAVED]
  
  @ Switch VFP/NEON hardware state to the guest's
  add r7, r0, #VCPU_VFP_HOST
 -- 
 1.7.9.5

 It would probably be easier to just rebase this on the previous series
 and refer to that in the cover letter, but the approach here looks
 otherwise right to me.

What if we used the simplified approach (as Marc mentioned) and
let it run for quite a while and then move this series?
 
 -Christoffer
 



Re: [PATCH 1/3] arm: KVM: define headers and offsets to mange VFP state

2015-07-06 Thread Mario Smarduch
On 07/05/2015 12:27 PM, Christoffer Dall wrote:
 On Wed, Jun 24, 2015 at 08:30:26PM -0700, Mario Smarduch wrote:
 Define the required kvm_vcpu_arch fields, and offsets to manage VFP state. And
 declare the Hyp interface function to switch VFP registers.


 Signed-off-by: Mario Smarduch m.smard...@samsung.com
 ---
  arch/arm/include/asm/kvm_asm.h  |1 +
  arch/arm/include/asm/kvm_host.h |3 +++
  arch/arm/kernel/asm-offsets.c   |1 +
  3 files changed, 5 insertions(+)

 diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
 index 25410b2..08dda8c 100644
 --- a/arch/arm/include/asm/kvm_asm.h
 +++ b/arch/arm/include/asm/kvm_asm.h
 @@ -97,6 +97,7 @@ extern char __kvm_hyp_code_end[];
  extern void __kvm_flush_vm_context(void);
  extern void __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa);
  extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
 +extern void __kvm_restore_host_vfp_state(struct kvm_vcpu *vcpu);
  
  extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
  #endif
 diff --git a/arch/arm/include/asm/kvm_host.h 
 b/arch/arm/include/asm/kvm_host.h
 index d71607c..22cea72 100644
 --- a/arch/arm/include/asm/kvm_host.h
 +++ b/arch/arm/include/asm/kvm_host.h
 @@ -111,6 +111,9 @@ struct kvm_vcpu_arch {
  /* Interrupt related fields */
  u32 irq_lines;  /* IRQ and FIQ levels */
  
 +/* Track if VFP registers are occupied by Guest while in KVM host mode*/
 
 why capitalize guest?
I just meant to highlight that we're holding guest state outside of Hyp mode.
No issues, I can change it.

 
 missing space at the end of the line.
 
 I don't really understand what the semantics of this field is by just
 lookint at this patch.  I would probably define a u32 flags field in
 stead and define a patch akin to what we do for the debug registers and
 call it 'vfp_dirty', and in a comment for the flag offset define say
 something like:
 
 The vfp_dirty flag must be set if the host VFP register state is
 modified.

Yes agree, this was put together quickly to get comments.
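[Editorial note: a flags word along the lines suggested above might look like the following. This is a hypothetical sketch in the style of the debug-dirty flag; all names are invented, not the actual patch.]

```c
#include <assert.h>

/* Illustrative only: a flags field with a dirty bit, instead of a
 * dedicated u32, in the style of KVM_ARM64_DEBUG_DIRTY. The flag is
 * set whenever the host VFP register state has been modified. */
#define KVM_ARM_VFP_DIRTY_SHIFT	0
#define KVM_ARM_VFP_DIRTY	(1u << KVM_ARM_VFP_DIRTY_SHIFT)

struct toy_vcpu_arch {
	unsigned int flags;
};

static void mark_vfp_dirty(struct toy_vcpu_arch *a)
{
	a->flags |= KVM_ARM_VFP_DIRTY;
}

static void clear_vfp_dirty(struct toy_vcpu_arch *a)
{
	a->flags &= ~KVM_ARM_VFP_DIRTY;
}

static int vfp_dirty(const struct toy_vcpu_arch *a)
{
	return !!(a->flags & KVM_ARM_VFP_DIRTY);
}
```

A single flags word leaves room for further lazy-switch state later without growing the vcpu struct again.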
 
 +u32 vfp_guest_saved;
 +
  /* Exception Information */
  struct kvm_vcpu_fault_info fault;
  
 diff --git a/arch/arm/kernel/asm-offsets.c b/arch/arm/kernel/asm-offsets.c
 index 871b826..35093d0 100644
 --- a/arch/arm/kernel/asm-offsets.c
 +++ b/arch/arm/kernel/asm-offsets.c
 @@ -191,6 +191,7 @@ int main(void)
DEFINE(VCPU_HPFAR,offsetof(struct kvm_vcpu, 
 arch.fault.hpfar));
DEFINE(VCPU_HYP_PC,   offsetof(struct kvm_vcpu, 
 arch.fault.hyp_pc));
DEFINE(VCPU_VGIC_CPU, offsetof(struct kvm_vcpu, 
 arch.vgic_cpu));
 +  DEFINE(VCPU_VFP_SAVED,offsetof(struct kvm_vcpu, 
 arch.vfp_guest_saved));
DEFINE(VGIC_V2_CPU_HCR,   offsetof(struct vgic_cpu, vgic_v2.vgic_hcr));
DEFINE(VGIC_V2_CPU_VMCR,  offsetof(struct vgic_cpu, vgic_v2.vgic_vmcr));
DEFINE(VGIC_V2_CPU_MISR,  offsetof(struct vgic_cpu, vgic_v2.vgic_misr));
 -- 
 1.7.9.5




Re: [PATCH 3/3] arm: KVM: Add VFP lazy switch hooks in Host KVM

2015-07-06 Thread Mario Smarduch
On 07/05/2015 12:37 PM, Christoffer Dall wrote:
 On Wed, Jun 24, 2015 at 08:30:28PM -0700, Mario Smarduch wrote:
 This patch implements host KVM interface to Hyp mode VFP function to 
 switch out guest and switch in host.

 Signed-off-by: Mario Smarduch m.smard...@samsung.com
 ---
  arch/arm/kvm/arm.c |   15 +++
  1 file changed, 15 insertions(+)

 diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
 index d9631ec..77b41f5 100644
 --- a/arch/arm/kvm/arm.c
 +++ b/arch/arm/kvm/arm.c
 @@ -105,6 +105,17 @@ void kvm_arch_check_processor_compat(void *rtn)
  *(int *)rtn = 0;
  }
  
 +/**
 + * kvm_switch_fp_regs() - switch guest/host VFP registers
 + * @vcpu:   pointer to vcpu structure.
 + *
 + * HYP interface functions to save guest and restore host VFP registers
 
 Not sure I understand what you mean to say with this line, how about:
 
 Calls an assembly routine in HYP mode to actually perform the state
 save/restore.
 
 However, why do we actually need to do this in hyp mode?  Can't we just
 as well do this in SVC mode or are we changing some trap settings here?

Yes, it could be done in SVC mode since only non-Hyp registers are accessed.
I reused the Hyp call since all the code to do the switch was already there.

 
 + */
 +static void kvm_switch_fp_regs(struct kvm_vcpu *vcpu)
 
 should probalby be called kvm_vcpu_put_fp_regs
 
 +{
 +if (vcpu->arch.vfp_guest_saved == 1)
 +kvm_call_hyp(__kvm_restore_host_vfp_state, vcpu);
 +}
  
  /**
   * kvm_arch_init_vm - initializes a VM data structure
 @@ -292,6 +303,10 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
  
  void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
  {
 +
 +/* Check if Guest accessed VFP registers */
 +kvm_switch_fp_regs(vcpu);
 +
  /*
   * The arch-generic KVM code expects the cpu field of a vcpu to be -1
   * if the vcpu is no longer assigned to a cpu.  This is used for the
 -- 
 1.7.9.5
 
 How are we sure that the kernel never touches VFP registers between VCPU
 exit and kvm_arch_vcpu_put?  Can a kernel-side memcpy implementation use
 the FP regs or something like that?

Exceptions and interrupts don't save any VFP context; if
these VFP registers are touched by the kernel they should
be saved/restored there. The x86 version appears to do the same.
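[Editorial note: the put-side of the lazy switch under discussion reduces to a few lines of C. The following is an illustrative model only; the struct, its fields, and the hyp-call stub are invented stand-ins for the patch code.]

```c
#include <assert.h>

/* Toy model of the put-side switch: restore host VFP state on
 * vcpu_put only if the guest actually touched the VFP registers. */
struct toy_vcpu {
	unsigned int vfp_guest_saved;	/* set by the Hyp trap handler */
	unsigned int hyp_restores;	/* counts simulated hyp calls */
};

static void restore_host_vfp_state(struct toy_vcpu *v)
{
	v->hyp_restores++;	/* stands in for kvm_call_hyp(...) */
	v->vfp_guest_saved = 0;	/* host state is back in the FP unit */
}

static void vcpu_put_fp(struct toy_vcpu *v)
{
	if (v->vfp_guest_saved == 1)
		restore_host_vfp_state(v);
}
```

The common case (guest never touched VFP) falls straight through with no register traffic at all.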

 
 Thanks,
 -Christoffer
 



Re: [PATCH 0/3] arm: KVM: VFP lazy switch in KVM Host Mode may save upto 98%

2015-07-06 Thread Mario Smarduch
On 07/05/2015 12:37 PM, Christoffer Dall wrote:
 Hi Mario,
 
 On Wed, Jun 24, 2015 at 08:30:25PM -0700, Mario Smarduch wrote:
 Currently we do a lazy VFP switch in Hyp mode, but once we exit and re-enter 
 hyp
 mode we trap again on VFP access. This mode has shown around 30-50% 
 improvement
 running hackbench and lmbench.

 This patch series extends lazy VFP switch beyond Hyp mode to KVM host mode.

 1 - On guest access we switch from host to guest and set a flag accessible 
 to 
 host
 2 - On exit to KVM host, VFP state is restored on vcpu_put if flag is marked 
 (1)
 3 - Otherwise guest is resumed and continues to use its VFP registers. 
 4 - In case of 2 on VM entry we set VFP trap flag to repeat 1.

 If the guest does not access VFP registers, the implementation remains the same.

 Executing hackbench on Fast Models and Exynos arm32 board shows good
 results. Considering all exits 2% of the time KVM host lazy vfp switch is 
 invoked.

 However this patch set requires more burn-in time and testing under various 
 loads.

 Currently ARM32 is addressed; ARM64 will come later.

 I think Marc said that he experimented with a similar patch once, but
 that it caused corruption on the host side somehow.
 
 Am I remembering correctly?  If so, we would need to make sure this
 doesn't happen with this patch set...

I think upstreaming the basic approach first (arm64, arm32 cleanups),
and let this series get some good runtime - would be better
and safer approach.
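[Editorial note: the flag-driven flow in the cover letter above (steps 1 and 4) can be sketched as a tiny C state machine. Everything here is an illustrative model with invented names, not the actual patch.]

```c
#include <assert.h>

/* Toy state machine for the trap side of the lazy switch. */
struct toy_state {
	int trap_armed;		/* HCPTR/CPTR FP trap bit set for the guest */
	int guest_owns_vfp;	/* flag visible to host KVM (step 1) */
};

/* Step 4: on VM entry, arm the FP trap only if the guest does not
 * already own the VFP registers. */
static void vm_entry(struct toy_state *s)
{
	s->trap_armed = !s->guest_owns_vfp;
}

/* Step 1: the first guest FP access traps; the handler switches state
 * to the guest, drops the trap, and marks ownership for the host. */
static void guest_fp_access(struct toy_state *s)
{
	if (s->trap_armed) {
		s->trap_armed = 0;
		s->guest_owns_vfp = 1;
	}
}
```

Step 3 then falls out naturally: with the trap unarmed, subsequent guest FP accesses run at full speed until the next vcpu_put.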


 
 Otherwise I think this sounds like a fairly good idea and I wonder if
 the same could be done on arm64?

Yes that's the intent, doing both architectures at once would be preferable.

Thanks,
  Mario
 
 Thanks,
 -Christoffer
 



Re: [PATCH v3 1/2] arm64: KVM: Optimize arm64 skip 30-50% vfp/simd save/restore on exits

2015-07-03 Thread Mario Smarduch
On 07/03/2015 04:53 AM, Christoffer Dall wrote:
 On Thu, Jul 02, 2015 at 02:51:57PM -0700, Mario Smarduch wrote:
 On 07/01/2015 06:46 AM, Christoffer Dall wrote:
 On Wed, Jun 24, 2015 at 05:04:11PM -0700, Mario Smarduch wrote:
 This patch only saves and restores FP/SIMD registers on Guest access. To do
 this cptr_el2 FP/SIMD trap is set on Guest entry and later checked on exit.
 lmbench, hackbench show significant improvements, for 30-50% exits FP/SIMD
 context is not saved/restored

 Signed-off-by: Mario Smarduch m.smard...@samsung.com
 ---
  arch/arm64/include/asm/kvm_arm.h |5 -
  arch/arm64/kvm/hyp.S |   46 
 +++---
  2 files changed, 47 insertions(+), 4 deletions(-)

 diff --git a/arch/arm64/include/asm/kvm_arm.h 
 b/arch/arm64/include/asm/kvm_arm.h
 index ac6fafb..7605e09 100644
 --- a/arch/arm64/include/asm/kvm_arm.h
 +++ b/arch/arm64/include/asm/kvm_arm.h
 @@ -171,10 +171,13 @@
 #define HSTR_EL2_TTEE (1 << 16)
 #define HSTR_EL2_T(x) (1 << x)
  
 +/* Hyp Coproccessor Trap Register Shifts */
 +#define CPTR_EL2_TFP_SHIFT 10
 +
  /* Hyp Coprocessor Trap Register */
 #define CPTR_EL2_TCPAC(1 << 31)
 #define CPTR_EL2_TTA  (1 << 20)
-#define CPTR_EL2_TFP  (1 << 10)
+#define CPTR_EL2_TFP  (1 << CPTR_EL2_TFP_SHIFT)
  
  /* Hyp Debug Configuration Register bits */
 #define MDCR_EL2_TDRA (1 << 11)
 diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S
 index 5befd01..de0788f 100644
 --- a/arch/arm64/kvm/hyp.S
 +++ b/arch/arm64/kvm/hyp.S
 @@ -673,6 +673,15 @@
tbz \tmp, #KVM_ARM64_DEBUG_DIRTY_SHIFT, \target
  .endm
  
 +/*
 + * Check cptr VFP/SIMD accessed bit, if set VFP/SIMD not accessed by 
 guest.

 This comment doesn't really help me understand the function, may I
 suggest:

 Branch to target if CPTR_EL2.TFP bit is set (VFP/SIMD trapping enabled)

 Yes actually describes what it does.


 + */
 +.macro skip_fpsimd_state tmp, target
 +  mrs \tmp, cptr_el2
 +  tbnz\tmp, #CPTR_EL2_TFP_SHIFT, \target
 +.endm
 +
 +
  .macro compute_debug_state target
// Compute debug state: If any of KDE, MDE or KVM_ARM64_DEBUG_DIRTY
// is set, we do a full save/restore cycle and disable trapping.
 @@ -763,6 +772,7 @@
ldr x2, [x0, #VCPU_HCR_EL2]
msr hcr_el2, x2
mov x2, #CPTR_EL2_TTA
 +  orr x2, x2, #CPTR_EL2_TFP
msr cptr_el2, x2
  
 	mov x2, #(1 << 15)  // Trap CP15 Cr=15
 @@ -785,7 +795,6 @@
  .macro deactivate_traps
mov x2, #HCR_RW
msr hcr_el2, x2
 -  msr cptr_el2, xzr
msr hstr_el2, xzr
  
mrs x2, mdcr_el2
 @@ -912,6 +921,28 @@ __restore_fpsimd:
restore_fpsimd
ret
  
 +switch_to_guest_fpsimd:
 +  pushx4, lr
 +
 +  mrs x2, cptr_el2
 +  bic x2, x2, #CPTR_EL2_TFP
 +  msr cptr_el2, x2
 +
 +  mrs x0, tpidr_el2
 +
 +  ldr x2, [x0, #VCPU_HOST_CONTEXT]
 +  kern_hyp_va x2
 +  bl __save_fpsimd
 +
 +  add x2, x0, #VCPU_CONTEXT
 +  bl __restore_fpsimd
 +
 +  pop x4, lr
 +  pop x2, x3
 +  pop x0, x1
 +
 +  eret
 +
  /*
   * u64 __kvm_vcpu_run(struct kvm_vcpu *vcpu);
   *
 @@ -932,7 +963,6 @@ ENTRY(__kvm_vcpu_run)
kern_hyp_va x2
  
save_host_regs
 -  bl __save_fpsimd
bl __save_sysregs
  
compute_debug_state 1f
 @@ -948,7 +978,6 @@ ENTRY(__kvm_vcpu_run)
add x2, x0, #VCPU_CONTEXT
  
bl __restore_sysregs
 -  bl __restore_fpsimd
  
skip_debug_state x3, 1f
bl  __restore_debug
 @@ -967,7 +996,9 @@ __kvm_vcpu_return:
add x2, x0, #VCPU_CONTEXT
  
save_guest_regs
 +  skip_fpsimd_state x3, 1f
bl __save_fpsimd
 +1:
bl __save_sysregs
  
skip_debug_state x3, 1f
 @@ -986,7 +1017,11 @@ __kvm_vcpu_return:
kern_hyp_va x2
  
bl __restore_sysregs
 +  skip_fpsimd_state x3, 1f
bl __restore_fpsimd
 +1:
 +  /* Clear FPSIMD and Trace trapping */
 +  msr cptr_el2, xzr

 why not simply move the deactivate_traps down here instead?

 Putting deactivate_traps there  trashes x2, setup earlier
 to restore debug, host registers from host context

 Do we want deactivate_traps to use another register and
 move the macro there? Or leave as is?

 There was some clean symmetry in the code by using deactivate traps, but
 given this, I don't care strongly which way we end up doing it.

I agree. Also it probably makes more sense to keep the trap disable
code together to match the trap enables, without objection from
Marc I'll rework the patch, repost it after Monday.

Thanks,
- Mario
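[Editorial note: the CPTR_EL2 trap-bit arithmetic from the hunks above is easy to sanity-check in C. The constants mirror the patch; the helper is an illustrative stand-in for the skip_fpsimd_state macro, not kernel code.]

```c
#include <assert.h>

/* Mirrors the CPTR_EL2 trap bits defined in the patch. */
#define CPTR_EL2_TFP_SHIFT	10
#define CPTR_EL2_TCPAC		(1u << 31)
#define CPTR_EL2_TTA		(1u << 20)
#define CPTR_EL2_TFP		(1u << CPTR_EL2_TFP_SHIFT)

/* skip_fpsimd_state in C: "branch taken" (return 1) when the TFP bit
 * is still set, i.e. the guest never touched VFP/SIMD this run, so the
 * fpsimd save/restore can be skipped. */
static int skip_fpsimd_state(unsigned int cptr)
{
	return (cptr >> CPTR_EL2_TFP_SHIFT) & 1u;
}
```

This matches the tbnz in the macro: the test is on the shift, not on the mask, which is why the patch introduces CPTR_EL2_TFP_SHIFT alongside CPTR_EL2_TFP.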
 
 Thanks,
 -Christoffer
 



Re: [PATCH v3 0/2] arm/arm64: KVM: Optimize arm64 fp/simd, saves 30-50% on exits

2015-07-02 Thread Mario Smarduch
On 07/01/2015 02:49 AM, Christoffer Dall wrote:
 On Wed, Jun 24, 2015 at 05:04:10PM -0700, Mario Smarduch wrote:
 Currently we save/restore fp/simd on each exit. First patch optimizes arm64
 save/restore, we only do so on Guest access. hackbench and
 several lmbench tests show anywhere from 30% to above 50% optimization
 achieved.

 In second patch 32-bit handler is updated to keep exit handling consistent
 with 64-bit code.
 
 30-50% of what?  The overhead or overall performance?

Yes, considering all exits to host KVM, anywhere from 30% to 50% of them
didn't require an fp/simd switch.

Anything else you'd like to see added here?
 

 Changes since v1:
 - Addressed Marcs comments
 - Verified optimization improvements with lmbench and hackbench, updated 
   commit message

 Changes since v2:
 - only for patch 2/2
   - Reworked trapping to vfp access handler

 Changes since v3:
 - Only for patch 2/2
   - Removed load_vcpu in switch_to_guest_vfp per Marcs comment
   - Got another chance to replace an unreferenced label with a comment


 Mario Smarduch (2):
   Optimize arm64 skip 30-50% vfp/simd save/restore on exits
   keep arm vfp/simd exit handling consistent with arm64

  arch/arm/kvm/interrupts.S|   14 +++-
  arch/arm64/include/asm/kvm_arm.h |5 -
  arch/arm64/kvm/hyp.S |   46 
 +++---
  3 files changed, 55 insertions(+), 10 deletions(-)

 -- 
 1.7.9.5




Re: [PATCH v3 1/2] arm64: KVM: Optimize arm64 skip 30-50% vfp/simd save/restore on exits

2015-07-02 Thread Mario Smarduch
On 07/01/2015 06:46 AM, Christoffer Dall wrote:
 On Wed, Jun 24, 2015 at 05:04:11PM -0700, Mario Smarduch wrote:
 This patch only saves and restores FP/SIMD registers on Guest access. To do
 this cptr_el2 FP/SIMD trap is set on Guest entry and later checked on exit.
 lmbench, hackbench show significant improvements, for 30-50% exits FP/SIMD
 context is not saved/restored

 Signed-off-by: Mario Smarduch m.smard...@samsung.com
 ---
  arch/arm64/include/asm/kvm_arm.h |5 -
  arch/arm64/kvm/hyp.S |   46 
 +++---
  2 files changed, 47 insertions(+), 4 deletions(-)

 diff --git a/arch/arm64/include/asm/kvm_arm.h 
 b/arch/arm64/include/asm/kvm_arm.h
 index ac6fafb..7605e09 100644
 --- a/arch/arm64/include/asm/kvm_arm.h
 +++ b/arch/arm64/include/asm/kvm_arm.h
 @@ -171,10 +171,13 @@
 #define HSTR_EL2_TTEE   (1 << 16)
 #define HSTR_EL2_T(x)   (1 << x)
  
 +/* Hyp Coproccessor Trap Register Shifts */
 +#define CPTR_EL2_TFP_SHIFT 10
 +
  /* Hyp Coprocessor Trap Register */
 #define CPTR_EL2_TCPAC  (1 << 31)
 #define CPTR_EL2_TTA    (1 << 20)
-#define CPTR_EL2_TFP    (1 << 10)
+#define CPTR_EL2_TFP    (1 << CPTR_EL2_TFP_SHIFT)
  
  /* Hyp Debug Configuration Register bits */
 #define MDCR_EL2_TDRA   (1 << 11)
 diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S
 index 5befd01..de0788f 100644
 --- a/arch/arm64/kvm/hyp.S
 +++ b/arch/arm64/kvm/hyp.S
 @@ -673,6 +673,15 @@
  tbz \tmp, #KVM_ARM64_DEBUG_DIRTY_SHIFT, \target
  .endm
  
 +/*
 + * Check cptr VFP/SIMD accessed bit, if set VFP/SIMD not accessed by guest.
 
 This comment doesn't really help me understand the function, may I
 suggest:
 
 Branch to target if CPTR_EL2.TFP bit is set (VFP/SIMD trapping enabled)

Yes actually describes what it does.

 
 + */
 +.macro skip_fpsimd_state tmp, target
 +mrs \tmp, cptr_el2
 +tbnz\tmp, #CPTR_EL2_TFP_SHIFT, \target
 +.endm
 +
 +
  .macro compute_debug_state target
  // Compute debug state: If any of KDE, MDE or KVM_ARM64_DEBUG_DIRTY
  // is set, we do a full save/restore cycle and disable trapping.
 @@ -763,6 +772,7 @@
  ldr x2, [x0, #VCPU_HCR_EL2]
  msr hcr_el2, x2
  mov x2, #CPTR_EL2_TTA
 +orr x2, x2, #CPTR_EL2_TFP
  msr cptr_el2, x2
  
 mov x2, #(1 << 15)  // Trap CP15 Cr=15
 @@ -785,7 +795,6 @@
  .macro deactivate_traps
  mov x2, #HCR_RW
  msr hcr_el2, x2
 -msr cptr_el2, xzr
  msr hstr_el2, xzr
  
  mrs x2, mdcr_el2
 @@ -912,6 +921,28 @@ __restore_fpsimd:
  restore_fpsimd
  ret
  
 +switch_to_guest_fpsimd:
 +pushx4, lr
 +
 +mrs x2, cptr_el2
 +bic x2, x2, #CPTR_EL2_TFP
 +msr cptr_el2, x2
 +
 +mrs x0, tpidr_el2
 +
 +ldr x2, [x0, #VCPU_HOST_CONTEXT]
 +kern_hyp_va x2
 +bl __save_fpsimd
 +
 +add x2, x0, #VCPU_CONTEXT
 +bl __restore_fpsimd
 +
 +pop x4, lr
 +pop x2, x3
 +pop x0, x1
 +
 +eret
 +
  /*
   * u64 __kvm_vcpu_run(struct kvm_vcpu *vcpu);
   *
 @@ -932,7 +963,6 @@ ENTRY(__kvm_vcpu_run)
  kern_hyp_va x2
  
  save_host_regs
 -bl __save_fpsimd
  bl __save_sysregs
  
  compute_debug_state 1f
 @@ -948,7 +978,6 @@ ENTRY(__kvm_vcpu_run)
  add x2, x0, #VCPU_CONTEXT
  
  bl __restore_sysregs
 -bl __restore_fpsimd
  
  skip_debug_state x3, 1f
  bl  __restore_debug
 @@ -967,7 +996,9 @@ __kvm_vcpu_return:
  add x2, x0, #VCPU_CONTEXT
  
  save_guest_regs
 +skip_fpsimd_state x3, 1f
  bl __save_fpsimd
 +1:
  bl __save_sysregs
  
  skip_debug_state x3, 1f
 @@ -986,7 +1017,11 @@ __kvm_vcpu_return:
  kern_hyp_va x2
  
  bl __restore_sysregs
 +skip_fpsimd_state x3, 1f
  bl __restore_fpsimd
 +1:
 +/* Clear FPSIMD and Trace trapping */
 +msr cptr_el2, xzr
 
 why not simply move the deactivate_traps down here instead?

Putting deactivate_traps there  trashes x2, setup earlier
to restore debug, host registers from host context

Do we want deactivate_traps to use another register and
move the macro there? Or leave as is?

- Mario


 
  
  skip_debug_state x3, 1f
  // Clear the dirty flag for the next run, as all the state has
 @@ -1201,6 +1236,11 @@ el1_trap:
   * x1: ESR
   * x2: ESR_EC
   */
 +
 +/* Guest accessed VFP/SIMD registers, save host, restore Guest */
 +cmp x2, #ESR_ELx_EC_FP_ASIMD
 b.eq    switch_to_guest_fpsimd
 +
  cmp x2, #ESR_ELx_EC_DABT_LOW
  mov x0, #ESR_ELx_EC_IABT_LOW
  ccmpx2, x0, #4, ne
 -- 
 1.7.9.5

 
 Otherwise looks good,
 -Christoffer
 



Re: [PATCH 0/3] arm: KVM: VFP lazy switch in KVM Host Mode may save upto 98%

2015-06-28 Thread Mario Smarduch
Hi Marc, Christoffer -

to clarify - this series may be causing a conflict with the arm64
basic approach, and arm32 exit code touch ups.

The intent of this series is more of an RFC or preview, to get some
feedback on whether this approach is sensible (and, if it is, to later
apply it to arm64 as well).

Thanks,
- Mario


On 06/24/2015 08:30 PM, Mario Smarduch wrote:
 Currently we do a lazy VFP switch in Hyp mode, but once we exit and re-enter 
 hyp
 mode we trap again on VFP access. This mode has shown around 30-50% 
 improvement
 running hackbench and lmbench.
 
 This patch series extends lazy VFP switch beyond Hyp mode to KVM host mode.
 
 1 - On guest access we switch from host to guest and set a flag accessible to 
 host
 2 - On exit to KVM host, VFP state is restored on vcpu_put if flag is marked 
 (1)
 3 - Otherwise guest is resumed and continues to use its VFP registers. 
 4 - In case of 2 on VM entry we set VFP trap flag to repeat 1.
 
  If the guest does not access VFP registers, the implementation remains the same.
 
 Executing hackbench on Fast Models and Exynos arm32 board shows good
 results. Considering all exits 2% of the time KVM host lazy vfp switch is 
 invoked.
 
 However, this patch set requires more burn-in time and testing under various loads.
 
 Currently ARM32 is addressed; ARM64 will follow later.
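The lazy-switch flow in steps 1-4 above can be modeled as a small state machine. A minimal sketch follows; the field and function names (model_vcpu, guest_vfp_loaded, etc.) are illustrative only, not the actual kernel symbols from these patches.

```c
#include <stdbool.h>

/* Minimal model of the lazy VFP switch; names are illustrative only. */
struct model_vcpu {
	bool guest_vfp_loaded;	/* guest VFP state lives in the hardware regs */
	bool trap_vfp;		/* trap the guest's next VFP access */
	int host_restores;	/* number of host-state restores performed */
};

/* Step 4: on VM entry, arm the VFP trap unless the guest already
 * owns the registers. */
void vm_entry(struct model_vcpu *v)
{
	v->trap_vfp = !v->guest_vfp_loaded;
}

/* Step 1: the guest touched VFP, so Hyp switches host state out, guest
 * state in, and marks the flag the host checks later. */
void guest_vfp_trap(struct model_vcpu *v)
{
	v->guest_vfp_loaded = true;
	v->trap_vfp = false;
}

/* Step 2: on exit to the KVM host, vcpu_put restores host VFP state
 * only if the guest actually took ownership of the registers. */
void vcpu_put(struct model_vcpu *v)
{
	if (v->guest_vfp_loaded) {
		v->host_restores++;
		v->guest_vfp_loaded = false;
	}
}
```

Step 3 corresponds to re-entering the guest without an intervening vcpu_put: guest_vfp_loaded stays set, so no trap is armed and no switch happens.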
 
 Mario Smarduch (3):
   define headers and offsets to manage VFP state
   Implement lazy VFP switching outside of Hyp Mode
   Add VFP lazy switch hooks in Host KVM
 
  arch/arm/include/asm/kvm_asm.h  |1 +
  arch/arm/include/asm/kvm_host.h |3 +++
  arch/arm/kernel/asm-offsets.c   |1 +
  arch/arm/kvm/arm.c  |   15 
 arch/arm/kvm/interrupts.S   |   49 +--
  5 files changed, 51 insertions(+), 18 deletions(-)
 



[PATCH v3 0/2] arm/arm64: KVM: Optimize arm64 fp/simd, saves 30-50% on exits

2015-06-24 Thread Mario Smarduch
Currently we save/restore fp/simd on each exit. The first patch optimizes the
arm64 save/restore so we only do it on guest access. hackbench and
several lmbench tests show anywhere from 30% to above 50% improvement.

In the second patch the 32-bit handler is updated to keep exit handling
consistent with the 64-bit code.

Changes since v1:
- Addressed Marc's comments
- Verified optimization improvements with lmbench and hackbench, updated 
  commit message

Changes since v2:
- only for patch 2/2
  - Reworked trapping to vfp access handler

Changes since v3:
- Only for patch 2/2
  - Removed load_vcpu in switch_to_guest_vfp per Marc's comment
  - Got another chance to replace an unreferenced label with a comment


Mario Smarduch (2):
  Optimize arm64 skip 30-50% vfp/simd save/restore on exits
  keep arm vfp/simd exit handling consistent with arm64

 arch/arm/kvm/interrupts.S|   14 +++-
 arch/arm64/include/asm/kvm_arm.h |5 -
 arch/arm64/kvm/hyp.S |   46 +++---
 3 files changed, 55 insertions(+), 10 deletions(-)

-- 
1.7.9.5



[PATCH v3 2/2] arm: KVM: keep arm vfp/simd exit handling consistent with arm64

2015-06-24 Thread Mario Smarduch
After enhancing the arm64 FP/SIMD exit handling, the ARMv7 VFP exit branch is
moved to guest trap handling. This keeps the exit handling flow consistent
between both architectures.

Signed-off-by: Mario Smarduch m.smard...@samsung.com
---
 arch/arm/kvm/interrupts.S |   14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
index 79caf79..b245b4e 100644
--- a/arch/arm/kvm/interrupts.S
+++ b/arch/arm/kvm/interrupts.S
@@ -363,10 +363,6 @@ hyp_hvc:
@ Check syndrome register
mrc p15, 4, r1, c5, c2, 0   @ HSR
lsr r0, r1, #HSR_EC_SHIFT
-#ifdef CONFIG_VFPv3
-   cmp r0, #HSR_EC_CP_0_13
-   beq switch_to_guest_vfp
-#endif
cmp r0, #HSR_EC_HVC
bne guest_trap  @ Not HVC instr.
 
@@ -380,7 +376,10 @@ hyp_hvc:
cmp r2, #0
bne guest_trap  @ Guest called HVC
 
-host_switch_to_hyp:
+   /*
+* Getting here means host called HVC, we shift parameters and branch
+* to Hyp function.
+*/
pop {r0, r1, r2}
 
/* Check for __hyp_get_vectors */
@@ -411,6 +410,10 @@ guest_trap:
 
@ Check if we need the fault information
lsr r1, r1, #HSR_EC_SHIFT
+#ifdef CONFIG_VFPv3
+   cmp r1, #HSR_EC_CP_0_13
+   beq switch_to_guest_vfp
+#endif
cmp r1, #HSR_EC_IABT
mrceq   p15, 4, r2, c6, c0, 2   @ HIFAR
beq 2f
@@ -479,7 +482,6 @@ guest_trap:
  */
 #ifdef CONFIG_VFPv3
 switch_to_guest_vfp:
-   load_vcpu   @ Load VCPU pointer to r0
push{r3-r7}
 
@ NEON/VFP used.  Turn on VFP access.
-- 
1.7.9.5



[PATCH v3 1/2] arm64: KVM: Optimize arm64 skip 30-50% vfp/simd save/restore on exits

2015-06-24 Thread Mario Smarduch
This patch saves and restores FP/SIMD registers only on guest access. To do
this, the cptr_el2 FP/SIMD trap is set on guest entry and later checked on
exit. lmbench and hackbench show significant improvements: on 30-50% of exits
the FP/SIMD context is not saved/restored.
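As a rough user-space model of the trap-bit logic this patch introduces: the bit values below mirror the kvm_arm.h definitions added further down, while the helper names (cptr_on_entry, etc.) are illustrative, not actual kernel functions.

```c
#include <stdint.h>

/* Bit values mirroring the kvm_arm.h additions in this patch. */
#define CPTR_EL2_TFP_SHIFT 10
#define CPTR_EL2_TFP       (UINT32_C(1) << CPTR_EL2_TFP_SHIFT)
#define CPTR_EL2_TTA       (UINT32_C(1) << 20)

/* On guest entry: trap tracing (TTA) and FP/SIMD access (TFP). */
uint32_t cptr_on_entry(void)
{
	return CPTR_EL2_TTA | CPTR_EL2_TFP;
}

/* On the first FP/SIMD trap: clear TFP so the guest can keep using the
 * registers without trapping again (switch_to_guest_fpsimd). */
uint32_t cptr_grant_fpsimd(uint32_t cptr)
{
	return cptr & ~CPTR_EL2_TFP;
}

/* On exit: TFP still set means the guest never touched FP/SIMD, so the
 * context save/restore can be skipped (skip_fpsimd_state). */
int fpsimd_dirty(uint32_t cptr)
{
	return !(cptr & CPTR_EL2_TFP);
}
```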

Signed-off-by: Mario Smarduch m.smard...@samsung.com
---
 arch/arm64/include/asm/kvm_arm.h |5 -
 arch/arm64/kvm/hyp.S |   46 +++---
 2 files changed, 47 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index ac6fafb..7605e09 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -171,10 +171,13 @@
 #define HSTR_EL2_TTEE  (1 << 16)
 #define HSTR_EL2_T(x)  (1 << x)
 
+/* Hyp Coprocessor Trap Register Shifts */
+#define CPTR_EL2_TFP_SHIFT 10
+
 /* Hyp Coprocessor Trap Register */
 #define CPTR_EL2_TCPAC (1 << 31)
 #define CPTR_EL2_TTA   (1 << 20)
-#define CPTR_EL2_TFP   (1 << 10)
+#define CPTR_EL2_TFP   (1 << CPTR_EL2_TFP_SHIFT)
 
 /* Hyp Debug Configuration Register bits */
 #define MDCR_EL2_TDRA  (1 << 11)
diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S
index 5befd01..de0788f 100644
--- a/arch/arm64/kvm/hyp.S
+++ b/arch/arm64/kvm/hyp.S
@@ -673,6 +673,15 @@
tbz \tmp, #KVM_ARM64_DEBUG_DIRTY_SHIFT, \target
 .endm
 
+/*
+ * Check the cptr VFP/SIMD trap bit; if it is still set, VFP/SIMD was not accessed by the guest.
+ */
+.macro skip_fpsimd_state tmp, target
+   mrs \tmp, cptr_el2
+   tbnz    \tmp, #CPTR_EL2_TFP_SHIFT, \target
+.endm
+
+
 .macro compute_debug_state target
// Compute debug state: If any of KDE, MDE or KVM_ARM64_DEBUG_DIRTY
// is set, we do a full save/restore cycle and disable trapping.
@@ -763,6 +772,7 @@
ldr x2, [x0, #VCPU_HCR_EL2]
msr hcr_el2, x2
mov x2, #CPTR_EL2_TTA
+   orr x2, x2, #CPTR_EL2_TFP
msr cptr_el2, x2
 
	mov x2, #(1 << 15)  // Trap CP15 Cr=15
@@ -785,7 +795,6 @@
 .macro deactivate_traps
mov x2, #HCR_RW
msr hcr_el2, x2
-   msr cptr_el2, xzr
msr hstr_el2, xzr
 
mrs x2, mdcr_el2
@@ -912,6 +921,28 @@ __restore_fpsimd:
restore_fpsimd
ret
 
+switch_to_guest_fpsimd:
+   push    x4, lr
+
+   mrs x2, cptr_el2
+   bic x2, x2, #CPTR_EL2_TFP
+   msr cptr_el2, x2
+
+   mrs x0, tpidr_el2
+
+   ldr x2, [x0, #VCPU_HOST_CONTEXT]
+   kern_hyp_va x2
+   bl __save_fpsimd
+
+   add x2, x0, #VCPU_CONTEXT
+   bl __restore_fpsimd
+
+   pop x4, lr
+   pop x2, x3
+   pop x0, x1
+
+   eret
+
 /*
  * u64 __kvm_vcpu_run(struct kvm_vcpu *vcpu);
  *
@@ -932,7 +963,6 @@ ENTRY(__kvm_vcpu_run)
kern_hyp_va x2
 
save_host_regs
-   bl __save_fpsimd
bl __save_sysregs
 
compute_debug_state 1f
@@ -948,7 +978,6 @@ ENTRY(__kvm_vcpu_run)
add x2, x0, #VCPU_CONTEXT
 
bl __restore_sysregs
-   bl __restore_fpsimd
 
skip_debug_state x3, 1f
bl  __restore_debug
@@ -967,7 +996,9 @@ __kvm_vcpu_return:
add x2, x0, #VCPU_CONTEXT
 
save_guest_regs
+   skip_fpsimd_state x3, 1f
bl __save_fpsimd
+1:
bl __save_sysregs
 
skip_debug_state x3, 1f
@@ -986,7 +1017,11 @@ __kvm_vcpu_return:
kern_hyp_va x2
 
bl __restore_sysregs
+   skip_fpsimd_state x3, 1f
bl __restore_fpsimd
+1:
+   /* Clear FPSIMD and Trace trapping */
+   msr cptr_el2, xzr
 
skip_debug_state x3, 1f
// Clear the dirty flag for the next run, as all the state has
@@ -1201,6 +1236,11 @@ el1_trap:
 * x1: ESR
 * x2: ESR_EC
 */
+
+   /* Guest accessed VFP/SIMD registers, save host, restore Guest */
+   cmp x2, #ESR_ELx_EC_FP_ASIMD
	b.eq    switch_to_guest_fpsimd
+
cmp x2, #ESR_ELx_EC_DABT_LOW
mov x0, #ESR_ELx_EC_IABT_LOW
	ccmp    x2, x0, #4, ne
-- 
1.7.9.5



[PATCH 2/3] arm: KVM: Implement lazy VFP switching outside of Hyp Mode

2015-06-24 Thread Mario Smarduch
This patch implements the VFP context switch code called from vcpu_put in the
KVM host. In addition, it implements the logic to skip setting a VFP trap when
one is not needed, and resets the flag when the KVM host has switched registers
so that new guest VFP accesses trap again.


Signed-off-by: Mario Smarduch m.smard...@samsung.com
---
 arch/arm/kvm/interrupts.S |   49 -
 1 file changed, 31 insertions(+), 18 deletions(-)

diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
index 79caf79..0912edd 100644
--- a/arch/arm/kvm/interrupts.S
+++ b/arch/arm/kvm/interrupts.S
@@ -96,6 +96,21 @@ ENTRY(__kvm_flush_vm_context)
bx  lr
 ENDPROC(__kvm_flush_vm_context)
 
+ENTRY(__kvm_restore_host_vfp_state)
+   push{r3-r7}
+
+   mov r1, #0
+   str r1, [r0, #VCPU_VFP_SAVED]
+
+   add r7, r0, #VCPU_VFP_GUEST
+   store_vfp_state r7
+   add r7, r0, #VCPU_VFP_HOST
+   ldr r7, [r7]
+   restore_vfp_state r7
+
+   pop {r3-r7}
+   bx  lr
+ENDPROC(__kvm_restore_host_vfp_state)
 
 /********************************************************************
  *  Hypervisor world-switch code
@@ -131,7 +146,13 @@ ENTRY(__kvm_vcpu_run)
 
@ Trap coprocessor CRx accesses
set_hstr vmentry
+
+   ldr r1, [vcpu, #VCPU_VFP_SAVED]
+   cmp r1, #1
+   beq skip_guest_vfp_trap
set_hcptr vmentry, (HCPTR_TTA | HCPTR_TCP(10) | HCPTR_TCP(11))
+skip_guest_vfp_trap:
+
set_hdcr vmentry
 
@ Write configured ID register into MIDR alias
@@ -173,18 +194,6 @@ __kvm_vcpu_return:
set_hcptr vmexit, (HCPTR_TTA | HCPTR_TCP(10) | HCPTR_TCP(11))
 
 #ifdef CONFIG_VFPv3
-   @ Save floating point registers we if let guest use them.
-   tst r2, #(HCPTR_TCP(10) | HCPTR_TCP(11))
-   bne after_vfp_restore
-
-   @ Switch VFP/NEON hardware state to the host's
-   add r7, vcpu, #VCPU_VFP_GUEST
-   store_vfp_state r7
-   add r7, vcpu, #VCPU_VFP_HOST
-   ldr r7, [r7]
-   restore_vfp_state r7
-
-after_vfp_restore:
@ Restore FPEXC_EN which we clobbered on entry
pop {r2}
VFPFMXR FPEXC, r2
@@ -363,10 +372,6 @@ hyp_hvc:
@ Check syndrome register
mrc p15, 4, r1, c5, c2, 0   @ HSR
lsr r0, r1, #HSR_EC_SHIFT
-#ifdef CONFIG_VFPv3
-   cmp r0, #HSR_EC_CP_0_13
-   beq switch_to_guest_vfp
-#endif
cmp r0, #HSR_EC_HVC
bne guest_trap  @ Not HVC instr.
 
@@ -380,7 +385,10 @@ hyp_hvc:
cmp r2, #0
bne guest_trap  @ Guest called HVC
 
-host_switch_to_hyp:
+   /*
+* Getting here means host called HVC, we shift parameters and branch
+* to Hyp function.
+*/
pop {r0, r1, r2}
 
/* Check for __hyp_get_vectors */
@@ -411,6 +419,10 @@ guest_trap:
 
@ Check if we need the fault information
lsr r1, r1, #HSR_EC_SHIFT
+#ifdef CONFIG_VFPv3
+   cmp r1, #HSR_EC_CP_0_13
+   beq switch_to_guest_vfp
+#endif
cmp r1, #HSR_EC_IABT
mrceq   p15, 4, r2, c6, c0, 2   @ HIFAR
beq 2f
@@ -479,11 +491,12 @@ guest_trap:
  */
 #ifdef CONFIG_VFPv3
 switch_to_guest_vfp:
-   load_vcpu   @ Load VCPU pointer to r0
push{r3-r7}
 
@ NEON/VFP used.  Turn on VFP access.
set_hcptr vmexit, (HCPTR_TCP(10) | HCPTR_TCP(11))
+   mov r1, #1
+   str r1, [vcpu, #VCPU_VFP_SAVED]
 
@ Switch VFP/NEON hardware state to the guest's
add r7, r0, #VCPU_VFP_HOST
-- 
1.7.9.5



[PATCH 1/3] arm: KVM: define headers and offsets to manage VFP state

2015-06-24 Thread Mario Smarduch
Define the required kvm_vcpu_arch fields and offsets to manage VFP state, and
declare the Hyp interface function to switch VFP registers.
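For background, the asm-offsets mechanism this patch extends is how C struct offsets such as VCPU_VFP_SAVED become constants visible to assembly. A minimal user-space model follows; the mock structs are hypothetical stand-ins, not the real kvm_vcpu layout.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for the kernel structs; the real kvm_vcpu
 * layout differs, this only illustrates the offset-generation idea. */
struct mock_vcpu_arch {
	unsigned int irq_lines;
	unsigned int vfp_guest_saved;	/* the field added by this patch */
};

struct mock_vcpu {
	int cpu;
	struct mock_vcpu_arch arch;
};

/* asm-offsets.c emits "#define <sym> <offset>" lines at build time;
 * assembly then uses them, e.g. "str r1, [r0, #VCPU_VFP_SAVED]". */
#define DEFINE(sym, val) \
	printf("#define %s %zu\n", #sym, (size_t)(val))

void emit_offsets(void)
{
	DEFINE(VCPU_VFP_SAVED,
	       offsetof(struct mock_vcpu, arch.vfp_guest_saved));
}
```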


Signed-off-by: Mario Smarduch m.smard...@samsung.com
---
 arch/arm/include/asm/kvm_asm.h  |1 +
 arch/arm/include/asm/kvm_host.h |3 +++
 arch/arm/kernel/asm-offsets.c   |1 +
 3 files changed, 5 insertions(+)

diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
index 25410b2..08dda8c 100644
--- a/arch/arm/include/asm/kvm_asm.h
+++ b/arch/arm/include/asm/kvm_asm.h
@@ -97,6 +97,7 @@ extern char __kvm_hyp_code_end[];
 extern void __kvm_flush_vm_context(void);
 extern void __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa);
 extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
+extern void __kvm_restore_host_vfp_state(struct kvm_vcpu *vcpu);
 
 extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
 #endif
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index d71607c..22cea72 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -111,6 +111,9 @@ struct kvm_vcpu_arch {
/* Interrupt related fields */
u32 irq_lines;  /* IRQ and FIQ levels */
 
+   /* Track if VFP registers are owned by the guest while in KVM host mode */
+   u32 vfp_guest_saved;
+
/* Exception Information */
struct kvm_vcpu_fault_info fault;
 
diff --git a/arch/arm/kernel/asm-offsets.c b/arch/arm/kernel/asm-offsets.c
index 871b826..35093d0 100644
--- a/arch/arm/kernel/asm-offsets.c
+++ b/arch/arm/kernel/asm-offsets.c
@@ -191,6 +191,7 @@ int main(void)
   DEFINE(VCPU_HPFAR,   offsetof(struct kvm_vcpu, arch.fault.hpfar));
   DEFINE(VCPU_HYP_PC,  offsetof(struct kvm_vcpu, arch.fault.hyp_pc));
   DEFINE(VCPU_VGIC_CPU,offsetof(struct kvm_vcpu, 
arch.vgic_cpu));
+  DEFINE(VCPU_VFP_SAVED,   offsetof(struct kvm_vcpu, 
arch.vfp_guest_saved));
   DEFINE(VGIC_V2_CPU_HCR,  offsetof(struct vgic_cpu, vgic_v2.vgic_hcr));
   DEFINE(VGIC_V2_CPU_VMCR, offsetof(struct vgic_cpu, vgic_v2.vgic_vmcr));
   DEFINE(VGIC_V2_CPU_MISR, offsetof(struct vgic_cpu, vgic_v2.vgic_misr));
-- 
1.7.9.5



[PATCH 0/3] arm: KVM: VFP lazy switch in KVM Host Mode may save upto 98%

2015-06-24 Thread Mario Smarduch
Currently we do a lazy VFP switch in Hyp mode, but once we exit and re-enter hyp
mode we trap again on VFP access. This mode has shown around 30-50% improvement
running hackbench and lmbench.

This patch series extends lazy VFP switch beyond Hyp mode to KVM host mode.

1 - On guest access we switch from host to guest and set a flag accessible to host
2 - On exit to KVM host, VFP state is restored on vcpu_put if flag is marked (1)
3 - Otherwise guest is resumed and continues to use its VFP registers. 
4 - In case of 2 on VM entry we set VFP trap flag to repeat 1.

If the guest does not access the VFP registers, the implementation remains the same.

Executing hackbench on Fast Models and an Exynos arm32 board shows good
results. Considering all exits, the KVM host lazy VFP switch is invoked 2% of
the time.

However, this patch set requires more burn-in time and testing under various loads.

Currently ARM32 is addressed; ARM64 will follow later.

Mario Smarduch (3):
  define headers and offsets to manage VFP state
  Implement lazy VFP switching outside of Hyp Mode
  Add VFP lazy switch hooks in Host KVM

 arch/arm/include/asm/kvm_asm.h  |1 +
 arch/arm/include/asm/kvm_host.h |3 +++
 arch/arm/kernel/asm-offsets.c   |1 +
 arch/arm/kvm/arm.c  |   15 
 arch/arm/kvm/interrupts.S   |   49 +--
 5 files changed, 51 insertions(+), 18 deletions(-)

-- 
1.7.9.5



[PATCH 3/3] arm: KVM: Add VFP lazy switch hooks in Host KVM

2015-06-24 Thread Mario Smarduch
This patch implements the KVM host interface to the Hyp mode VFP function that
switches out the guest state and switches in the host state.

Signed-off-by: Mario Smarduch m.smard...@samsung.com
---
 arch/arm/kvm/arm.c |   15 +++
 1 file changed, 15 insertions(+)

diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index d9631ec..77b41f5 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -105,6 +105,17 @@ void kvm_arch_check_processor_compat(void *rtn)
*(int *)rtn = 0;
 }
 
+/**
+ * kvm_switch_fp_regs() - switch guest/host VFP registers
+ * @vcpu:  pointer to vcpu structure.
+ *
+ * HYP interface function to save guest and restore host VFP registers.
+ */
+static void kvm_switch_fp_regs(struct kvm_vcpu *vcpu)
+{
+   if (vcpu->arch.vfp_guest_saved == 1)
+   kvm_call_hyp(__kvm_restore_host_vfp_state, vcpu);
+}
 
 /**
  * kvm_arch_init_vm - initializes a VM data structure
@@ -292,6 +303,10 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 
 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 {
+
+   /* Check if Guest accessed VFP registers */
+   kvm_switch_fp_regs(vcpu);
+
/*
 * The arch-generic KVM code expects the cpu field of a vcpu to be -1
 * if the vcpu is no longer assigned to a cpu.  This is used for the
-- 
1.7.9.5



Re: [PATCH v2 2/2] arm: KVM: keep arm vfp/simd exit handling consistent with arm64

2015-06-18 Thread Mario Smarduch
On 06/18/2015 10:27 AM, Marc Zyngier wrote:
 On 16/06/15 22:50, Mario Smarduch wrote:
 After enhancing arm64 FP/SIMD exit handling, FP/SIMD exit branch is moved
 to guest trap handling. This keeps the exit handling flow between both
 architectures consistent.

 Signed-off-by: Mario Smarduch m.smard...@samsung.com
 ---
  arch/arm/kvm/interrupts.S |   12 +++-
  1 file changed, 7 insertions(+), 5 deletions(-)

 diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
 index 79caf79..fca2c56 100644
 --- a/arch/arm/kvm/interrupts.S
 +++ b/arch/arm/kvm/interrupts.S
 @@ -363,10 +363,6 @@ hyp_hvc:
  @ Check syndrome register
  mrc p15, 4, r1, c5, c2, 0   @ HSR
  lsr r0, r1, #HSR_EC_SHIFT
 -#ifdef CONFIG_VFPv3
 -cmp r0, #HSR_EC_CP_0_13
 -beq switch_to_guest_vfp
 -#endif
  cmp r0, #HSR_EC_HVC
  bne guest_trap  @ Not HVC instr.
  
 @@ -406,6 +402,12 @@ THUMB(  orr lr, #1)
  1:  eret
  
  guest_trap:
 +#ifdef CONFIG_VFPv3
 +/* Guest accessed VFP/SIMD registers, save host, restore Guest */
 +cmp r0, #HSR_EC_CP_0_13
 +beq switch_to_guest_fpsimd
 +#endif
 +
  load_vcpu   @ Load VCPU pointer to r0
  str r1, [vcpu, #VCPU_HSR]
  
 @@ -478,7 +480,7 @@ guest_trap:
   * inject an undefined exception to the guest.
   */
  #ifdef CONFIG_VFPv3
 -switch_to_guest_vfp:
 +switch_to_guest_fpsimd:
 
 Ah, I think I managed to confuse you in my previous comment.
 On ARMv7, we call the floating point stuff VFP.
 On ARMv8, we call it FP/SIMD.

Ah I see, I'll update.
 
 Not very consistent, I know...
 
  load_vcpu   @ Load VCPU pointer to r0

How about moving it here - then it does not stick out like it did
before.

guest_trap:
load_vcpu   @ Load VCPU pointer to r0
str r1, [vcpu, #VCPU_HSR]

@ Check if we need the fault information
lsr r1, r1, #HSR_EC_SHIFT
#ifdef CONFIG_VFPv3
/* Guest accessed VFP/SIMD registers, save host, restore Guest */
cmp r1, #HSR_EC_CP_0_13
beq switch_to_guest_vfp
#endif


Regarding host_switch_to_hyp: the label has no references but looks
like a clean separator - is that on purpose?

Thanks

 
 It would be interesting to find out if we can make this load_vcpu part
 of the common sequence (without spilling another register, of course).
 Probably involves moving the exception class to r2.
 
 Thanks,
 
   M.
 



Re: 32-bit fp/simd race - never mind :)

2015-06-17 Thread Mario Smarduch
I have been looking at it for too long; my concepts
got twisted.

On 06/17/2015 07:56 PM, Mario Smarduch wrote:
 Maybe I've been looking at this code too long, but it
 appears that on __kvm_vcpu_return we save/restore
 fp/simd registers and then switch to the Hyp role. If we
 get an interrupt in between, may the vCPU be migrated
 to another CPU? Or am I missing something?
 
 Thanks,
 - Mario
 ___
 kvmarm mailing list
 kvm...@lists.cs.columbia.edu
 https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
 


