On Sun, May 20, 2018 at 19:04 PM GST+8, Marc Zyngier wrote: > On Sat, 19 May 2018 02:38:33 +0000 > "Tangnianyao (ICT)" <[email protected]> wrote: > > > On Wed, May 16, 2018 at 20:46 PM GST+8, Christoffer Dall wrote: > > > On Wed, May 16, 2018 at 11:32:17AM +0100, Dave Martin wrote: > > > > On Wed, May 16, 2018 at 10:25:40AM +0100, Marc Zyngier wrote: > > > > > [+Dave] > > > > > > > > > > Hi Nianyao, > > > > > > > > > > On 16/05/18 10:08, Tangnianyao (ICT) wrote: > > > > > > Add last_fpsimd_trap to notify if guest last exit reason is > > > > > > handling fpsimd. If guest is using fpsimd frequently, save host's > > > > > > fpsimd state and restore guest's fpsimd state and deactive fpsimd > > > > > > trap before returning to guest. It can avoid guest fpsimd trap soon > > > > > > to improve performance. > > > > > > > > So, the purpose of this patch is to context switch the FPSIMD > > > > state on initial entry to the guest, instead of enabling the trap > > > > and context switching the FPSIMD state lazily? > > > > > > > > And you decide whether to do this or not, based on whether the guest > > > triggered a lazy FPSIMD switch previously? > > > > I try to detect guest using fpsimd , but not overtraining which may > > include more complexity and not suitable for common cases. Therefore I > > just decide whether guest have triggerd a lazy fpsimd switch last time. > > > > > > > > > > > > > > > > > > Signed-off-by: Nianyao Tang <[email protected]> > > > > > > --- > > > > > > arch/arm64/kernel/asm-offsets.c | 1 + > > > > > > arch/arm64/kvm/hyp/entry.S | 5 +++++ > > > > > > arch/arm64/kvm/hyp/switch.c | 38 > > > > > > ++++++++++++++++++++++++++++++++++++-- > > > > > > include/linux/kvm_host.h | 1 + > > > > > > 4 files changed, 43 insertions(+), 2 deletions(-) > > > > > > > > > > > > diff --git a/arch/arm64/kernel/asm-offsets.c > > > > > > b/arch/arm64/kernel/asm-offsets.c index 5bdda65..35a9c5c > > > > > > 100644 > > > > > > --- a/arch/arm64/kernel/asm-offsets.c > > > > > > +++ b/arch/arm64/kernel/asm-offsets.c > > > > > > @@ -136,6 +136,7 @@ int main(void) #ifdef CONFIG_KVM_ARM_HOST > > > > > > DEFINE(VCPU_CONTEXT, offsetof(struct kvm_vcpu, > > > > > > arch.ctxt)); > > > > > > DEFINE(VCPU_FAULT_DISR, offsetof(struct kvm_vcpu, > > > > > > arch.fault.disr_el1)); > > > > > > + DEFINE(VCPU_LAST_FPSIMD_TRAP, offsetof(struct kvm_vcpu, > > > > > > + last_fpsimd_trap)); > > > > > > DEFINE(CPU_GP_REGS, offsetof(struct > > > > > > kvm_cpu_context, gp_regs)); > > > > > > DEFINE(CPU_USER_PT_REGS, offsetof(struct kvm_regs, regs)); > > > > > > DEFINE(CPU_FP_REGS, offsetof(struct kvm_regs, > > > > > > fp_regs)); > > > > > > diff --git a/arch/arm64/kvm/hyp/entry.S > > > > > > b/arch/arm64/kvm/hyp/entry.S index e41a161..956e042 100644 > > > > > > --- a/arch/arm64/kvm/hyp/entry.S > > > > > > +++ b/arch/arm64/kvm/hyp/entry.S > > > > > > @@ -197,6 +197,11 @@ alternative_endif > > > > > > add x0, x2, #CPU_GP_REG_OFFSET(CPU_FP_REGS) > > > > > > bl __fpsimd_restore_state > > > > > > > > > > > > + // Mark guest using fpsimd now > > > > > > + ldr x0, [x3, #VCPU_LAST_FPSIMD_TRAP] > > > > > > + add x0, x0, #1 > > > > > > > > Can this overflow? > > > > It won't overflow. It only trigger one fpsimd trap and restore guest's > > fpsimd state and deactivate fpsimd trap. When next non-fpsimd trap comes, > > it will be clear unconditionally. > > > > > > > > > > > > + str x0, [x3, #VCPU_LAST_FPSIMD_TRAP] > > > > > > + > > > > > > // Skip restoring fpexc32 for AArch64 guests > > > > > > mrs x1, hcr_el2 > > > > > > tbnz x1, #HCR_RW_SHIFT, 1f > > > > > > diff --git a/arch/arm64/kvm/hyp/switch.c > > > > > > b/arch/arm64/kvm/hyp/switch.c index d964523..86eea1b 100644 > > > > > > --- a/arch/arm64/kvm/hyp/switch.c > > > > > > +++ b/arch/arm64/kvm/hyp/switch.c > > > > > > @@ -92,7 +92,13 @@ static void activate_traps_vhe(struct > > > > > > kvm_vcpu > > > > > > *vcpu) > > > > > > > > > > > > val = read_sysreg(cpacr_el1); > > > > > > val |= CPACR_EL1_TTA; > > > > > > - val &= ~(CPACR_EL1_FPEN | CPACR_EL1_ZEN); > > > > > > + val &= ~CPACR_EL1_ZEN; > > > > > > + > > > > > > + if (vcpu->last_fpsimd_trap) > > > > > > + val |= CPACR_EL1_FPEN; > > > > > > + else > > > > > > + val &= ~CPACR_EL1_FPEN; > > > > > > + > > > > > > write_sysreg(val, cpacr_el1); > > > > > > > > > > > > write_sysreg(kvm_get_hyp_vector(), vbar_el1); @@ -105,7 +111,13 > > > > > > @@ static void __hyp_text __activate_traps_nvhe(struct kvm_vcpu > > > > > > *vcpu) > > > > > > __activate_traps_common(vcpu); > > > > > > > > > > > > val = CPTR_EL2_DEFAULT; > > > > > > - val |= CPTR_EL2_TTA | CPTR_EL2_TFP | CPTR_EL2_TZ; > > > > > > + val |= CPTR_EL2_TTA | CPTR_EL2_TZ; > > > > > > + > > > > > > + if (vcpu->last_fpsimd_trap) > > > > > > + val &= ~CPTR_EL2_TFP; > > > > > > + else > > > > > > + val |= CPTR_EL2_TFP; > > > > > > + > > > > > > write_sysreg(val, cptr_el2); } > > > > > > > > > > > > @@ -406,6 +418,17 @@ int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu) > > > > > > __activate_traps(vcpu); > > > > > > __activate_vm(vcpu->kvm); > > > > > > > > > > > > + /* > > > > > > + * If guest last trap to host for handling fpsimd, > > > > > > last_fpsimd_trap > > > > > > + * is set. Restore guest's fpsimd state and deactivate fpsimd > > > > > > trap > > > > > > + * to avoid guest traping soon. > > > > > > + */ > > > > > > + if (vcpu->last_fpsimd_trap) { > > > > > > + __fpsimd_save_state(&host_ctxt->gp_regs.fp_regs); > > > > > > + __fpsimd_restore_state(&guest_ctxt->gp_regs.fp_regs); > > > > > > + vcpu->last_fpsimd_trap = 0; > > > > > > + } > > > > > > + > > > > > > sysreg_restore_guest_state_vhe(guest_ctxt); > > > > > > __debug_switch_to_guest(vcpu); > > > > > > > > > > > > @@ -454,6 +477,17 @@ int __hyp_text __kvm_vcpu_run_nvhe(struct > > > > > > kvm_vcpu *vcpu) > > > > > > __activate_traps(vcpu); > > > > > > __activate_vm(kern_hyp_va(vcpu->kvm)); > > > > > > > > > > > > + /* > > > > > > + * If guest last trap to host for handling fpsimd, > > > > > > last_fpsimd_trap > > > > > > + * is set. Restore guest's fpsimd state and deactivate fpsimd > > > > > > trap > > > > > > + * to avoid guest traping soon. > > > > > > + */ > > > > > > + if (vcpu->last_fpsimd_trap) { > > > > > > + __fpsimd_save_state(&host_ctxt->gp_regs.fp_regs); > > > > > > + __fpsimd_restore_state(&guest_ctxt->gp_regs.fp_regs); > > > > > > + vcpu->last_fpsimd_trap = 0; > > > > > > + } > > > > > > + > > > > > > __hyp_vgic_restore_state(vcpu); > > > > > > __timer_enable_traps(vcpu); > > > > > > > > > > > > diff --git a/include/linux/kvm_host.h > > > > > > b/include/linux/kvm_host.h index 6930c63..46bdf0d 100644 > > > > > > --- a/include/linux/kvm_host.h > > > > > > +++ b/include/linux/kvm_host.h > > > > > > @@ -274,6 +274,7 @@ struct kvm_vcpu { > > > > > > bool preempted; > > > > > > struct kvm_vcpu_arch arch; > > > > > > struct dentry *debugfs_dentry; > > > > > > + unsigned int last_fpsimd_trap; > > > > > > }; > > > > > > > > > > > > static inline int kvm_vcpu_exiting_guest_mode(struct kvm_vcpu > > > > > > *vcpu) > > > > > > -- > > > > > > 2.7.4 > > > > > > > > > > > > > > > > This doesn't seem to be the 100% correct. I can't see how this > > > > > works when being preempted, for example... I suggest you look at > > > > > Dave Martin's series[1], which does this correctly. > > > > > > > > If I've understood correctly, this chooses between eager and lazy > > > > switching, which is addressing a different issue from [1]. > > > > > > > > In effect, this code is attempting to predict whether the guest > > > > will use FPSIMD before the next exit. If the prediction is > > > > correct and the guest does use FPSIMD, then the overhead of the > > > > lazy FPSIMD trap disappears. This is probably a good thing, > > > > though it may increase interrupt latency a little. However, if > > > > the prediction is wrong and the guest doesn't use FPSIMD before > > > > the next exit, then the overhead of guest entry increases for no > > > > benefit, because FPSIMD was switched unnecessarily. > > > > > > > > Do you have any benchmarks or metrics on the accuracy of the > > > > prediction and the overall impact on performance? > > > > > > > > The changes in [1] should reduce the number of FPSIMD context > > > > switches overall, so may reduce the benefit of this patch. > > > > > > > > This idea may be different. It tries to detect guest using fpsimd, do > > eager switching ,and reduce guest trap. [1] reduces fpsimd context > > switches in host when it has already trapped. > > > > > > > > > > > Based on previous measurements [2], I found that only in 29% of the times > > > when we return to userspace or are preempted, has the VCPU touched the > > > FPSIMD registers, and therefore a significant majority of the times we > > > enter the guest, the guest doesn't touch the FPSIMD state, even if it has > > > touched it before. > > > > > > As Dave suggests, if there is data that shows that an immediately > > > following entry from an exit that had VFP enabled implies with a very > > > high probablity that the guest will touch VFP again, then something like > > > this might make sense, but I'd like to see some data to back up this > > > hypothesis before adding additional complexity to the code. > > > > > > [2]: > > > https://lists.cs.columbia.edu/pipermail/kvmarm/2018-February/029835. > > > html > > > > I try to detect guest using fpsimd, and choose eager switching to avoid > > trapping to hyp soon. > > The reason is, in some micro architecture, vmid switch will flush all > > L1 TLB entry which doesn't contain vmid. It will result in extra > > 2-stage table walk when returning to guest, which could be avoid if not > > trap to el2. As I know, Maia has this issue. > > This looks extremely microarchitectural. Have you tested how this behaves on > other implementations?
I have only tested on Maia. As I know in research, both Maia and A72 behave in this way, so it may be a common problem. > > > I have done some benchmark test on guest and find that, frequent > > context switch , which save and restore fpsimd frequently, resulted in > > extra cost compared to host context switch because of fpsimd trap. In > > some benchmarks, like lat_ctx and lat_udp, one asid usually got stuck and > > scheduled to another, where context switch and handling fpsimd happened > > very frequently. > > In my performance test, lat_ctx with 4 asid switch. In comparison > > between guest and host, without this, guest is 10% worse than host. With > > this, guest is only 8% worse than host. > > > > Detect guest using fpsimd based on lazy fpsimd switch may be not 100% > > correct. > > I think accurate detection may be overtraining and is not suitable for > > common cases. > > Therefore I just do it in simply way to reduce some fpsimd traps. > > Can you please with Dave's series (which I have now merged for 4.18) and let > us know if you're still seeing this overhead? If you're still seeing it, we > can then look into it. Ok, I will seeing this based on 4.18. Thanks, -Nianyao Tang _______________________________________________ kvmarm mailing list [email protected] https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
