On 09/04/18 10:44, Christoffer Dall wrote:
> On Fri, Apr 06, 2018 at 04:51:53PM +0100, Dave Martin wrote:
>> On Fri, Apr 06, 2018 at 04:25:57PM +0100, Marc Zyngier wrote:
>>> Hi Dave,
>>>
>>> On 06/04/18 16:01, Dave Martin wrote:
>>>> To make the lazy FPSIMD context switch trap code easier to hack on,
>>>> this patch converts it to C.
>>>>
>>>> This is not amazingly efficient, but the trap should typically only
>>>> be taken once per host context switch.
>>>>
>>>> Signed-off-by: Dave Martin <dave.mar...@arm.com>
>>>>
>>>> ---
>>>>
>>>> Since RFCv1:
>>>>
>>>>  * Fix indentation to be consistent with the rest of the file.
>>>>  * Add missing ! to write back to sp with attempting to push regs.
>>>> ---
>>>>  arch/arm64/kvm/hyp/entry.S  | 57 
>>>> +++++++++++++++++----------------------------
>>>>  arch/arm64/kvm/hyp/switch.c | 24 +++++++++++++++++++
>>>>  2 files changed, 46 insertions(+), 35 deletions(-)
>>>>
>>>> diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
>>>> index fdd1068..47c6a78 100644
>>>> --- a/arch/arm64/kvm/hyp/entry.S
>>>> +++ b/arch/arm64/kvm/hyp/entry.S
>>>> @@ -176,41 +176,28 @@ ENTRY(__fpsimd_guest_restore)
>>>>    // x1: vcpu
>>>>    // x2-x29,lr: vcpu regs
>>>>    // vcpu x0-x1 on the stack
>>>> -  stp     x2, x3, [sp, #-16]!
>>>> -  stp     x4, lr, [sp, #-16]!
>>>> -
>>>> -alternative_if_not ARM64_HAS_VIRT_HOST_EXTN
>>>> -  mrs     x2, cptr_el2
>>>> -  bic     x2, x2, #CPTR_EL2_TFP
>>>> -  msr     cptr_el2, x2
>>>> -alternative_else
>>>> -  mrs     x2, cpacr_el1
>>>> -  orr     x2, x2, #CPACR_EL1_FPEN
>>>> -  msr     cpacr_el1, x2
>>>> -alternative_endif
>>>> -  isb
>>>> -
>>>> -  mov     x3, x1
>>>> -
>>>> -  ldr     x0, [x3, #VCPU_HOST_CONTEXT]
>>>> -  kern_hyp_va x0
>>>> -  add     x0, x0, #CPU_GP_REG_OFFSET(CPU_FP_REGS)
>>>> -  bl      __fpsimd_save_state
>>>> -
>>>> -  add     x2, x3, #VCPU_CONTEXT
>>>> -  add     x0, x2, #CPU_GP_REG_OFFSET(CPU_FP_REGS)
>>>> -  bl      __fpsimd_restore_state
>>>> -
>>>> -  // Skip restoring fpexc32 for AArch64 guests
>>>> -  mrs     x1, hcr_el2
>>>> -  tbnz    x1, #HCR_RW_SHIFT, 1f
>>>> -  ldr     x4, [x3, #VCPU_FPEXC32_EL2]
>>>> -  msr     fpexc32_el2, x4
>>>> -1:
>>>> -  ldp     x4, lr, [sp], #16
>>>> -  ldp     x2, x3, [sp], #16
>>>> -  ldp     x0, x1, [sp], #16
>>>> -
>>>> +  stp     x2, x3, [sp, #-144]!
>>>> +  stp     x4, x5, [sp, #16]
>>>> +  stp     x6, x7, [sp, #32]
>>>> +  stp     x8, x9, [sp, #48]
>>>> +  stp     x10, x11, [sp, #64]
>>>> +  stp     x12, x13, [sp, #80]
>>>> +  stp     x14, x15, [sp, #96]
>>>> +  stp     x16, x17, [sp, #112]
>>>> +  stp     x18, lr, [sp, #128]
>>>> +
>>>> +  bl      __hyp_switch_fpsimd
>>>> +
>>>> +  ldp     x4, x5, [sp, #16]
>>>> +  ldp     x6, x7, [sp, #32]
>>>> +  ldp     x8, x9, [sp, #48]
>>>> +  ldp     x10, x11, [sp, #64]
>>>> +  ldp     x12, x13, [sp, #80]
>>>> +  ldp     x14, x15, [sp, #96]
>>>> +  ldp     x16, x17, [sp, #112]
>>>> +  ldp     x18, lr, [sp, #128]
>>>> +  ldp     x0, x1, [sp, #144]
>>>> +  ldp     x2, x3, [sp], #160
>>>
>>> I can't say I'm overly thrilled with adding another save/restore 
>>> sequence. How about treating it like a real guest exit instead? Granted, 
>>> there is a bit more overhead to it, but as you pointed out above, this 
>>> should be pretty rare...
>>
>> I have no objection to handling this after exiting back to
>> __kvm_vcpu_run(), provided the performance is deemed acceptable.
>>
> 
> My guess is that it's going to be visible on non-VHE systems, and given
> that we're doing all of this for performance in the first place, I'm not
> exceited about that approach either.

My rational is that, as we don't disable FP access across most
exit/entry sequences, we still significantly benefit from the optimization.

> I thought it was acceptable to do another save/restore, because it was
> only the GPRs (and equivalent to what the compiler would generate for a
> function call?) and thus not susceptible to the complexities of sysreg
> save/restores.

Sysreg? That's not what I'm proposing. What I'm proposing here is that
we treat FP exception as a shallow exit that immediately returns to the
guest without touching them. The overhead is an extra save/restore of
the host's x19-x30, if I got my maths right. I agree that this is
significant, but I'd like to measure this overhead before we go one way
or the other.

> Another alternative would be to go back to Dave's original approach of
> implementing the fpsimd state update to the host's structure in assembly
> directly, but I was having a hard time understanding that.  Perhaps I
> just need to try harder.
I'd rather stick to the current C approach, no matter how we perform the
save/restore. It feels a lot more readable and maintainable in the long run.

Thanks,

        M.
-- 
Jazz is not dead. It just smells funny...
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

Reply via email to