Dave Martin <[email protected]> writes:
<snip>
>> >> > @@ -404,10 +444,11 @@ static bool __hyp_text fixup_guest_exit(struct
>> >> > kvm_vcpu *vcpu, u64 *exit_code)
>> >> > * and restore the guest context lazily.
>> >> > * If FP/SIMD is not implemented, handle the trap and inject an
>> >> > * undefined instruction exception to the guest.
>> >> > + * Similarly for trapped SVE accesses.
>> >> > */
>> >> > - if (system_supports_fpsimd() &&
>> >> > - kvm_vcpu_trap_get_class(vcpu) == ESR_ELx_EC_FP_ASIMD)
>> >> > - return __hyp_switch_fpsimd(vcpu);
>> >> > + guest_has_sve = vcpu_has_sve(vcpu);
>> >>
>> >> I'm not sure if it's worth fishing this out here given you are already
>> >> passing vcpu down the chain.
>> >
>> > I wanted to discourage GCC from recomputing this. If you're in a
>> > position to do so, can you look at the disassembly with/without this
>> > factored out and see whether it makes a difference?
>>
>> Hmm it is hard to tell. There is code motion but for some reason I'm
>> seeing the static jump code unrolled, for example (original on left):
>>
>> __hyp_switch_fpsimd():
>> __hyp_switch_fpsimd():
>> /home/alex/lsrc/kvm/linux.git/arch/arm64/kvm/hyp/switch.----:382
>> | /home/alex/lsrc/kvm/linux.git/arch/arm64/kvm/hyp/switch.----:381
>>
>> > ----: tst w0, #0x400000
>>
>> > ----: b.eq 22c <fixup_guest_exit+0x1a4> // b.none
>>
>> > arch_static_branch_jump():
>>
>> >
>> /home/alex/lsrc/kvm/linux.git/arch/arm64/include/asm/jump_label.h:45
>>
>> > ----: b 38c <fixup_guest_exit+0x304>
>>
>> > arch_static_branch():
>>
>> >
>> /home/alex/lsrc/kvm/linux.git/arch/arm64/include/asm/jump_label.h:31
>>
>> > ----: nop
>>
>> > ----: b 22c <fixup_guest_exit+0x1a4>
>>
>> > test_bit():
>>
>> >
>> /home/alex/lsrc/kvm/linux.git/include/asm-generic/bitops/non-atomic.h:106
>>
>> > ----: adrp x0, 0 <cpu_hwcaps>
>>
>> > ----: ldr x0, [x0]
>>
>> > __hyp_switch_fpsimd():
>>
>> > /home/alex/lsrc/kvm/linux.git/arch/arm64/kvm/hyp/switch.----:381
>> ----: tst w0, #0x400000
>> ----: tst w0, #0x400000
>> ----: b.eq 238 <fixup_guest_exit+0x1b0> // b.none
>> | ----: b.eq 22c <fixup_guest_exit+0x1a4> // b.none
>> ----: cbz w21, 238 <fixup_guest_exit+0x1b0>
>> | ----: tbz w2, #5, 22c <fixup_guest_exit+0x1a4>
>> /home/alex/lsrc/kvm/linux.git/arch/arm64/kvm/hyp/switch.----:383
>> | /home/alex/lsrc/kvm/linux.git/arch/arm64/kvm/hyp/switch.----:382
>> ----: ldr w2, [x19, #2040]
>> | ----: ldr w2, [x20, #2040]
>> ----: add x1, x19, #0x4b0
>> | ----: add x1, x20, #0x4b0
>> ----: ldr x0, [x19, #2032]
>> | ----: ldr x0, [x20, #2032]
>> sve_ffr_offset():
>> sve_ffr_offset():
>>
>> Put calculating guest_has_sve at the top of __hyp_switch_fpsimd make
>> most of that go away and just moves things around a little bit. So I
>> guess it could makes sense for the fast(ish) path although I'd be
>> interested in knowing if it made any real difference to the numbers.
>> After all the first read should be well cached and moving it through the
>> stack is just additional memory and register pressure.
>
> Hmmm, I will have a think about this when I respin.
>
> Explicitly caching guest_has_sve() does reduce the compiler's freedom to
> optimise.
>
> We might be able to mark it as __pure or __attribute_const__ to enable
> the compiler to decide whether to cache the result, but this may not be
> 100% safe.
>
> Part of me would prefer to leave things as they are to avoid the risk of
> breaking the code again...
Given that the only place you call __hyp_switch_fpsimd is here you could
just roll in into __hyp_trap_is_fpsimd and have:
if (__hyp_trap_is_fpsimd(vcpu))
return true;
--
Alex Bennée
_______________________________________________
kvmarm mailing list
[email protected]
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm