On Tue, Nov 20, 2018 at 12:25:12PM +0000, Alex Bennée wrote:
>
> Dave Martin <[email protected]> writes:
>
> > On Mon, Nov 19, 2018 at 04:36:01PM +0000, Alex Bennée wrote:
> >>
> >> Dave Martin <[email protected]> writes:
> >>
> >> > In order to give each vcpu its own view of the SVE registers, this
> >> > patch adds context storage via a new sve_state pointer in struct
> >> > vcpu_arch. An additional member sve_max_vl is also added for each
> >> > vcpu, to determine the maximum vector length visible to the guest
> >> > and thus the value to be configured in ZCR_EL2.LEN while the is
> >> > active. This also determines the layout and size of the storage in
> >> > sve_state, which is read and written by the same backend functions
> >> > that are used for context-switching the SVE state for host tasks.
> >> >
> >> > On SVE-enabled vcpus, SVE access traps are now handled by switching
> >> > in the vcpu's SVE context and disabling the trap before returning
> >> > to the guest. On other vcpus, the trap is not handled and an exit
> >> > back to the host occurs, where the handle_sve() fallback path
> >> > reflects an undefined instruction exception back to the guest,
> >> > consistently with the behaviour of non-SVE-capable hardware (as was
> >> > done unconditionally prior to this patch).
> >> >
> >> > No SVE handling is added on non-VHE-only paths, since VHE is an
> >> > architectural and Kconfig prerequisite of SVE.
> >> >
> >> > Signed-off-by: Dave Martin <[email protected]>
> >
> > [...]
> >
> >> > diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> >> > index 085ed06..9941349 100644
> >> > --- a/arch/arm64/kvm/hyp/switch.c
> >> > +++ b/arch/arm64/kvm/hyp/switch.c
> >
> > [...]
> >
> >> > @@ -380,6 +398,26 @@ static bool __hyp_text __hyp_switch_fpsimd(struct
> >> > kvm_vcpu *vcpu)
> >> > return true;
> >> > }
> >> >
> >> > +static inline bool __hyp_text __hyp_trap_is_fpsimd(struct kvm_vcpu
> >> > *vcpu,
> >> > + bool guest_has_sve)
> >> > +{
> >> > +
> >> > + u8 trap_class;
> >> > +
> >> > + if (!system_supports_fpsimd())
> >> > + return false;
> >> > +
> >> > + trap_class = kvm_vcpu_trap_get_class(vcpu);
> >> > +
> >> > + if (trap_class == ESR_ELx_EC_FP_ASIMD)
> >> > + return true;
> >> > +
> >> > + if_sve (guest_has_sve && trap_class == ESR_ELx_EC_SVE)
> >> > + return true;
> >>
> >> Do we really need to check the guest has SVE before believing what the
> >> hardware is telling us? According to the ARM ARM:
> >>
> >> For ESR_ELx_EC_FP_ASIMD
> >>
> >> Excludes exceptions resulting from CPACR_EL1 when the value of
> >> HCR_EL2.TGE is
> >> 1, or because SVE or Advanced SIMD and floating-point are not
> >> implemented. These
> >> are reported with EC value 0b000000
> >>
> >> But also for ESR_ELx_EC_SVE
> >>
> >> Access to SVE functionality trapped as a result of CPACR_EL1.ZEN,
> >> CPTR_EL2.ZEN, CPTR_EL2.TZ, or CPTR_EL3.EZ, that is not reported using EC
> >> 0b000000. This EC is defined only if SVE is implemented
> >>
> >> Given I got confused maybe we need a comment for clarity?
> >
> > This is not about not trusting the value ESR_ELx_EC_SVE on older
> > hardware: in effect it is retrospectively reserved for this purpose on
> > all older arch versions, so there is no ambiguity about what it means.
> > It should never be observed on hardware that doesn't have SVE.
> >
> > Rather, how we handle this trap differs depending on whether the guest
> > is SVE-enabled or not. If not, then this trap is handled by the generic
> > fallback path for unhandled guest traps, so we don't check for this
> > particular EC value explicitly in that case.
> >
> >> /* Catch guests without SVE enabled running on SVE capable hardware */
> >
> > I might write something like:
> >
> > /*
> > * For sve-enmabled guests only, handle SVE access via FPSIMD
> > * context handling code.
> > */
> >
> > Does that make sense? I may have misunderstood your concern here.
>
> s/enmabled/enabled/ but yeah that's fine.
Well spotted... I guess I was in a hurry.
> > [...]
> >
> >> > @@ -387,6 +425,8 @@ static bool __hyp_text __hyp_switch_fpsimd(struct
> >> > kvm_vcpu *vcpu)
> >> > */
> >> > static bool __hyp_text fixup_guest_exit(struct kvm_vcpu *vcpu, u64
> >> > *exit_code)
> >> > {
> >> > + bool guest_has_sve;
> >> > +
> >> > if (ARM_EXCEPTION_CODE(*exit_code) != ARM_EXCEPTION_IRQ)
> >> > vcpu->arch.fault.esr_el2 = read_sysreg_el2(esr);
> >> >
> >> > @@ -404,10 +444,11 @@ static bool __hyp_text fixup_guest_exit(struct
> >> > kvm_vcpu *vcpu, u64 *exit_code)
> >> > * and restore the guest context lazily.
> >> > * If FP/SIMD is not implemented, handle the trap and inject an
> >> > * undefined instruction exception to the guest.
> >> > + * Similarly for trapped SVE accesses.
> >> > */
> >> > - if (system_supports_fpsimd() &&
> >> > - kvm_vcpu_trap_get_class(vcpu) == ESR_ELx_EC_FP_ASIMD)
> >> > - return __hyp_switch_fpsimd(vcpu);
> >> > + guest_has_sve = vcpu_has_sve(vcpu);
> >>
> >> I'm not sure if it's worth fishing this out here given you are already
> >> passing vcpu down the chain.
> >
> > I wanted to discourage GCC from recomputing this. If you're in a
> > position to do so, can you look at the disassembly with/without this
> > factored out and see whether it makes a difference?
>
> Hmm it is hard to tell. There is code motion but for some reason I'm
> seeing the static jump code unrolled, for example (original on left):
>
> __hyp_switch_fpsimd():
> __hyp_switch_fpsimd():
> /home/alex/lsrc/kvm/linux.git/arch/arm64/kvm/hyp/switch.----:382
> | /home/alex/lsrc/kvm/linux.git/arch/arm64/kvm/hyp/switch.----:381
>
> > ----: tst w0, #0x400000
>
> > ----: b.eq 22c <fixup_guest_exit+0x1a4> // b.none
>
> > arch_static_branch_jump():
>
> > /home/alex/lsrc/kvm/linux.git/arch/arm64/include/asm/jump_label.h:45
>
> > ----: b 38c <fixup_guest_exit+0x304>
>
> > arch_static_branch():
>
> > /home/alex/lsrc/kvm/linux.git/arch/arm64/include/asm/jump_label.h:31
>
> > ----: nop
>
> > ----: b 22c <fixup_guest_exit+0x1a4>
>
> > test_bit():
>
> >
> /home/alex/lsrc/kvm/linux.git/include/asm-generic/bitops/non-atomic.h:106
>
> > ----: adrp x0, 0 <cpu_hwcaps>
>
> > ----: ldr x0, [x0]
>
> > __hyp_switch_fpsimd():
>
> > /home/alex/lsrc/kvm/linux.git/arch/arm64/kvm/hyp/switch.----:381
> ----: tst w0, #0x400000
> ----: tst w0, #0x400000
> ----: b.eq 238 <fixup_guest_exit+0x1b0> // b.none
> | ----: b.eq 22c <fixup_guest_exit+0x1a4> // b.none
> ----: cbz w21, 238 <fixup_guest_exit+0x1b0>
> | ----: tbz w2, #5, 22c <fixup_guest_exit+0x1a4>
> /home/alex/lsrc/kvm/linux.git/arch/arm64/kvm/hyp/switch.----:383
> | /home/alex/lsrc/kvm/linux.git/arch/arm64/kvm/hyp/switch.----:382
> ----: ldr w2, [x19, #2040]
> | ----: ldr w2, [x20, #2040]
> ----: add x1, x19, #0x4b0
> | ----: add x1, x20, #0x4b0
> ----: ldr x0, [x19, #2032]
> | ----: ldr x0, [x20, #2032]
> sve_ffr_offset():
> sve_ffr_offset():
>
> Put calculating guest_has_sve at the top of __hyp_switch_fpsimd make
> most of that go away and just moves things around a little bit. So I
> guess it could makes sense for the fast(ish) path although I'd be
> interested in knowing if it made any real difference to the numbers.
> After all the first read should be well cached and moving it through the
> stack is just additional memory and register pressure.
Hmmm, I will have a think about this when I respin.
Explicitly caching guest_has_sve() does reduce the compiler's freedom to
optimise.
We might be able to mark it as __pure or __attribute_const__ to enable
the compiler to decide whether to cache the result, but this may not be
100% safe.
Part of me would prefer to leave things as they are to avoid the risk of
breaking the code again...
Cheers
---Dave
_______________________________________________
kvmarm mailing list
[email protected]
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm