On 5/31/19 8:39 PM, Ralf Ramsauer wrote:
>
>
> On 5/31/19 12:04 PM, Jan Kiszka wrote:
>> On 31.05.19 17:52, Ralf Ramsauer wrote:
>>> On 5/30/19 3:00 AM, Jan Kiszka wrote:
>>>> On 21.05.19 16:34, Ralf Ramsauer wrote:
>>>>> Recent gcc versions emit SSE instructions for 32-bit inmates (e.g., in
>>>>> hex2str or cmdline_parse routines). Inmates aren't able to execute those
>>>>> instructions as SSE is not enabled, and they will crash.
>>>>>
>>>>> Enabling SSE is the same code for 32- and 64-bit x86 and straightforward:
>>>>> Look up SSE availability via cpuid and enable OSFXSR in cr4. If SSE is
>>>>> not available, stop the inmate.
>>>>>
>>>>> If AVX is available, activate it (XCR0).
>>>>>
>>>>> Look up features that need no explicit activation.
>>>>>
>>>>> Signed-off-by: Ralf Ramsauer <[email protected]>
>>>>> ---
>>>>> inmates/lib/x86/Makefile | 6 +-
>>>>> inmates/lib/x86/cpu-features.c | 103 +++++++++++++++++++++++++++++
>>>>> inmates/lib/x86/header-32.S | 9 ++-
>>>>> inmates/lib/x86/header-64.S | 9 ++-
>>>>> inmates/lib/x86/include/asm/regs.h | 36 ++++++++++
>>>>> 5 files changed, 158 insertions(+), 5 deletions(-)
>>>>> create mode 100644 inmates/lib/x86/cpu-features.c
>>>>>
>>>>> diff --git a/inmates/lib/x86/Makefile b/inmates/lib/x86/Makefile
>>>>> index e474ffd0..ed3b04d5 100644
>>>>> --- a/inmates/lib/x86/Makefile
>>>>> +++ b/inmates/lib/x86/Makefile
>>>>> @@ -40,7 +40,7 @@ include $(INMATES_LIB)/Makefile.lib
>>>>>
>>>>> always := lib.a lib32.a
>>>>>
>>>>> -TARGETS := header-common.o ioapic.o printk.o setup.o smp.o uart.o
>>>>> +TARGETS := cpu-features.o header-common.o ioapic.o printk.o setup.o smp.o uart.o
>>>>> TARGETS += ../alloc.o ../pci.o ../string.o ../cmdline.o ../setup.o
>>>>> TARGETS += ../uart-8250.o ../printk.o
>>>>> TARGETS_32_ONLY := header-32.o
>>>>> @@ -57,6 +57,10 @@ $(obj)/lib32.a: $(addprefix $(obj)/,$(lib32-y))
>>>>>
>>>>> targets += header-32.o
>>>>>
>>>>> +# Code of this object is called before SSE/AVX is available. Ensure that the
>>>>> +# compiler won't generate unsupported instructions for this file.
>>>>> +CFLAGS_cpu-features.o += -mno-sse
>>>>> +
>>>>> $(obj)/%-32.o: c_flags += -m32
>>>>> $(obj)/%-32.o: $(src)/%.c
>>>>> $(call if_changed_rule,cc_o_c)
>>>>> diff --git a/inmates/lib/x86/cpu-features.c b/inmates/lib/x86/cpu-features.c
>>>>> new file mode 100644
>>>>> index 00000000..9cf98543
>>>>> --- /dev/null
>>>>> +++ b/inmates/lib/x86/cpu-features.c
>>>>> @@ -0,0 +1,103 @@
>>>>> +/*
>>>>> + * Jailhouse, a Linux-based partitioning hypervisor
>>>>> + *
>>>>> + * Copyright (c) OTH Regensburg, 2019
>>>>> + *
>>>>> + * Authors:
>>>>> + * Ralf Ramsauer <[email protected]>
>>>>> + *
>>>>> + * This work is licensed under the terms of the GNU GPL, version 2. See
>>>>> + * the COPYING file in the top-level directory.
>>>>> + *
>>>>> + * Alternatively, you can use or redistribute this file under the following
>>>>> + * BSD license:
>>>>> + *
>>>>> + * Redistribution and use in source and binary forms, with or without
>>>>> + * modification, are permitted provided that the following conditions
>>>>> + * are met:
>>>>> + *
>>>>> + * 1. Redistributions of source code must retain the above copyright
>>>>> + *    notice, this list of conditions and the following disclaimer.
>>>>> + *
>>>>> + * 2. Redistributions in binary form must reproduce the above copyright
>>>>> + *    notice, this list of conditions and the following disclaimer in the
>>>>> + *    documentation and/or other materials provided with the distribution.
>>>>> + *
>>>>> + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
>>>>> + * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
>>>>> + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
>>>>> + * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
>>>>> + * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
>>>>> + * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
>>>>> + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
>>>>> + * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
>>>>> + * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
>>>>> + * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
>>>>> + * THE POSSIBILITY OF SUCH DAMAGE.
>>>>> + */
>>>>> +
>>>>> +#include <inmate.h>
>>>>> +#include <asm/regs.h>
>>>>> +
>>>>> +/* Must only be called from assembler via jmp */
>>>>> +void arch_init_features(void);
>>>>> +
>>>>> +struct x86_cpu_features x86_cpu_features __attribute__((section(".data")));
>>>>> +
>>>>> +/*
>>>>> + * We arrive here very early, and we don't have a stack. Take care.
>>>>> + *
>>>>> + * Every booting CPU will call this function. We make the assumption that all
>>>>> + * CPUs have the same feature set. So we don't need any locks when writing to
>>>>> + * x86_cpu_features.
>>>
>>> Any comments on this? Is it okay that this code will run on each CPU,
>>> even if only required on the primary CPU?
>>>
>>
>> Running once on all CPUs is harmless - we are not in a hurry.
>>
>> Running without stack is a problem, see below.
>
> Depends. It's perfectly fine to have no stack if we don't use it (even
> if we had a corrupt stack…).
>
>>
>>>>> + */
>>>>> +void __attribute__((naked, noreturn, section(".boot")))
>>>>
>>>> What's "naked" supposed to do here? The compiler says it will ignore it.
>>>
>>> Huh? Does it produce a warning?
>>
>> Yes, gcc 7.4 said so.
>>
>>>
>>> naked ensures that there's no pro- and epilogue sequences. And as the
>>> comment states: We won't have a stack that early.
>>>
>>> BTW: That raises a question: Where do we set the stack for secondary
>>> CPUs on x86?
>>
>> In header.S, always to stack_top... Looks like the ARM issue is still
>> present here.
>
> I remember that I once fixed some corrupted stack things for ARM
> inmates, but it was a bit different there.
>
> Trying to interpret your words: we're currently running on an
> overlapping stack for secondary (i.e., all) CPUs, right? That should
> probably be fixed first.
>
> Hmm. On x86 we're hlt'ing all secondary CPUs very early and waiting for
> a SIPI. Besides setting ap_entry, we could allocate a valid stack on the
> primary CPU inside smp_start_cpu() and pass it along.
The MMIO stuff is done, so let's get back to this topic. :-)

The problem I initially wanted to solve was missing SSE support; it turns
out that secondary CPUs on x86 also have a broken stack.
There might be a clean solution that solves all the issues, including
the dirty "naked jmp around" hack:

Let's reconsider the early boot phase of x86 inmates. In contrast to
ARM, for example, we start all CPUs, but secondary CPUs are halted and
wait for SIPIs.
Now, we could provide a stack symbol that points to a free, valid stack
that a CPU can grab. Initially (at compile time), the symbol holds the
address of the stack for the primary CPU. It is not yet known which CPU
will win the race to become the primary CPU.
We could cmpxchg the content of this symbol *very* early. Whoever wins
becomes the primary CPU; everyone else will read zero and simply put
their CPU into hlt state.
When starting secondary CPUs, smp_start_cpu() alloc()s a new stack from
the heap, sets the stack symbol (and ap_entry) and kicks the CPU. The
same startup code then runs, but the secondary CPU grabs the new stack
and boots.
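The grab-the-stack race could be sketched in C like below. This is a
hypothetical illustration, not actual Jailhouse code: the names
boot_stack and grab_boot_stack are made up, and in reality the swap
would have to happen in the early assembly of header-32.S/header-64.S,
before any C code needs a stack.

```c
#include <stdint.h>

/*
 * Hypothetical sketch of the proposed stack hand-off. At compile time
 * the symbol holds the stack top for the primary CPU; smp_start_cpu()
 * would later refill it with a freshly alloc()ed stack before kicking
 * a secondary CPU. The address below is just an example value.
 */
static volatile uintptr_t boot_stack = 0x100000;

/*
 * Every booting CPU races on this symbol. The winner takes the stack;
 * whoever reads zero lost the race and would go into hlt state until
 * woken by a SIPI.
 */
static uintptr_t grab_boot_stack(void)
{
	/* atomic swap with zero; on x86 this compiles to a lock xchg */
	return __atomic_exchange_n(&boot_stack, (uintptr_t)0,
				   __ATOMIC_SEQ_CST);
}
```

Because smp_start_cpu() would store the new stack address back into the
symbol before sending the SIPI, the exact same startup path could serve
primary and secondary CPUs.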
This has some advantages:
- We have a valid stack very early and can jump back and forth to C
  with regular calls instead of the hacky hack above.
- Startup code will only run once, when needed, and not multiple times
  as is currently the case.
- Every secondary CPU gets its own clean stack (which is broken atm).
- We still maintain shared startup code for primary and secondary CPUs.
Does that sound reasonable? I'd just like to hear some feedback before I
start hacking, as this could once again become a bit more complex. :-)
Ralf
>
>>
>>>
>>>>
>>>>> +arch_init_features(void)
>>>>> +{
>>>>> + register u64 features;
>>>>> +
>>>>> + features = cpuid_edx(X86_CPUID_FEATURES, 0);
>>>>> + /* Check availability of FPU */
>>>>> + x86_cpu_features.fpu = !!(features & X86_FEATURE_FPU);
>>>>> +
>>>>> + /* Discover and enable FXSR */
>>>>> + if (features & X86_FEATURE_FXSR) {
>>>>> + write_cr4(read_cr4() | X86_CR4_OSFXSR);
>>>>> + x86_cpu_features.fxsr = true;
>>>>> + }
>>>
>>> BTW: Another point that I realised in the meantime: AFAICT, the
>>> hypervisor will very likely hand over with OSFXSR enabled, as it is set
>>> by the cr4_required1 mask.
>>
>> The hypervisor hands over in reset state, and there these features are off.
>
> Ahm. No. On a real AMD machine that's true, yes.
>
> But it behaves differently on QEMU x86: CR4 is handed over with 0x2020,
> which means OSFXSR | PVI. And even if I try to deactivate it, that won't
> work. Look at this inmate code:
>
> printk("CR4 handover: 0x%lx\n", read_cr4());
> write_cr4(read_cr4() & ~0x2000);
> printk("CR4 now: 0x%lx\n", read_cr4());
>
> will output:
> CR4 handover: 0x2020
> CR4 now: 0x2020
>
> CR4 is write-intercepted, and vmx_set_guest_cr in vmx.c:443 will set it
> again on interception.
>
>>
>>>
>>> Nevertheless, it should be properly set.
>>>
>>>>> +
>>>>> + /* Check availability of SSE */
>>>>> + x86_cpu_features.sse = !!(features & X86_FEATURE_SSE);
>>>>> + x86_cpu_features.sse2 = !!(features & X86_FEATURE_SSE2);
>>>>> +
>>>>> + /* ECX hides the rest */
>>>>> + features = cpuid_ecx(X86_CPUID_FEATURES, 0);
>>>>> + x86_cpu_features.sse3 = !!(features & X86_FEATURE_SSE3);
>>>>> + x86_cpu_features.sse4_1 = !!(features & X86_FEATURE_SSE4_1);
>>>>> + x86_cpu_features.sse4_2 = !!(features & X86_FEATURE_SSE4_2);
>>>>> + x86_cpu_features.pclmulqdq = !!(features & X86_FEATURE_PCLMULQDQ);
>>>>> +
>>>>> + if (features & X86_FEATURE_XSAVE) {
>>>>> + /* Enable XSAVE related instructions */
>>>>> + write_cr4(read_cr4() | X86_CR4_OSXSAVE);
>>>>> + x86_cpu_features.xsave = true;
>>>>> +
>>>>> + /*
>>>>> + * Intel SDM 13.2: A bit can be set in XCR0 if and only if the
>>>>> + * corresponding bit is set in this bitmap. Every processor
>>>>> + * that supports the XSAVE feature set will set EAX[0] (x87
>>>>> + * state) and EAX[1] (SSE state).
>>>>> + *
>>>>> + * We can always safely write SSE + FP, but only set AVX if
>>>>> + * available.
>>>>> + */
>>>>> +
>>>>> + features = cpuid_edax(X86_CPUID_XSTATE, 0);
>>>>> + write_xcr0(read_xcr0() | (features & X86_XCR0_AVX) | \
>>>>> + X86_XCR0_SSE | X86_XCR0_X87);
>>>>> + x86_cpu_features.avx = !!(features & X86_XCR0_AVX);
>>>>> + }
>>>>> +
>>>>> + /* hand control back to assembler */
>>>>> + asm volatile("jmp x86_start\t\n");
>>>>
>>>> Hacky... Is there no stack yet to do proper call/ret?
>>>
>>> See above, no stack. But I agree on hacky. :-)
>>>
>>>>
>>>> But if you want to / have to use a jmp here, you need to tell the
>>>> compiler that
>>>> this will never "return" (__builtin_unreachable).
>>>
>>> What's the difference between noreturn and builtin_unreachable?
>>
>> The former declares it for the function and, thus, also informs its
>> callers, the latter marks that some line is never reached. That is
>> needed when you call something that is noreturn but does not declare it
>> (like your inline assembly).
>>
>>>
>>> The noreturn attribute makes sure that, besides the epilogue, we won't
>>> have a ret.
>>
>> The compiler should have barked at you that your noreturn function seems
>> to return (the compiler has no idea about the semantics of the inline
>> assembly).
>
> I used gcc 8.3.0. No complaints. Anyway, I get your point.
>
> Ralf
>
>>
>> Jan
>>
>
--
You received this message because you are subscribed to the Google Groups
"Jailhouse" group.
To view this discussion on the web visit
https://groups.google.com/d/msgid/jailhouse-dev/eb711300-b19c-053e-21c7-721bab566b07%40oth-regensburg.de.