On Tue, 2 Feb 2016, Marcus MERIGHI wrote:
> [email protected] (Stefan Kempf), 2016.02.01 (Mon) 19:13 (CET):
> > Marcus MERIGHI wrote:
> > > [email protected] (Stefan Kempf), 2016.01.30 (Sat) 10:49 (CET):
> > > > We need to see what it looks like from within the kernel (and whether
> > > > the illegal instruction is really raised from within sendsig()). Can you
> > > > try the diff below?
> > >
> > > > You should get a kernel panic now instead of an illegal instruction
> > > > signal if you try running ping or top. We need the output of the panic
> > > > message and the output of the following commands:
> > >
> > > ping(1), top(1) messed up the screen.
> > >
> > > # ping 192.168.188.189
> > > PING 192.168.188.189 (192.168.188.189): 56 data bytes
> > > 64 bytes from 192.168.188.189: icmp_seq=0 ttl=255 time=166.533 ms
> > > panic: sendsig 1: fxsave 0xffff800032c8a000, sp 0x7f7fff0d20b1,
> > > fxave_size 512, savefpu_size 832, fpu_save_len 15773951, tf_rsp
> > > 0x7f7ffffdd238, userstack 1
> >
> > fpu_save_len is way too large (0xf0b0ff in hex). It should be 832 at
> > most. And that causes the kernel to attempt writes outside of the
> > process stack (and/or to read beyond the saved FPU state).
> >
> > Either the value we get from CPUID is strange (or we handle CPUID
> > wrongly), or something trashes fpu_save_len.
>
> Now that you mention CPUID...
> If I switch 'Max CPUID Value Limit' to 'disabled' in the BIOS, the
> symptom is gone. It re-appears when setting to 'enabled'.
"Doctor, it hurts when I do this..."

That BIOS option exists to support ancient OSes (Windows NT, etc.) and
shouldn't be enabled when using OpenBSD.

Currently we seem to assume that the presence of certain CPU features like
AVX implies that CPUID supports the related leaf; that BIOS option breaks
that assumption, resulting in the bogus fpu_save_len sizing you hit.
From the dmesg you posted I see it also explains the bogus mwait sizing
that has been reported by some others.  Your machine will perform better
with that option off; I guess we should add a check to the code to catch
this sort of setup by checking the cpuid_level variable before using the
higher CPUID leaves.
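
To make the idea concrete, here is a minimal standalone sketch of that
kind of guard.  It is not the kernel code: leaf_usable() and
cpuid_guard_selftest() are hypothetical names made up for illustration,
and the capped level of 0x2 is only an assumed example of what a limited
CPUID might report.  The two feature-bit values follow the CPUID.1:ECX
layout (bit 3 for MONITOR/MWAIT, bit 26 for XSAVE).

```c
#include <assert.h>
#include <stdint.h>

/* Feature bits in CPUID.1:ECX. */
#define CPUIDECX_MWAIT	0x00000008u	/* bit 3: MONITOR/MWAIT */
#define CPUIDECX_XSAVE	0x04000000u	/* bit 26: XSAVE */

/*
 * Hypothetical helper (not in the OpenBSD tree): a leaf may be queried
 * only if the feature bit is set AND the highest basic leaf reported by
 * CPUID leaf 0 (cpuid_level) is at least that leaf.
 */
int
leaf_usable(uint32_t ecxfeature, uint32_t feature_bit,
    uint32_t cpuid_level, uint32_t leaf)
{
	return (ecxfeature & feature_bit) != 0 && cpuid_level >= leaf;
}

/* Small self-check showing the cases discussed in this thread. */
void
cpuid_guard_selftest(void)
{
	uint32_t features = CPUIDECX_MWAIT | CPUIDECX_XSAVE;
	uint32_t capped = 0x2;		/* assumed: BIOS limit enabled */
	uint32_t uncapped = 0xd;	/* assumed: BIOS limit disabled */

	/* Feature bits set, but the leaves are out of range: skip them. */
	assert(!leaf_usable(features, CPUIDECX_MWAIT, capped, 0x5));
	assert(!leaf_usable(features, CPUIDECX_XSAVE, capped, 0xd));

	/* Feature bits set and leaves in range: safe to query. */
	assert(leaf_usable(features, CPUIDECX_MWAIT, uncapped, 0x5));
	assert(leaf_usable(features, CPUIDECX_XSAVE, uncapped, 0xd));

	/* Leaf in range but feature bit clear: still skip. */
	assert(!leaf_usable(0, CPUIDECX_XSAVE, uncapped, 0xd));
}
```

The point is only that the feature bit and the maximum leaf are checked
together, so a capped CPUID makes the code fall back instead of reading
garbage from a leaf the CPU claims not to have.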

Can you apply the diff below, temporarily re-enable that BIOS option,
then report the resulting dmesg and verify that ping works properly?


Philip Guenther

Index: i386/i386/cpu.c
===================================================================
RCS file: /data/src/openbsd/src/sys/arch/i386/i386/cpu.c,v
retrieving revision 1.70
diff -u -p -r1.70 cpu.c
--- i386/i386/cpu.c 27 Dec 2015 04:31:34 -0000 1.70
+++ i386/i386/cpu.c 2 Feb 2016 16:54:09 -0000
@@ -784,7 +784,7 @@ cpu_init_mwait(struct device *dv)
 {
 	unsigned int smallest, largest, extensions, c_substates;
 
-	if ((cpu_ecxfeature & CPUIDECX_MWAIT) == 0)
+	if ((cpu_ecxfeature & CPUIDECX_MWAIT) == 0 || cpuid_level < 0x5)
 		return;
 
 	/* get the monitor granularity */
Index: amd64/amd64/cpu.c
===================================================================
RCS file: /data/src/openbsd/src/sys/arch/amd64/amd64/cpu.c,v
retrieving revision 1.94
diff -u -p -r1.94 cpu.c
--- amd64/amd64/cpu.c 27 Dec 2015 04:31:34 -0000 1.94
+++ amd64/amd64/cpu.c 2 Feb 2016 16:54:30 -0000
@@ -282,7 +282,7 @@ cpu_init_mwait(struct cpu_softc *sc)
 {
 	unsigned int smallest, largest, extensions, c_substates;
 
-	if ((cpu_ecxfeature & CPUIDECX_MWAIT) == 0)
+	if ((cpu_ecxfeature & CPUIDECX_MWAIT) == 0 || cpuid_level < 0x5)
 		return;
 
 	/* get the monitor granularity */
@@ -505,7 +505,7 @@ cpu_init(struct cpu_info *ci)
 		cr4 |= CR4_OSXSAVE;
 	lcr4(cr4);
 
-	if (cpu_ecxfeature & CPUIDECX_XSAVE) {
+	if (cpu_ecxfeature & CPUIDECX_XSAVE && cpuid_level >= 0xd) {
 		u_int32_t eax, ebx, ecx, edx;
 
 		xsave_mask = XCR0_X87 | XCR0_SSE;