----- Original Message -----
> On Tue, Jul 23, 2013 at 8:57 PM, Roland Scheidegger <[email protected]>
> wrote:
> > On 23.07.2013 19:08, Andre Heider wrote:
> >> For AVX it's not sufficient to only rely on the cpuid flags. If the CPU
> >> supports these extensions, but the OS doesn't, issuing these insns will
> >> trigger an undefined opcode exception.
> >>
> >> In addition to the AVX cpuid bit we also need to:
> >> * test cpuid for OSXSAVE support
> >> * XGETBV to check if the OS saves/restores AVX regs on context switches
> >>
> >> See "Detecting Availability and Support" at
> >> http://software.intel.com/en-us/articles/introduction-to-intel-advanced-vector-extensions
> >>
> >> Signed-off-by: Andre Heider <[email protected]>
> >> ---
> >>  src/gallium/auxiliary/util/u_cpu_detect.c | 27
> >>  +++++++++++++++++++++++++--
> >>  1 file changed, 25 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/src/gallium/auxiliary/util/u_cpu_detect.c
> >> b/src/gallium/auxiliary/util/u_cpu_detect.c
> >> index b118fc8..588fc7c 100644
> >> --- a/src/gallium/auxiliary/util/u_cpu_detect.c
> >> +++ b/src/gallium/auxiliary/util/u_cpu_detect.c
> >> @@ -67,7 +67,7 @@
> >>
> >>  #if defined(PIPE_OS_WINDOWS)
> >>  #include <windows.h>
> >> -#if defined(MSVC)
> >> +#if defined(PIPE_CC_MSVC)
> >>  #include <intrin.h>
> >>  #endif
> >>  #endif
> >> @@ -211,6 +211,27 @@ cpuid(uint32_t ax, uint32_t *p)
> >>     p[3] = 0;
> >>  #endif
> >>  }
> >> +
> >> +static INLINE uint64_t xgetbv(void)
> >> +{
> >> +#if defined(PIPE_CC_GCC)
> >> +   uint32_t eax, edx;
> >> +
> >> +   __asm __volatile (
> >> +     ".byte 0x0f, 0x01, 0xd0" // xgetbv isn't supported on gcc < 4.4
> >> +     : "=a"(eax),
> >> +       "=d"(edx)
> >> +     : "c"(0)
> >> +   );
> >> +
> >> +   return ((uint64_t)edx << 32) | eax;
> >> +#elif defined(PIPE_CC_MSVC) && defined(_MSC_FULL_VER) &&
> >> defined(_XCR_XFEATURE_ENABLED_MASK)
> >> +   return _xgetbv(_XCR_XFEATURE_ENABLED_MASK);
> >> +#else
> >> +   return 0;
> >> +#endif
> >> +
> >> +}
> >>  #endif /* X86 or X86_64 */
> >>
> >>  void
> >> @@ -284,7 +305,9 @@ util_cpu_detect(void)
> >>           util_cpu_caps.has_sse4_1 = (regs2[2] >> 19) & 1;
> >>           util_cpu_caps.has_sse4_2 = (regs2[2] >> 20) & 1;
> >>           util_cpu_caps.has_popcnt = (regs2[2] >> 23) & 1;
> >> -         util_cpu_caps.has_avx = (regs2[2] >> 28) & 1;
> >> +         util_cpu_caps.has_avx = ((regs2[2] >> 28) & 1) && // AVX
> >> +                                 ((regs2[2] >> 27) & 1) && // OSXSAVE
> >> +                                 ((xgetbv() & 6) == 6);    // XMM &
> >> YMM
> >>           util_cpu_caps.has_f16c = (regs2[2] >> 29) & 1;
> >>           util_cpu_caps.has_mmx2 = util_cpu_caps.has_sse; /* SSE cpus
> >> supports mmxext too */
> >>
> >>
> >
> > Looks good to me though
Looks good to me too. Thanks.

> > it's a pity detection depends on compiler.
> > Granted it looks like icc currently won't work but still...
> > I guess that technically the test for sse(x) isn't correct either as
> > that too requires OS support, I don't know off-hand though how to check
> > for it (and we'd be talking ANCIENT os here...).
>
> Ancient indeed ;)
>
> But with AVX the problem becomes more urgent: All SSE versions used
> the same registers, AVX extended those.
> Now we recently got an AVX enabled vSphere server, and exposing that to
> XP guests doesn't go too well with llvmpipe without this patch.

I don't know of many llvmpipe Windows users, especially on XP. If it's not confidential, how are you using it?

BTW, multithreaded performance would be much better if I finished cleaning up http://cgit.freedesktop.org/~jrfonseca/mesa/log/?h=c11-threads , as the current condition variable implementation sucks big time.

Jose
_______________________________________________
mesa-dev mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
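
For anyone who wants to try the check outside of Mesa, here is a rough standalone sketch of the three-step test the patch describes (AVX bit, OSXSAVE bit, then XCR0 bits 1 and 2). It is not the Mesa code: the helper names avx_usable() and detect_avx() are made up for illustration, and it assumes GCC or Clang with <cpuid.h> on x86. On any other target the runtime probe simply reports no AVX.

```c
#include <stdint.h>

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>              /* GCC/Clang helper header (assumed available) */
#define HAVE_X86_CPUID 1
#endif

/* CPUID leaf 1, ECX feature bits, as used in the patch. */
#define CPUID_ECX_OSXSAVE (1u << 27)
#define CPUID_ECX_AVX     (1u << 28)
/* XCR0 bit 1 = XMM state, bit 2 = YMM state; both must be OS-enabled. */
#define XCR0_XMM_YMM      0x6u

/* Pure decision logic (hypothetical helper, not part of Mesa):
 * AVX is usable only if the CPU has it, the OS has enabled XSAVE,
 * and the OS saves/restores both XMM and YMM state. */
static int avx_usable(uint32_t cpuid1_ecx, uint64_t xcr0)
{
   if (!(cpuid1_ecx & CPUID_ECX_AVX))
      return 0;
   if (!(cpuid1_ecx & CPUID_ECX_OSXSAVE))
      return 0;
   return (xcr0 & XCR0_XMM_YMM) == XCR0_XMM_YMM;
}

/* Runtime probe (hypothetical helper). Returns 0 on non-x86 targets
 * or compilers without <cpuid.h>. */
static int detect_avx(void)
{
#ifdef HAVE_X86_CPUID
   unsigned int eax, ebx, ecx, edx;
   uint32_t lo = 0, hi = 0;

   if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
      return 0;

   /* XGETBV raises #UD unless OSXSAVE is set, so test that bit first. */
   if (ecx & CPUID_ECX_OSXSAVE) {
      /* Raw opcode bytes for xgetbv, same trick as in the patch. */
      __asm__ __volatile__(".byte 0x0f, 0x01, 0xd0"
                           : "=a"(lo), "=d"(hi)
                           : "c"(0));
   }
   return avx_usable(ecx, ((uint64_t)hi << 32) | lo);
#else
   return 0;
#endif
}
```

Keeping the bit logic in a separate function from the probe makes the decision testable with made-up register values, for example an XP-like case where the CPU reports AVX but OSXSAVE is clear.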
