Ian Lance Taylor <i...@google.com> wrote:

> Your changes are relying on a gcc extension which was only recently
> added, more recently than those tests were added to the testsuite. Only
> recently did gcc acquire the ability to use [] to access elements in a
> vector.

That isn't what my changes did. The array accesses are to the arrays in the union, nothing cutting edge there. The data is accessed through the array specified by .d (or .s etc.), not through name.x[index].

> So I think you may have misinterpreted the __builtin_ia32_vec_ext_v2di
> builtin function. That function treats the vector as containing two
> 8-byte integers, and pulls out one or the other depending on the second
> argument. Your dumps of res[0] and res[1] suggest that you are treating
> the vector as four 4-byte integers and pulling out specific ones.

Yup, my bad, I put in d where it should have been ll.

I also fixed the problem I induced in sse2-check.h, where too large a chunk was commented out; that was causing the gcc -Wall -msse2 problem. The changed part in the original source was

    if ((edx & bit_SSE2) && sse_os_support ())

and is now:

    #if !defined(SOFT_SSE2)
      if ((edx & bit_SSE2) && sse_os_support ())
    #else
      if (sse_os_support ())
    #endif /*SOFT_SSE2*/

My software SSE2 passes all 165 of the sse2 tests that are complete programs.

However, there is a problem in the real world. While the sse2 programs in the testsuite do exercise the _mm* functions, they do so one at a time. I have found that in real code, which makes multiple _mm* calls, wrong results may come out if -O0 is not used.

    % gcc -std=gnu99 -g -pg -pthread -O0 -msse -mno-sse2 -DSOFT_SSE2 -m32 \
        -g -pg -DHAVE_CONFIG_H -L../../easel -L.. -L. -I../../easel \
        -I../../easel -I. -I.. -I. -I../../src -Dp7MSVFILTER_TESTDRIVE \
        -o msvfilter_utest ./msvfilter.c -Wl,--start-group -lhmmer \
        -lhmmerimpl -Wall -Wl,--end-group -leasel -lm
    % ./msvfilter_utest
    (no output, it ran correctly)

    % gcc -std=gnu99 -g -pg -pthread -O1 -msse -mno-sse2 -DSOFT_SSE2 -m32 \
        -g -pg -DHAVE_CONFIG_H -L../../easel -L.. -L. -I../../easel \
        -I../../easel -I. -I.. -I. -I../../src -Dp7MSVFILTER_TESTDRIVE \
        -o msvfilter_utest ./msvfilter.c -Wl,--start-group -lhmmer \
        -lhmmerimpl -Wall -Wl,--end-group -leasel -lm
    % ./msvfilter_utest
    msv filter unit test failed: scores differ (-50.37, -10.86)

Going to higher optimization, there are even bigger issues, like not compiling at all (even with gcc 4.4.1):

    % gcc -std=gnu99 -g -pg -pthread -O2 -msse -mno-sse2 -DSOFT_SSE2 -m32 \
        -g -pg -DHAVE_CONFIG_H -L../../easel -L.. -L. -I../../easel \
        -I../../easel -I. -I.. -I. -I../../src -Dp7MSVFILTER_TESTDRIVE \
        -o msvfilter_utest ./msvfilter.c -Wl,--start-group -lhmmer \
        -lhmmerimpl -Wall -Wl,--end-group -leasel -lm
    ../../easel/emmintrin.h:2178: warning: dereferencing pointer '({anonymous})' does break strict-aliasing rules
    ../../easel/emmintrin.h:2178: note: initialized from here
    .
    .
    (same sort of message many many times)
    .
    ./msvfilter.c:208: error: unable to find a register to spill in class 'GENERAL_REGS'
    ./msvfilter.c:208: error: this is the insn:
    (insn 1944 1943 1945 46 ../../easel/emmintrin.h:2348 (set (strict_low_part (subreg:HI (reg:TI 1239) 0))
            (mem:HI (reg/f:SI 96 [ pretmp.1031 ]) [13 S2 A16])) 47 {*movstricthi_1} (nil))
    ./msvfilter.c:208: confused by earlier errors, bailing out

Would changing the use of inlined functions to defines let the compiler digest it better? For instance:

    static __inline __m128i __attribute__((__always_inline__))
    _mm_andnot_si128 (__m128i __A, __m128i __B)
    {
      return (~__A) & __B;
    }

becomes

    #define _mm_andnot_si128(A,B) (~(A) & (B))

That approach will get really messy for the more complicated _mm* functions.

In general terms, can somebody give me a hint as to the sorts of things that, if found in inlined functions, might cause the compiler to optimize to invalid code?

Thanks,

David Mathog
mat...@caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech
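For anyone following the d versus ll mix-up discussed above, here is a minimal sketch of how it arises. The union type `soft_m128i` and both function names are hypothetical stand-ins, not the actual definitions in my emmintrin.h; only the member names ll, d and s follow the ones mentioned in this thread.

```c
#include <stdint.h>

/* Hypothetical stand-in for a software __m128i: the same 16 bytes
   viewed as lanes of different widths. */
typedef union {
    int64_t ll[2];  /* two 8-byte lanes   */
    int32_t d[4];   /* four 4-byte lanes  */
    int16_t s[8];   /* eight 2-byte lanes */
} soft_m128i;

/* Two-lane extract: index 0 or 1 selects a whole 8-byte integer. */
static inline int64_t soft_extract_epi64(soft_m128i v, int index)
{
    return v.ll[index & 1];
}

/* The mistake: indexing the 4-byte view returns only half of one
   8-byte lane, sign-extended to 64 bits. */
static inline int64_t soft_extract_epi64_wrong(soft_m128i v, int index)
{
    return v.d[index & 1];
}
```

With ll[1] set to 0x0000000300000004, the first function returns that whole value for index 1, while the second returns one 4-byte half of ll[0] instead, which matches the symptom Ian described.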
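One thing to watch when turning inline functions into defines is argument parenthesization. The sketch below uses plain uint32_t rather than __m128i so that it compiles anywhere, and every name in it is made up for illustration. It shows how a macro whose arguments are not parenthesized gives a different answer from the inline function once a caller passes an expression.

```c
#include <stdint.h>

/* Arguments not parenthesized: BAD_ANDNOT(a | b, c) expands to
   (~a | b & c), which parses as (~a) | (b & c). */
#define BAD_ANDNOT(A, B) (~A & B)

/* Each argument parenthesized: expands to (~(a | b) & (c)). */
#define ANDNOT(A, B) (~(A) & (B))

/* The inline function evaluates each argument once, as a whole. */
static inline uint32_t andnot_fn(uint32_t a, uint32_t b)
{
    return ~a & b;
}

static inline uint32_t demo_bad(uint32_t a, uint32_t b, uint32_t c)
{
    return BAD_ANDNOT(a | b, c);
}

static inline uint32_t demo_good(uint32_t a, uint32_t b, uint32_t c)
{
    return ANDNOT(a | b, c);
}

static inline uint32_t demo_fn(uint32_t a, uint32_t b, uint32_t c)
{
    return andnot_fn(a | b, c);
}
```

For a = 0x0F, b = 0xF0, c = 0xFF the parenthesized macro and the function both give 0, while the unparenthesized macro gives 0xFFFFFFF0. A macro also evaluates an argument once per appearance in its body, so arguments with side effects become a concern in the more complicated _mm* cases.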
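The strict-aliasing warning refers to reading an object through a pointer to an incompatible type, for example `*(uint16_t *)&some_uint32`. GCC enables -fstrict-aliasing at -O2 and above and may then assume such accesses never alias. Whether that is what produces the wrong scores at -O1 is not established by anything above, so treat the following only as a sketch of the usual alias-safe alternative. The helper name `load_u16` is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Copies the first two bytes of the object at p into a uint16_t.
   memcpy is allowed to read the bytes of any object, so this does
   not break the aliasing rules the way a pointer cast would. */
static inline uint16_t load_u16(const void *p)
{
    uint16_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

Compilers normally reduce a small fixed-size memcpy like this to a plain load, so it should not cost anything over the cast. Building with -fno-strict-aliasing is another way to check whether aliasing assumptions are involved.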