Ian Lance Taylor <i...@google.com> wrote:

> Your changes are relying on a gcc extension which was only recently
> added, more recently than those tests were added to the testsuite.  Only
> recently did gcc acquire the ability to use [] to access elements in a
> vector. 

That isn't what my changes did. The array accesses are to the arrays in
the union; nothing cutting edge there.  The data is accessed through
the array member named by .d (or .s, etc.), not through name.x[index].
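Here is a minimal sketch of that pattern. It assumes a union that overlays a GCC vector type with plain arrays. The type name soft_m128i, the helper names, and the exact member list are hypothetical, not the code in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* GCC vector type: 16 bytes holding two 64-bit integers. */
typedef long long v2di_t __attribute__((vector_size(16)));

/* Hypothetical software vector: the vector overlaid with plain arrays. */
typedef union {
  v2di_t  v;      /* the whole vector, for ~, &, | on all 16 bytes */
  int64_t ll[2];  /* two 8-byte lanes */
  int32_t d[4];   /* four 4-byte lanes */
  int16_t s[8];   /* eight 2-byte lanes */
} soft_m128i;

/* Lanes are reached through the arrays (u.s[i]), not by subscripting
   the vector member itself (u.v[i]), which older gcc releases reject. */
static int16_t soft_extract_epi16(soft_m128i u, int i)
{
  return u.s[i];
}

static soft_m128i soft_insert_epi16(soft_m128i u, int16_t x, int i)
{
  u.s[i] = x;
  return u;
}
```

Reading one union member after writing another is documented GCC behaviour, so this needs nothing newer than plain C arrays.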


> So I think you may have misinterpreted the __builtin_ia32_vec_ext_v2di
> builtin function.  That function treats the vector as containing two
> 8-byte integers, and pulls out one or the other depending on the second
> argument.  Your dumps of res[0] and res[1] suggest that you are treating
> the vector as four 4-byte integers and pulling out specific ones.
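The following sketch makes the two readings concrete. It is a software stand-in for __builtin_ia32_vec_ext_v2di built on a hypothetical union type soft_m128i. It treats the 16 bytes as two 8-byte integers and returns the one picked by the selector. The lane-order remarks assume a little-endian target such as x86.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical software vector: the same 16 bytes viewed two ways. */
typedef union {
  int64_t ll[2];  /* two 8-byte integers: the view vec_ext_v2di uses */
  int32_t d[4];   /* four 4-byte integers: the wrong view for it */
} soft_m128i;

/* Software analogue of __builtin_ia32_vec_ext_v2di (v, n).
   n must be 0 or 1; it selects one of the two 8-byte elements. */
static int64_t soft_vec_ext_v2di(soft_m128i v, int n)
{
  assert(n == 0 || n == 1);
  return v.ll[n];
}
```

On little-endian x86, d[0] and d[1] are the low and high halves of ll[0]. Reading res.d[1] where res.ll[1] was meant therefore returns the wrong data.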

Yup, my bad: I put in d where it should have been ll.  I also fixed the
problem I had introduced in sse2-check.h, where too large a chunk was
commented out; that was causing the gcc -Wall -msse2 problem.  The
changed part in the original source was

  if ((edx & bit_SSE2) && sse_os_support ())

and is now:

#if !defined(SOFT_SSE2)
  if ((edx & bit_SSE2) && sse_os_support ())
#else
  if (sse_os_support ())
#endif /*SOFT_SSE2*/

My software SSE2 passes all 165 of the sse2 tests that are complete
programs.

However, there is a problem in the real world.  While the sse2 programs
in the testsuite do exercise the _mm* functions, they do so one at a
time.  I have found that in real code, which makes multiple _mm* calls,
the results may be wrong unless -O0 is used.
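One way to catch this class of bug is a test that chains several calls, as real code does, and checks every lane against a scalar reference. Compiling the same test at -O0, -O1 and -O2 then shows whether optimization changes the answer. The sketch below assumes hypothetical software versions of _mm_adds_epu8 and _mm_max_epu8 (soft_adds_epu8, soft_max_epu8) on a byte-array union. It is not code from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical software vector: sixteen unsigned bytes. */
typedef union { uint8_t b[16]; } soft_m128i;

/* Hypothetical stand-in for _mm_adds_epu8: saturating unsigned add. */
static inline soft_m128i soft_adds_epu8(soft_m128i a, soft_m128i b)
{
  soft_m128i r;
  for (int i = 0; i < 16; i++) {
    unsigned s = (unsigned) a.b[i] + b.b[i];
    r.b[i] = (uint8_t) (s > 255 ? 255 : s);
  }
  return r;
}

/* Hypothetical stand-in for _mm_max_epu8: lane-wise unsigned maximum. */
static inline soft_m128i soft_max_epu8(soft_m128i a, soft_m128i b)
{
  soft_m128i r;
  for (int i = 0; i < 16; i++)
    r.b[i] = a.b[i] > b.b[i] ? a.b[i] : b.b[i];
  return r;
}

/* Chain several calls and compare every lane with a plain scalar
   computation of the same thing.  Returns the number of bad lanes. */
static int check_chain(const uint8_t x[16], const uint8_t y[16],
                       const uint8_t z[16])
{
  soft_m128i vx, vy, vz;
  for (int i = 0; i < 16; i++) {
    vx.b[i] = x[i];
    vy.b[i] = y[i];
    vz.b[i] = z[i];
  }
  soft_m128i v = soft_max_epu8(soft_adds_epu8(vx, vy), vz);
  v = soft_adds_epu8(v, v);

  int bad = 0;
  for (int i = 0; i < 16; i++) {
    unsigned s = (unsigned) x[i] + y[i];
    if (s > 255) s = 255;
    if (z[i] > s) s = z[i];
    s = s + s;
    if (s > 255) s = 255;
    if (v.b[i] != s) bad++;
  }
  return bad;
}
```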

% gcc -std=gnu99 -g -pg -pthread -O0 -msse -mno-sse2 -DSOFT_SSE2 -m32 -g \
    -pg -DHAVE_CONFIG_H -L../../easel -L.. -L. -I../../easel -I../../easel \
    -I. -I.. -I. -I../../src -Dp7MSVFILTER_TESTDRIVE -o msvfilter_utest \
    ./msvfilter.c -Wl,--start-group -lhmmer -lhmmerimpl -Wall \
    -Wl,--end-group -leasel -lm
% ./msvfilter_utest
(no output, it ran correctly)

% gcc -std=gnu99 -g -pg -pthread -O1 -msse -mno-sse2 -DSOFT_SSE2 -m32 -g \
    -pg -DHAVE_CONFIG_H -L../../easel -L.. -L. -I../../easel -I../../easel \
    -I. -I.. -I. -I../../src -Dp7MSVFILTER_TESTDRIVE -o msvfilter_utest \
    ./msvfilter.c -Wl,--start-group -lhmmer -lhmmerimpl -Wall \
    -Wl,--end-group -leasel -lm
% ./msvfilter_utest
msv filter unit test failed: scores differ (-50.37, -10.86)

At higher optimization levels the problems get worse; the code does not
compile at all (even with gcc 4.4.1):

% gcc -std=gnu99 -g -pg -pthread -O2 -msse -mno-sse2 -DSOFT_SSE2 -m32 -g \
    -pg -DHAVE_CONFIG_H -L../../easel -L.. -L. -I../../easel -I../../easel \
    -I. -I.. -I. -I../../src -Dp7MSVFILTER_TESTDRIVE -o msvfilter_utest \
    ./msvfilter.c -Wl,--start-group -lhmmer -lhmmerimpl -Wall \
    -Wl,--end-group -leasel -lm
../../easel/emmintrin.h:2178: warning: dereferencing pointer '({anonymous})' does break strict-aliasing rules
../../easel/emmintrin.h:2178: note: initialized from here
.
.  (same sort of message many, many times)
.
./msvfilter.c:208: error: unable to find a register to spill in class 'GENERAL_REGS'
./msvfilter.c:208: error: this is the insn:
(insn 1944 1943 1945 46 ../../easel/emmintrin.h:2348 (set (strict_low_part (subreg:HI (reg:TI 1239) 0))
        (mem:HI (reg/f:SI 96 [ pretmp.1031 ]) [13 S2 A16])) 47 {*movstricthi_1} (nil))
./msvfilter.c:208: confused by earlier errors, bailing out
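The strict-aliasing warnings from emmintrin.h are one plausible suspect, not a confirmed diagnosis. When code reads or writes an object through a pointer of an unrelated type, gcc may assume the two accesses cannot overlap and reorder or drop them. A quick check is to rebuild with -fno-strict-aliasing and see whether the scores change. Note that gcc only enables -fstrict-aliasing by default at -O2 and above, so this would not by itself explain the -O1 failure.

The sketch below uses hypothetical names (soft_m128i, soft_loadu_si128, soft_storeu_si128). It shows the aliasing-safe alternative of copying bytes with memcpy instead of casting pointers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical software vector type. */
typedef union {
  int64_t ll[2];
  int16_t s[8];
} soft_m128i;

/* Risky pattern, shown only as a comment:
     int64_t buf[2];
     *(int16_t *) buf = 7;   // int16_t store into an int64_t object
   That access breaks C99's aliasing rules, so with -fstrict-aliasing
   gcc may reorder it relative to later reads of buf. */

/* Aliasing-safe load: copy the 16 bytes instead of casting pointers.
   gcc typically turns a fixed-size memcpy like this into plain moves. */
static soft_m128i soft_loadu_si128(const void *p)
{
  soft_m128i r;
  memcpy(&r, p, sizeof r);
  return r;
}

/* Aliasing-safe store, the mirror image of the load. */
static void soft_storeu_si128(void *p, soft_m128i v)
{
  memcpy(p, &v, sizeof v);
}
```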

Would replacing the inline functions with macros let the compiler
digest it better?  For instance:

static __inline __m128i __attribute__((__always_inline__))
_mm_andnot_si128 (__m128i __A, __m128i __B)
{
  return (~__A) & __B;
}

becomes

#define _mm_andnot_si128(A,B)  (~(A) & (B))

That approach will get really messy for the more complicated _mm* functions.
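If the macro route is tried, each argument needs its own parentheses, because a macro substitutes text rather than values. The sketch below uses unsigned int operands so the results are easy to check; GCC applies ~ and & to vector types element-wise in the same way. ANDNOT_BAD, ANDNOT_GOOD and andnot_fn are hypothetical names.

```c
#include <assert.h>

/* Unparenthesized: ANDNOT_BAD(x | y, z) expands to (~x | y & z). */
#define ANDNOT_BAD(A, B)   (~A & B)

/* Parenthesized: ANDNOT_GOOD(x | y, z) expands to (~(x | y) & (z)). */
#define ANDNOT_GOOD(A, B)  (~(A) & (B))

/* An always_inline function has no expansion pitfalls and evaluates
   each argument exactly once; a macro only does that if it mentions
   each parameter once. */
static __inline unsigned __attribute__((__always_inline__))
andnot_fn(unsigned a, unsigned b)
{
  return ~a & b;
}
```

Because the inline function avoids these pitfalls and gcc is expected to inline it anyway, converting to macros seems unlikely to change the generated code much.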

In general terms, can somebody give me a hint as to what sorts of
constructs in inline functions might cause the compiler to optimize
them into invalid code?


Thanks,

David Mathog
mat...@caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech
