On Mon, Nov 22, 2010 at 02:33:32PM -0800, David Mathog wrote:
> My software implementation of SSE2 now passes all the testsuite
> programs. In case anybody else ever needs this, it is here:
>
> http://saf.bio.caltech.edu/pub/software/linux_or_unix_tools/soft_emmintrin.h
>
> I compiled that with a target program and gprof showed
> all the time in the resulting binary in the inlined functions. It ran about
> 4X slower than the SSE2 hardware version, which is about what I
> expected. So, so far so good. What I am worried about now is that
> since it was invoked with "-msse2" the compiler may still be generating
> SSE2 calls within the inlined functions. Is there a way to definitively
> disable this but still retain -msse2 on the command line?
>
> For instance, here is one of the software version inline functions:
>
> /* vector subtract the two doubles in an __m128d */
> static __inline __m128d __attribute__((__always_inline__))
> _mm_sub_pd (__m128d __A, __m128d __B)
> {
> return (__m128d)((__v2df)__A - (__v2df)__B);
> }

Use target attributes (or pragmas):

static __inline __m128d __attribute__((__always_inline__, __target__("no-sse2")))
_mm_sub_pd (__m128d __A, __m128d __B)
{
  return (__m128d)((__v2df)__A - (__v2df)__B);
}

or:

#pragma GCC push_options
#pragma GCC target ("no-sse2")
static __inline __m128d __attribute__((__always_inline__))
_mm_sub_pd (__m128d __A, __m128d __B)
{
  return (__m128d)((__v2df)__A - (__v2df)__B);
}
#pragma GCC pop_options

> In the original gcc emmintrin.h that called a builtin _explicitly_. I
> also want to avoid having the compiler use the same builtin
> _implicitly_. If it uses SSE, 3DNOW or MMX implicitly, in this example,
> that would be fine, it just cannot use any SSE2 hardware.
>
> Actually, one thing I was never very clear on, do -msse2 -m3dnow
> etc. only provide access to the corresponding machine operations through
> the _mm* (or whatever) definitions in the header file, or does the
> compiler also figure out vector operations by itself during the
> optimization phase of compilation?

If -msse2 is used on the command line or inside of a target attribute/pragma,
the compiler is free to use SSE2 instructions anywhere it sees fit, including
in code it vectorizes on its own. The option is not limited to the _mm*
definitions in the header file.
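
To make the implicit case concrete, here is a minimal sketch in which the same plain loop is compiled twice, once with whatever the command line allows and once with SSE2 turned off for that function only. The function names and the NO_SSE2 macro are made up for illustration and are not from soft_emmintrin.h. The noinline is my own assumption: once a body is inlined, I would expect it to be compiled with the caller's options, so the sketch keeps the function out of line.

```c
#include <stddef.h>

/* Hypothetical helper macro (not from the thread): apply the target
   attribute only on x86 with a GCC-compatible compiler.  noinline is
   added on the assumption that an inlined body would be compiled with
   the caller's target options. */
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#define NO_SSE2 __attribute__((__noinline__, __target__("no-sse2")))
#else
#define NO_SSE2
#endif

/* No intrinsics are used here.  Built with -O3 -msse2, the compiler
   may still vectorize this loop with SSE2 integer instructions. */
void add_ints(int *dst, const int *a, const int *b, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}

/* Same loop, but with SSE2 disabled for this one function, so SSE2
   instructions should not appear in its body. */
NO_SSE2 void add_ints_no_sse2(int *dst, const int *a, const int *b, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}
```

Both functions return the same results. The difference, if any, only shows up in the generated assembly, for example by compiling with -O3 -msse2 -S and comparing the two function bodies.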
--
Michael Meissner, IBM
5 Technology Place Drive, M/S 2757, Westford, MA 01886-3141, USA
[email protected] fax +1 (978) 399-6899