Ruben van Royen wrote:
First of all, I was not yet talking about vectorizing your code, which is often hard, especially for a compiler. But SSE can be used on scalars as well (as you probably know).
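To illustrate SSE used on a single value with no vectorization involved, here is a minimal sketch using the SSE intrinsics from <xmmintrin.h>. The function name sse_scalar_add is hypothetical. On x86-64, compilers already emit scalar SSE (ADDSS, MULSS) for ordinary float arithmetic; on 32-bit x86, GCC does so when given -msse -mfpmath=sse.

```c
#include <xmmintrin.h>  /* SSE intrinsics (x86/x86-64 only) */

/* Add two floats with the scalar SSE instruction ADDSS.
 * Only lane 0 of each __m128 is meaningful: this is SSE
 * operating on one value, not a vectorized loop. */
float sse_scalar_add(float a, float b)
{
    __m128 va = _mm_set_ss(a);        /* a in lane 0, zeros in lanes 1..3 */
    __m128 vb = _mm_set_ss(b);
    __m128 sum = _mm_add_ss(va, vb);  /* ADDSS: adds lane 0 only */
    return _mm_cvtss_f32(sum);        /* extract lane 0 as a float */
}
```

For example, sse_scalar_add(1.5f, 2.25f) yields 3.75f, exactly what a + b would give.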
The fact is that the Intel Pentium 4 optimization guide says that SSE code is generally as fast as or faster than regular (x87) FP code, and truncation to integer in particular is faster. Also, denormals (which started all of this) can be handled faster by SSE math by turning on a mode flag (DAZ, "denormals are zero", in the MXCSR register) that makes input denormals behave as zeros. This is of course not IEEE-compliant, but it is exactly what you were doing in your code.
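A minimal sketch of both points above, assuming x86-64 and GCC/Clang intrinsics. The helper names and the MXCSR_FTZ/MXCSR_DAZ constants are hypothetical; the bit positions (FTZ = bit 15, DAZ = bit 6) are the ones Intel documents for MXCSR. FTZ is included alongside DAZ because the two are usually enabled together.

```c
#include <xmmintrin.h>  /* _mm_getcsr, _mm_setcsr, _mm_set_ss, _mm_cvttss_si32 */

/* MXCSR control bits: FTZ is bit 15, DAZ is bit 6. */
#define MXCSR_FTZ 0x8000u  /* flush-to-zero: denormal results become 0 */
#define MXCSR_DAZ 0x0040u  /* denormals-are-zero: denormal inputs read as 0 */

/* Enable FTZ and DAZ for SSE arithmetic on the calling thread.
 * Returns the previous MXCSR so the caller can restore it.
 * x87 instructions are not affected by these bits. */
unsigned int enable_ftz_daz(void)
{
    unsigned int old = _mm_getcsr();
    _mm_setcsr(old | MXCSR_FTZ | MXCSR_DAZ);
    return old;
}

/* Put back the MXCSR value saved by enable_ftz_daz(). */
void restore_mxcsr(unsigned int old)
{
    _mm_setcsr(old);
}

/* Truncate toward zero with the scalar SSE instruction CVTTSS2SI.
 * With x87 math, a C (int) cast typically required switching the
 * FPU rounding mode around the conversion, which was slow on the P4. */
int sse_truncate(float x)
{
    return _mm_cvttss_si32(_mm_set_ss(x));
}
```

With DAZ on, a denormal such as 1e-40f multiplied by 1e10f gives 0 instead of roughly 1e-30; after restore_mxcsr() the normal IEEE result comes back.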
I agree that in theory slowdowns should not occur, but what I found strange is that even Intel's own compiler, icc, produced bad performance
when compiling the resampling code with SSE/SSE2 math and vectorization turned on.
If the compiler were smart, it would not have used SSE/SSE2 in that section of code, but apparently icc is still not good at spotting
those problems.
The problem for C programmers is that, since they assume the compiler does a good job at optimizing, most will not easily be able
to figure out why the SSE optimizations slowed down certain routines.
Then a dilemma might occur: 50% of the CPU time is spent in function1() and 50% in function2(),
but if you activate SSE, function1() speeds up by 40% while function2() slows down by 30%.
If it were possible to tell the compiler not to use SSE in function2(), the app would benefit from SSE, but
in the above case it would not.
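To make the trade-off concrete, here is a worked example. The percentages are ambiguous; assume "speeds up by 40%" means function1() runs 1.4 times as fast and "slows down by 30%" means function2() takes 1.3 times as long, with the original total run time normalized to 1.

```latex
T_{\text{SSE everywhere}} = \frac{0.5}{1.4} + 0.5 \times 1.3 \approx 0.357 + 0.650 = 1.007

T_{\text{SSE in function1() only}} = \frac{0.5}{1.4} + 0.5 \approx 0.357 + 0.500 = 0.857
```

So enabling SSE globally leaves the app slightly slower, while per-function control would cut run time by about 14%. Under the other reading (function1() time multiplied by 0.6, function2() time by 1.3) the totals are 0.95 and 0.80.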
Usually, optimal C code can only be generated if the programmer knows both the CPU and the compiler well, but
this often requires long, painful trial-and-error sessions, analysis of the asm code generated by the compiler, etc.
OK, there are profilers available, but they don't automagically solve all optimization problems.
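As for telling the compiler to leave one function alone: a minimal sketch, assuming GCC, using the optimize("no-tree-vectorize") function attribute to switch off auto-vectorization for function2() only, while the rest of the file is built with -O3. This does not move scalar math back to x87 (on x86-64 scalar float math stays in SSE registers), and GCC's manual describes the optimize attribute as intended for debugging rather than production; compiling function2() in a separate file with different flags is the more conservative route. The NO_AUTOVECTORIZE macro and both function bodies are hypothetical stand-ins.

```c
#include <stddef.h>

/* GCC only: disable auto-vectorization for a single function.
 * With other compilers the macro expands to nothing. */
#if defined(__GNUC__) && !defined(__clang__)
#define NO_AUTOVECTORIZE __attribute__((optimize("no-tree-vectorize")))
#else
#define NO_AUTOVECTORIZE
#endif

/* Hypothetical function1(): left alone, so -O3 may vectorize it. */
void function1(float *dst, const float *src, size_t n, float gain)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * gain;
}

/* Hypothetical function2(): assumed to be the routine that got slower,
 * so its loop is kept scalar. */
NO_AUTOVECTORIZE
void function2(float *dst, const float *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] += src[i] * 0.5f;
}
```

Checking the generated asm (e.g. with gcc -O3 -S) should show packed instructions such as MULPS in function1() and only scalar ones in function2().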
cheers, Benno http://www.linuxsampler.org
Possible reasons for SSE code being slower than FP code: the addition may be pipelined in the FP (x87) unit but not in the SSE unit, and incorrect alignment might incur a higher penalty for SSE.
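On the alignment point, a minimal sketch (assuming C11 aligned_alloc and SSE intrinsics) of keeping buffers on 16-byte boundaries so the aligned load _mm_load_ps (MOVAPS) can be used; _mm_loadu_ps accepts any address but was historically slower, and _mm_load_ps on a misaligned address faults. It only shows the mechanics and does not establish that alignment caused the slowdown. All function names are hypothetical.

```c
#include <stdint.h>     /* uintptr_t */
#include <stdlib.h>     /* aligned_alloc, free (C11) */
#include <stddef.h>     /* size_t */
#include <xmmintrin.h>  /* __m128, _mm_load_ps, _mm_loadu_ps, _mm_add_ps */

/* Allocate n floats on a 16-byte boundary. C11 aligned_alloc wants the
 * size to be a multiple of the alignment, so round it up. */
float *alloc_floats_aligned16(size_t n)
{
    size_t bytes = n * sizeof(float);
    bytes = (bytes + 15) & ~(size_t)15;
    if (bytes == 0)
        bytes = 16;
    return aligned_alloc(16, bytes);
}

/* True if p sits on a 16-byte boundary. */
int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Sum n floats (n must be a multiple of 4), four lanes at a time.
 * aligned != 0 uses _mm_load_ps (p must be 16-byte aligned);
 * aligned == 0 uses _mm_loadu_ps (any address). */
float sum4(const float *p, size_t n, int aligned)
{
    __m128 acc = _mm_setzero_ps();
    for (size_t i = 0; i < n; i += 4) {
        __m128 v = aligned ? _mm_load_ps(p + i) : _mm_loadu_ps(p + i);
        acc = _mm_add_ps(acc, v);
    }
    float lanes[4];
    _mm_storeu_ps(lanes, acc);  /* spill the 4 partial sums */
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}
```

Usage: allocate with alloc_floats_aligned16(), fill, call sum4(p, n, 1), and release with free().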
Ruben
