On Fri, 2004-07-02 at 00:40, Erik de Castro Lopo wrote:
> > Eric what do you think ? can something like that be coded efficiently
> > using SSE/SSE2 ?
>
> Probably not. There are some algorithms which simply can't be vectorized.

SSE2 is usually significantly faster for non-vectorized (scalar) code as well, at least on P4 and AMD64. I usually profile the code generated by the compiler first and then hand-code SSE2 for the parts where the compiler's output is the bottleneck. An IIR filter was one good example where compilers sucked badly.

-- 
Jussi Laako <[EMAIL PROTECTED]>
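
For context on why an IIR filter is a hard case, here is a minimal sketch of a direct-form I biquad in plain C. It is not the code discussed in this thread; the `biquad` struct and `biquad_process` function are hypothetical names chosen for illustration. Each output sample depends on the two previous outputs, so the loop iterations cannot simply be computed side by side across samples, yet the scalar arithmetic inside one iteration can still be done in SSE registers. With GCC on 32-bit x86, for instance, `-msse2 -mfpmath=sse` asks for scalar SSE floating point instead of x87; whether that helps for a given filter is something to measure with a profiler, not assume.

```c
#include <stddef.h>

/* Hypothetical names, for illustration only. */
typedef struct {
    float b0, b1, b2;   /* feed-forward coefficients */
    float a1, a2;       /* feedback coefficients (a0 normalized to 1) */
    float x1, x2;       /* previous two input samples */
    float y1, y2;       /* previous two output samples */
} biquad;

/*
 * Direct-form I biquad:
 *   y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2]
 *
 * The y[n-1] and y[n-2] terms form a serial dependency chain:
 * sample n cannot be finished before sample n-1 is known.
 */
void biquad_process(biquad *f, const float *in, float *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        float x = in[i];
        float y = f->b0 * x + f->b1 * f->x1 + f->b2 * f->x2
                - f->a1 * f->y1 - f->a2 * f->y2;

        /* Shift the delay lines for the next sample. */
        f->x2 = f->x1;
        f->x1 = x;
        f->y2 = f->y1;
        f->y1 = y;

        out[i] = y;
    }
}
```

To try it, zero the four state fields, set the coefficients, and call `biquad_process` on a buffer of samples. Comparing the compiler's assembly for this loop with and without the SSE options is one way to see what the compiler does with the feedback path.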
