On Thu, 2008-02-07 at 18:51 +0100, Malte Steiner wrote:
> Hello,
>
> I try to squeeze as much performance as possible out of my upcoming
> Linux synthesizer and try manual vectorization with following construct
> in c, mainly to vectorize away multiplications :

Have you checked the SIMD code in Ardour? We have SIMD code for the crucial DSP. The functions we use are:

(pure ASM, defined in libs/ardour/sse_functions.s or _64bit.s)

    mix_buffers_with_gain(float *dst, float *src, long nframes, float gain);
    mix_buffers_no_gain (float *dst, float *src, long nframes);
    apply_gain_to_buffer (float *buf, long nframes, float gain);
    float compute_peak(float *buf, long nframes, float current);

(xmmintrin, defined in libs/ardour/sse_functions_xmm.cc)

    find_peaks(float *buf, nframes_t nframes, float *min, float *max)

When I wrote the code, I was unable to get better results from the gcc vectorizer. From what I've heard, it's supposed to be getting better, but until then we are using the code above.

Note especially the xmmintrin syntax. It's a brilliant way of doing pseudo-assembler: it gives you direct access to the XMM (SIMD) registers and direct SIMD calls.

compute_peak() returns the largest absolute peak value in buf and current (i.e. return max(max(abs(buf)), current)). The function we have is orders of magnitude faster than anything GCC can come up with from generic C code. This is partly because we are using 16-byte aligned buffers, and mostly because we can cheat: instead of running a true ABS function we use a bit-masking operation that clears the sign bit, which works for audio data as there are no infinities or NaNs in it.

All functions work with aligned and non-aligned data. With non-aligned data, they process one sample at a time until they reach 16-byte alignment and then continue 4 samples at a time.

  Sampo
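
For illustration, the approach described above for compute_peak() (scalar steps until the buffer is 16-byte aligned, then 4 samples per iteration with a sign-bit mask instead of a real ABS) could be sketched with xmmintrin roughly as follows. This is a minimal sketch and not the Ardour implementation: the name compute_peak_sketch and all details of the loop structure are assumptions made for the example.

```c
#include <stdint.h>
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_load_ps, ... */

/* Hypothetical sketch, not Ardour's code: returns
 * max(max(abs(buf[0..nframes-1])), current). */
static float compute_peak_sketch(const float *buf, long nframes, float current)
{
    /* Head: one sample at a time until buf is 16-byte aligned. */
    while (nframes > 0 && ((uintptr_t)buf & 15) != 0) {
        float a = (*buf < 0.0f) ? -*buf : *buf;
        if (a > current) current = a;
        ++buf;
        --nframes;
    }

    /* -0.0f has only the sign bit set, so (~sign_mask & v) clears the
     * sign bit of each lane, which is abs() done by bit masking. */
    const __m128 sign_mask = _mm_set1_ps(-0.0f);
    __m128 vmax = _mm_set1_ps(current);

    /* Body: 4 samples per iteration using aligned loads. */
    while (nframes >= 4) {
        __m128 v = _mm_load_ps(buf);
        v = _mm_andnot_ps(sign_mask, v);
        vmax = _mm_max_ps(vmax, v);
        buf += 4;
        nframes -= 4;
    }

    /* Reduce the 4 lanes of vmax to a single scalar maximum. */
    float lanes[4];
    _mm_storeu_ps(lanes, vmax);
    for (int i = 0; i < 4; ++i) {
        if (lanes[i] > current) current = lanes[i];
    }

    /* Tail: the remaining 0 to 3 samples. */
    while (nframes > 0) {
        float a = (*buf < 0.0f) ? -*buf : *buf;
        if (a > current) current = a;
        ++buf;
        --nframes;
    }
    return current;
}
```

The masking step is a single AND-NOT per 4 samples, and the aligned load in the main loop is only safe because the head loop has already advanced buf to a 16-byte boundary.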
