Michele Bavaro wrote:
Hello Philip,

thank you for coming back on this subject. I modified the library developed by Gregory Heckler; the source code is here:
http://github.com/gps-sdr/gps-sdr/tree/6153c01317f34a26b2fb41926505b9d97f764e90/objects

To give you an example, the DIT butterfly looks like this:

#define BUTTERFLY_FWD(_A, _B, _W) \
__asm__ ("LDR r0, [%0] \n\t"             /* r0 = A, two packed 16-bit halves */ \
         "LDR r2, [%1] \n\t"             /* r2 = B */ \
         "MOV r3, #0 \n\t" \
         "SHADD16 r0, r0, r3 \n\t"       /* halve both halves of A */ \
         "SHADD16 r2, r2, r3 \n\t"       /* halve both halves of B */ \
         "LDR r3, [%2] \n\t"             /* r3 = twiddle W */ \
         "SMUADX r5, r2, r3 \n\t"        /* r5 = B.lo*W.hi + B.hi*W.lo */ \
         "SMUSD r4, r2, r3 \n\t"         /* r4 = B.lo*W.lo - B.hi*W.hi */ \
         "ADD r5, r5, #8192 \n\t"        /* round before the 14-bit shift */ \
         "ADD r4, r4, #8192 \n\t" \
         "ASR r4, r4, #14 \n\t" \
         "PKHBT r3, r4, r5, LSL #2 \n\t" /* r3 = lo16(r4) | hi16(r5 << 2) */ \
         "QSUB16 r2, r0, r3 \n\t"        /* B' = A/2 - (B/2)*W, saturating */ \
         "QADD16 r0, r0, r3 \n\t"        /* A' = A/2 + (B/2)*W, saturating */ \
         "STR r0, [%0] \n\t" \
         "STR r2, [%1] \n\t" \
         ::"r" (_A), "r" (_B), "r" (_W) \
         :"r0", "r2", "r3", "r4", "r5", "memory")

and just uses ARM assembly (NEON is complicated to use with this basic radix-2 implementation).

As user space, I am using the Angstrom image v0.92:
http://www.gumstix.net/overo-gm-images/v0.92/
on my Overo Water. I use the CodeSourcery 2009q1 free toolchain, even though Koen suggested today that I try something else.

Regards,
Michele

So basically, you need to calculate (where a, b, c, w are complex)

c[0] = a[0] + b[1] * w
c[1] = a[1] + b[0] * w

If this is correct, I'll try and come up with a NEON way exploiting the SIMD nature of NEON.

Philip

Michele Bavaro wrote:
Hello everyone,

I'm porting my software GPS receiver to the OMAP, therefore I need fast signal processing libraries, and in particular FFTs. I have somehow adapted an open source library to do a radix-2 butterfly using ARM assembly. It works, but my 256-point, 16-bit fixed-point FFT still takes about 60 us. That's 12 times slower than the 4.7 us advertised with NEON!

What open source FFT library? You could try posting the code and seeing if anyone has any suggestions. (Post the code to the Beagle list also; there are some good NEON people there.)

Frustrated, I downloaded the OpenMAX libraries and compiled them with the evaluation version of RVCT, but I can't manage to link the object file with code compiled with the CodeSourcery GNU toolchain.

What user space are you using? Angstrom or something else? You'll need to use a tool chain that matches your user space.

Philip

I tried to translate the assembly, but unfortunately it's a very challenging task for me. Can someone point me in the right direction on this subject? Should I keep working on my 16-bit fixed-point FFT? Should I buy the ARM toolchain and port all the software? Should I just give up and try using the DSP maybe?

Thank you in advance for any reply, and good luck with the OpenSDR, which I'm watching very closely.

Cheers,
Michele
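For anyone who wants to check a NEON (or any other) rewrite against the BUTTERFLY_FWD macro quoted above, here is a rough portable C model of what the assembly appears to compute, as far as it can be read from the instruction sequence: A' = A/2 + (B/2)*W and B' = A/2 - (B/2)*W, updated in place. It is only a sketch under assumptions. The names cpx16, sat16 and butterfly_fwd_ref are made up for illustration and are not part of gps-sdr. It assumes the real part sits in the low halfword of each 32-bit word, that the twiddles are Q14 (inferred from the #8192 rounding and the 14-bit shift), and that the compiler shifts negative values arithmetically and wraps on narrowing to 16 bits, as GCC does.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sample type: 16-bit real and imaginary parts.
   Assumption: re is the low halfword of the packed 32-bit word. */
typedef struct { int16_t re, im; } cpx16;

/* Saturate to 16 bits, mirroring QADD16/QSUB16. */
static int16_t sat16(int32_t x)
{
    if (x > INT16_MAX) return INT16_MAX;
    if (x < INT16_MIN) return INT16_MIN;
    return (int16_t)x;
}

/* Reference model of the macro (sketch, not the gps-sdr code):
   a' = a/2 + (b/2)*w,  b' = a/2 - (b/2)*w,  with w assumed Q14. */
static void butterfly_fwd_ref(cpx16 *a, cpx16 *b, const cpx16 *w)
{
    assert(a && b && w);

    /* SHADD16 with zero: halve each component (arithmetic shift assumed). */
    int32_t ar = a->re >> 1, ai = a->im >> 1;
    int32_t br = b->re >> 1, bi = b->im >> 1;

    /* SMUSD gives the real part, SMUADX the imaginary part of (b/2)*w. */
    int32_t pr = br * w->re - bi * w->im;
    int32_t pi = br * w->im + bi * w->re;

    /* Round, drop the 14 fractional bits, keep 16 bits (wraps like PKHBT). */
    int16_t tr = (int16_t)((pr + 8192) >> 14);
    int16_t ti = (int16_t)((pi + 8192) >> 14);

    /* QSUB16 / QADD16: saturating subtract and add, written back in place. */
    b->re = sat16(ar - tr);
    b->im = sat16(ai - ti);
    a->re = sat16(ar + tr);
    a->im = sat16(ai + ti);
}
```

With a unity twiddle (16384, 0), a = (1000, -2000) and b = (4000, 6000), this model gives a' = (2500, 2000) and b' = (-1500, -4000). Running the same inputs through the macro on the target would be a quick way to confirm the halfword-order and Q14 assumptions before spending time on NEON.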
