Michele Bavaro wrote:
Hello Philip,

thank you for coming back on this subject.
I modified the library developed by Gregory Heckler; the source code is here:

http://github.com/gps-sdr/gps-sdr/tree/6153c01317f34a26b2fb41926505b9d97f764e90/objects

To give you an example, the DIT butterfly looks like this:

So basically, you need to calculate (where a, b, c, w are complex)

c[0] = a[0] + b[1] * w
c[1] = a[1] + b[0] * w

If this is correct, I'll try to come up with a version that exploits the SIMD nature of NEON.

Philip





#define BUTTERFLY_FWD(_A, _B, _W)                                       \
  __asm__ ("LDR    r0, [%0]            \n\t"                          \
           "LDR    r2, [%1]            \n\t"                          \
           "MOV    r3, #0              \n\t"                          \
           "SHADD16  r0, r0, r3        \n\t"                          \
           "SHADD16  r2, r2, r3        \n\t"                          \
           "LDR    r3, [%2]            \n\t"                          \
           "SMUADX r5, r2, r3          \n\t"                          \
           "SMUSD  r4, r2, r3          \n\t"                          \
           "ADD    r5, r5, #8192       \n\t"                          \
           "ADD    r4, r4, #8192       \n\t"                          \
           "ASR    r4, r4, #14         \n\t"                          \
           "PKHBT  r3, r4, r5, LSL #2  \n\t"                          \
           "QSUB16 r2, r0, r3          \n\t"                          \
           "QADD16 r0, r0, r3          \n\t"                          \
           "STR    r0, [%0]            \n\t"                          \
           "STR    r2, [%1]            \n\t"                          \
           ::"r" (_A), "r" (_B), "r" (_W)                         \
           :"r0", "r2", "r3", "r4", "r5", "memory")



and just uses ARM assembly (NEON is complicated to use with this basic
radix2 implementation).
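
For readers who do not read ARMv6 SIMD assembly, here is a rough portable C
model of what the BUTTERFLY_FWD macro above appears to compute. It is only a
sketch, not code from gps-sdr: the names cpx16, asr_floor, sat16 and
butterfly_fwd_ref are made up for illustration. It assumes that the real part
sits in the low halfword and the imaginary part in the high halfword of each
32-bit word, and that the twiddle factors are in Q14 format (inferred from the
rounding constant 8192 and the shift by 14).

```c
#include <stdint.h>

/* Hypothetical sample type: 16-bit real and imaginary parts.
   Assumed layout: re = low halfword, im = high halfword. */
typedef struct { int16_t re, im; } cpx16;

/* Arithmetic shift right (rounds toward minus infinity), written so it
   does not rely on implementation-defined shifts of negative values. */
static int32_t asr_floor(int32_t x, int n)
{
    if (x >= 0)
        return x >> n;
    return -(((-x) + ((1 << n) - 1)) >> n);
}

/* Clamp to the int16_t range, as QADD16/QSUB16 do per halfword. */
static int16_t sat16(int32_t x)
{
    if (x > INT16_MAX) return INT16_MAX;
    if (x < INT16_MIN) return INT16_MIN;
    return (int16_t)x;
}

/* In-place butterfly: a' = a/2 + (b/2)*w, b' = a/2 - (b/2)*w. */
static void butterfly_fwd_ref(cpx16 *a, cpx16 *b, const cpx16 *w)
{
    /* SHADD16 with a zero operand: halve each 16-bit lane. */
    int32_t ar = asr_floor(a->re, 1), ai = asr_floor(a->im, 1);
    int32_t br = asr_floor(b->re, 1), bi = asr_floor(b->im, 1);

    /* SMUSD gives the real part, SMUADX the imaginary part of (b/2)*w. */
    int32_t pr = br * w->re - bi * w->im;
    int32_t pi = br * w->im + bi * w->re;

    /* Round (add 8192) and scale back from Q14. The assembly keeps only
       16 bits here; this model assumes the result fits in 16 bits. */
    int32_t tr = asr_floor(pr + 8192, 14);
    int32_t ti = asr_floor(pi + 8192, 14);

    /* QSUB16 / QADD16: saturating difference and sum. */
    b->re = sat16(ar - tr);
    b->im = sat16(ai - ti);
    a->re = sat16(ar + tr);
    a->im = sat16(ai + ti);
}
```

If the macro is applied at every stage, the halving of both inputs scales the
output of a 256-point radix-2 FFT (8 stages) by 1/256 overall. A model like
this could also serve as a bit-exact reference when checking a NEON version.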

As user space, I am using the Angstrom image v0.92:

http://www.gumstix.net/overo-gm-images/v0.92/

on my Overo Water. I use the free CodeSourcery 2009q1 toolchain, although
today Koen suggested that I try something else.


Regards,
Michele




Michele Bavaro wrote:
Hello everyone,

I'm porting my software GPS receiver to the OMAP, so I need fast
signal processing libraries, and FFTs in particular.

I have adapted an open source library to do the radix-2 butterflies in
ARM assembly. It works, but my 256-point, 16-bit fixed-point FFT still
takes about 60 us. That is about 12 times slower than the 4.7 us
advertised for NEON!
Which open source FFT library? You could try posting the code and seeing
if anyone has any suggestions. (Post the code to the Beagle list as well;
there are some good NEON people there.)

Frustrated, I downloaded the OpenMAX libraries and compiled them with the
evaluation version of RVCT, but I cannot link the resulting object files
with code compiled with the CodeSourcery GNU toolchain.
Which user space are you using? Angstrom or something else? You'll need
to use a toolchain that matches your user space.

Philip

I tried to translate the assembly, but unfortunately it's a very
challenging task for me.

Can someone point me in the right direction on this subject?
Should I keep working on my fixed point 16 bit FFT? Should I buy the ARM
toolchain and port all the software? Should I just give up and try using
the DSP maybe?

Thank you in advance for any reply, and good luck with the OpenSDR,
which I'm watching very closely.

Cheers,
Michele





