Max

Can you post a small slab of the floating-point assembly (and the associated C code)?

Something with muls, divs, adds, etc.

I'd like to see what FP instructions it is using.

I only ever compile at optimization Level 2. I find Levels 3 and 4 problematic.
What compiler are you using?

What is the exact part number of the processor you are using? There might be a difference in the instruction accelerator, flash access, or instruction cache setup/configuration (and a speedup might be possible with reorganisation). I see about a 50% variation with the various FLASH accelerator / wait state combinations turned on and off on the M4.

David - can you provide your compiler and linker flags?

cheers


glen


On 17/06/2016 11:22 PM, Maxime Guyon wrote:
I've finished my test profiling kissFFT against the CMSIS equivalent.

The speed of the KissFFT function "kiss_fft" is compared to that of the CMSIS function "arm_cfft_radix2_f32".
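
A comparison like this needs some timing harness around each call. The thread does not say how the times were taken, so the following is only a minimal host-side sketch: `transform_fn`, `mean_time_ms` and `dummy_work` are hypothetical names, and the standard `clock()` stands in for whatever timer the target provides (on a Cortex-M4 a hardware cycle counter would be the more likely choice).

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical signature for the routine under test; ctx carries its state. */
typedef void (*transform_fn)(void *ctx);

/* Mean time of one call to fn in milliseconds, averaged over `runs`
 * back-to-back calls. Returns -1.0 on bad arguments or if clock() fails. */
double mean_time_ms(transform_fn fn, void *ctx, unsigned runs)
{
    if (fn == NULL || runs == 0)
        return -1.0;

    clock_t start = clock();
    if (start == (clock_t)-1)
        return -1.0;

    for (unsigned i = 0; i < runs; i++)
        fn(ctx);

    clock_t end = clock();
    if (end == (clock_t)-1)
        return -1.0;

    return 1000.0 * (double)(end - start) / (double)CLOCKS_PER_SEC / (double)runs;
}

/* Placeholder workload: counts how many times it was called. */
void dummy_work(void *ctx)
{
    unsigned *calls = ctx;
    (*calls)++;
}
```

In a real test, `dummy_work` would be replaced by a wrapper that runs one 1024-point transform on a fixed input buffer, so that both libraries see identical data.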

For the test, I used the same file David used, with 1024 samples of input.
I noticed that the processing time depends a little on the data amplitude, though less than on the data frequency.
The execution times to transform the 1024 samples are:

With random data:
   - KissFFT: 1.2282ms
   - CMSIS:   1.2168ms -> about 1% speedup

With all samples set to 0:
   - KissFFT: 1.2366ms
   - CMSIS:   1.2094ms -> about 2.2% speedup

With sinusoid of amplitude 0.5 at 2500Hz (Sampling at 8kHz):
   - KissFFT: 1.2281ms
   - CMSIS:   1.2167ms -> about 1% speedup

With sinusoid of amplitude 1 at 2500Hz (Sampling at 8kHz):
   - KissFFT: 1.2076ms
   - CMSIS:   1.2089ms -> about 0.1% slowdown

With sinusoid of amplitude 2 at 2500Hz (Sampling at 8kHz):
   - KissFFT: 1.2367ms
   - CMSIS:   1.2095ms -> about 2.2% speedup

With sinusoid of amplitude 3 at 2500Hz (Sampling at 8kHz):
   - KissFFT: 1.2281ms
   - CMSIS:   1.2167ms -> about  1% speedup
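
The percentages in this list can be recomputed from the two times in each pair. A small sketch, assuming the speedup is expressed relative to the KissFFT time (the function name is made up for illustration):

```c
/* Relative speedup of CMSIS over KissFFT, as a percentage of the
 * KissFFT time. Positive means CMSIS is faster, negative means slower. */
double speedup_percent(double kiss_ms, double cmsis_ms)
{
    return 100.0 * (kiss_ms - cmsis_ms) / kiss_ms;
}
```

For example, `speedup_percent(1.2282, 1.2168)` covers the random-data case, and `speedup_percent(1.2076, 1.2089)` comes out negative for the amplitude 1 sinusoid, where KissFFT was marginally faster.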

To conclude, CMSIS is only very slightly faster than kissFFT in my test (contrary to David's test result, which found CMSIS slower).
But the gain is very small and will not be sufficient for my target to decode in real time...

I still cannot understand why a 120 MHz Cortex-M4 from TI is not capable of doing the same job as your Cortex-M4 @ 168 MHz from ST, which you say is only 50% loaded.
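
As a rough sanity check on that expectation, the load can be scaled by the clock ratio. This sketch assumes both parts need the same number of cycles per frame, which may well not hold if flash wait states or cache settings differ between them; the function name is hypothetical.

```c
/* Estimated CPU load (percent) at target_mhz, given the load measured at
 * ref_mhz. Assumes an identical cycle count per frame on both parts. */
double scaled_load_percent(double ref_load_percent, double ref_mhz,
                           double target_mhz)
{
    return ref_load_percent * ref_mhz / target_mhz;
}
```

With the figures quoted here, `scaled_load_percent(50.0, 168.0, 120.0)` gives 70, so under that assumption the 120 MHz part would still have headroom.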

To finish, it would be nice to measure the encoding and decoding time on your target with the same file "c2demo.c" and report the result.
Just to be sure that my target should be capable of encoding and decoding in real time...
Maybe your target is loaded at more than 50%?

Regards,

Max.





2016-06-17 13:27 GMT+02:00 glen english <[email protected]>:
Hi Maxime

Nice work. Yes, I would expect the FFT twiddle factors to be precalculated.

I expect a fair bit of work would be required to use the CMSIS functions.

I use them extensively in my DSP and audio processing work, and the filter primitives are nice and fast.
They are certainly worthwhile using.

David is probably right - a lot of the codec2 code is just general C code.

I bet, though, that with some work performance would improve (to suit the STM32); there are always improvements at the edges.

.. or use a bigger processor...

cheers
glen




On 17/06/2016 9:20 PM, Maxime Guyon wrote:
Hello,

Some news:

I've compiled the CMSIS library for my target.
I've evaluated the performance of the "sinf" and "cosf" functions and their equivalents "arm_sin_f32" and "arm_cos_f32":

The CMSIS functions are 15% faster than the classic ones.

I've replaced all calls to "sinf" and "cosf" with "arm_sin_f32" and "arm_cos_f32" in codec2.
No noticeable difference; this is because kissFFT seems to precalculate its cosine and sine tables when the codec is created.

I've found the test you did with CMSIS for FFT in the STM32 directory, "fft_test.c".
I'll try to get it working on my target and report back my profiling results.

Regards

2016-06-17 10:04 GMT+02:00 Maxime Guyon <[email protected]>:
Hello,

@Steve: Thank you for the hint to define NDEBUG. I tested it, but the timing remains approximately the same.

@glen english: Thank you for your hint about CMSIS. I will try to compile CMSIS for my platform, but it does not seem straightforward since I've never used it before.

@David Rowe: I saw another post where you say that you have already tested the CMSIS FFT without any speed improvement...
- Do you already have the code to replace kissfft with the CMSIS functions?
- Can you provide some clues for testing it and some steps for replacing kissfft?
- I saw a post, which I cannot find now, where you say that there is a file for testing the FFT; maybe it's a good starting point for testing the CMSIS speed improvement.

Regards,

Max

2016-06-17 2:34 GMT+02:00 Steve <[email protected]>:
Just a wild guess, but maybe define NDEBUG (-DNDEBUG) in the make call, which will get rid of all the assert() check macros, making it a release rather than a debug version.

Maybe save a few ms. I think the FFT is the real cruncher though.
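
To illustrate what NDEBUG changes, here is a minimal sketch in standard C. Neither function is from codec2; both are made up for the example. When the file is compiled with -DNDEBUG, every assert() expands to nothing, so the checks (and any side effects inside them) disappear.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: mean of the squared samples. The two checks
 * cost a compare and branch each unless NDEBUG is defined. */
float mean_energy(const float *x, size_t n)
{
    assert(x != NULL);
    assert(n > 0);

    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += x[i] * x[i];
    return acc / (float)n;
}

/* Returns 1 in a build where assert() is active and 0 under NDEBUG.
 * The assignment inside assert() only runs when the macro is compiled in. */
int asserts_enabled(void)
{
    int enabled = 0;
    assert((enabled = 1));
    return enabled;
}
```

The saving is only as large as the number of asserts on the hot path, which would be consistent with the timing staying about the same if the FFT inner loops contain few of them.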


------------------------------------------------------------------------------
What NetFlow Analyzer can do for you? Monitors network bandwidth and traffic
patterns at an interface-level. Reveals which users, apps, and protocols are
consuming the most bandwidth. Provides multi-vendor support for NetFlow,
J-Flow, sFlow and other flows. Make informed decisions using capacity planning
reports. http://sdm.link/zohomanageengine
_______________________________________________
Freetel-codec2 mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/freetel-codec2




