Re: [Freetel-codec2] Encoding decoding time TIVA

glen english Mon, 20 Jun 2016 02:57:51 -0700

Max,

Nice work. Yes, indeed: all float literals should have an f suffix. I used to just force doubles to floats, but I realised that was lazy programming, so I think an explicit f on float literals is important. In contrast to Bruce, I WOULD change the main source to float literals UNLESS they MUST be doubles. Running on the PC hides a lot of evils (or, looked at another way, buys you a lot of flexibility).

Good on the divides, also. Divides suck. I've never looked at the code in detail. I'll get it up on the M7 soon. I think the code needs a hose through it to fix a lot of this stuff that is not suited to microcontroller platforms. Yes, yes, I know David likes it truly cross-platform, but IMO one should write with regard for smaller platforms if they are at all expected to be targets. I guess that comes from my background of always being hard up for processor cycles. When I write code, aware that divides are expensive, I always change them to multiplies if I possibly can, and only divide when I cannot possibly avoid it.

Good work again! Cheers
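As a minimal sketch of the divide-to-multiply habit described above: when many values are divided by the same number, the divide can be done once and the loop can multiply by the reciprocal. The function names below are hypothetical and are not taken from Codec2. The two versions can differ in the last bit of the result, so this is a trade of precision for cycles.

```c
#include <stddef.h>

/* Straightforward version: one divide per element. */
static void scale_by_divide(float *v, size_t n, float d)
{
    for (size_t i = 0; i < n; i++) {
        v[i] = v[i] / d;
    }
}

/* Hand-optimized version: one divide in total, then one multiply per
   element. Note the 'f' suffix so the reciprocal stays single precision. */
static void scale_by_reciprocal(float *v, size_t n, float d)
{
    const float inv = 1.0f / d;
    for (size_t i = 0; i < n; i++) {
        v[i] = v[i] * inv;
    }
}
```

When the divisor is a power of two the reciprocal is exact and both versions give identical results; for other divisors the reciprocal version may be off by a rounding step.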

On 20/06/2016 7:18 PM, Bruce Perens wrote:

Many, perhaps most, CPUs have a float type that is slower than double, because their internal hardware is double-only and they convert float to double and back to float on every operation. Don't change the main source to float types. Use macros, typedefs, or compiler switches. Also, David will probably not be happy if the code is made less readable.
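One possible reading of the macro/typedef suggestion, sketched in C. The names real_t, REAL, CODEC_USE_DOUBLE and half_sum are hypothetical and do not come from the Codec2 source; the idea is only that the precision is chosen in one place per target. On the compiler-switch side, GCC documents -fsingle-precision-constant (treat unsuffixed constants as float) and -Wdouble-promotion (warn when a float is implicitly promoted to double); whether either suits a given toolchain is an assumption to check.

```c
/* Hypothetical build switch: define CODEC_USE_DOUBLE on targets where
   double is the faster type; leave it undefined on single-precision FPUs. */
#ifdef CODEC_USE_DOUBLE
typedef double real_t;
#define REAL(x) (x)        /* REAL(0.5) stays the double literal 0.5 */
#else
typedef float real_t;
#define REAL(x) (x##f)     /* REAL(0.5) becomes 0.5f via token pasting */
#endif

/* Example routine written once against real_t, so the arithmetic
   follows whichever type the target selected. */
static real_t half_sum(real_t a, real_t b)
{
    return REAL(0.5) * (a + b);
}
```

The cost is the readability concern raised above: every literal is wrapped in a macro.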

On Mon, Jun 20, 2016, 18:05 Maxime Guyon <[email protected]> wrote:

Hello,

Good news: after some work, it seems I've got a working solution.

@Steve, note that CMSIS does not provide functions for atan2() and floor()...

First, I tested doing only single-precision operations (disallowing double operations, because the M4's hardware FPU cannot handle double precision).

In Codec2 it seems that a lot of operations involve doubles, which does not always seem necessary. For example, M_PI and the other #define values that are written without the 'f' suffix are treated as double by ANSI C.
So operations using those defines are double operations and are not optimized.
The same happens every time you write an operation with a literal that lacks the 'f' suffix. For example, 0.5*x and 0.5+x are double operations, which I changed to "0.5f*x" and "0.5f+x".
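The literal-suffix point can be illustrated with a small C sketch (the function and constant names are hypothetical, not from Codec2). With a float operand, an unsuffixed literal makes the whole expression double; the suffixed literal keeps it in float. sizeof on the expression makes the difference in type visible without running anything.

```c
/* Promoted: 0.5 is a double literal, so x is converted to double,
   multiplied in double precision, then converted back to float. */
static float half_double_literal(float x) { return 0.5 * x; }

/* Single precision throughout: 0.5f is a float literal. */
static float half_float_literal(float x) { return 0.5f * x; }

/* The type of each expression, exposed through its size. */
enum {
    SIZE_WITH_DOUBLE_LITERAL = sizeof(0.5 * (float)1),   /* a double */
    SIZE_WITH_FLOAT_LITERAL  = sizeof(0.5f * (float)1)   /* a float  */
};
```

On an FPU that only handles single precision in hardware, the first function typically pulls in software double-precision routines, which is where the lost time goes.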

After fixing this in the codec, I got a speedup of at least 10% for decoding!
I cannot say whether this is a good way to speed things up, or whether you can live with the loss of precision, but if so, maybe this fix could be made in your main repository.

After that I tested some other compiler options and optimizations (O2 and some additional inlining) without success.

Finally, I tried switching the target's floating-point mode from "strict" to "relaxed".
See the definition in the wiki:

Relaxed mode prioritizes speed over strict correctness. In relaxed mode, the compiler may perform speed optimizations at the expense of reducing the precision of some calculations, typically a tiny amount. For instance, (X/3) is not precisely equivalent to (X*(1.0/3)), but in relaxed mode, the compiler is allowed to make this transformation anyway, as multiplication is much faster than division.
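To get a feel for how "tiny" the difference in that definition is, here is a rough C sketch that compares X/3 with X*(1/3) in single precision over a range of inputs. The function names are hypothetical; the second function only imitates by hand what the quoted definition says a compiler may do in relaxed mode, and it is not a description of what any particular compiler emits.

```c
#include <float.h>

/* Strict form: a true single-precision divide. */
static float div3_strict(float x)  { return x / 3.0f; }

/* Relaxed form: multiply by a rounded reciprocal of 3. */
static float div3_relaxed(float x) { return x * (1.0f / 3.0f); }

/* Largest relative difference between the two forms for x = 1..n. */
static double max_rel_diff_div3(int n)
{
    double worst = 0.0;
    for (int i = 1; i <= n; i++) {
        double a = div3_strict((float)i);
        double b = div3_relaxed((float)i);
        double d = (a > b ? a - b : b - a) / a;
        if (d > worst) {
            worst = d;
        }
    }
    return worst;
}
```

The difference stays within a few multiples of FLT_EPSILON, which is consistent with the two forms disagreeing only in the last bit or so of the result.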

Changing that gave me a speedup of about 45%!
Here are the encoding and decoding times after all the fixes:

- Encoding time without modification was between 25 ms and 42 ms; after modification it is between 18 ms and 19 ms, a speedup of about 55%. My processor will be loaded at 48% when encoding sound sampled at 8000 Hz.
- Decoding time without modification was between 39 ms and 56 ms; after modification it is between 23 ms and 27 ms, a speedup of about 52%. My processor will be loaded at 68% when decoding sound sampled at 8000 Hz.
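The percentages above can be reproduced from the worst-case times if one assumes a 40 ms frame period. That frame period is an assumption here (it is not stated in the message), and the helper names are hypothetical. A short C sketch of the arithmetic:

```c
/* Percentage reduction in execution time going from before_ms to after_ms. */
static double time_reduction_percent(double before_ms, double after_ms)
{
    return 100.0 * (before_ms - after_ms) / before_ms;
}

/* CPU load when a frame lasting frame_ms takes busy_ms to process. */
static double cpu_load_percent(double busy_ms, double frame_ms)
{
    return 100.0 * busy_ms / frame_ms;
}
```

With the worst-case figures, time_reduction_percent(42, 19) is about 54.8 and time_reduction_percent(56, 27) is about 51.8, while cpu_load_percent(19, 40) is 47.5 and cpu_load_percent(27, 40) is 67.5, which round to the 55%, 52%, 48% and 68% quoted.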

I've played back the encoded streams at 1200 bps and 1300 bps and everything seems okay: I cannot hear any strong difference between the versions encoded with and without my modifications.
I hope this will help other people get it working on their targets.

Regards,

Max

2016-06-18 9:54 GMT+02:00 glen english <[email protected]>:

RRR
I usually find O3 fractionally faster, but a lot of things break that I
don't expect (bad programming habits?). They don't break in O2. Some
unexpected assumptions are made...

On 18/06/2016 1:25 PM, Steve wrote:
> Another algorithm that seems to suck a lot of CPU is
> phase_synth_zero_order() in decoding, and really the only thing in
> there is atan2() and floor(). (you've already changed the sin/cos). So
> maybe the CMSIS has a better version for those two?
>
> I know floor() is really a slow algorithm in gcc.
>
> http://stackoverflow.com/questions/824118/why-is-floor-so-slow
>
>
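For the floor() cost mentioned above, a common workaround is to truncate with an integer cast and correct the result for negative non-integers. The sketch below is a generic technique, not code from Codec2 or CMSIS, and the function name is hypothetical. It assumes the input fits in an int and is not NaN or infinite, which the library floor() does not require.

```c
/* Floor of x as an int. Only valid while x is finite and within int range. */
static int fast_floor_to_int(float x)
{
    int i = (int)x;               /* the cast truncates toward zero */
    return i - (x < (float)i);    /* step down once for negative non-integers */
}
```

For example, 1.5f truncates to 1 and stays 1, while -1.5f truncates to -1 and is corrected to -2. Whether this is actually faster than the toolchain's floorf() on a given target would need to be measured.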

------------------------------------------------------------------------------
What NetFlow Analyzer can do for you? Monitors network bandwidth and traffic
patterns at an interface-level. Reveals which users, apps, and protocols are
consuming the most bandwidth. Provides multi-vendor support for NetFlow,
J-Flow, sFlow and other flows. Make informed decisions using capacity planning
reports. http://sdm.link/zohomanageengine
_______________________________________________
Freetel-codec2 mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/freetel-codec2

