Hello Maxime,

Good work on your optimisation! I would be interested to see if similar
changes produce the same performance gains on the STM32F4 platform.

Would anyone like to try those changes, run some benchmarks, and post 
the results?

When I performed optimisation work in the past, I wrote code to compare
the output waveform to that of the original vanilla C version, for example
making sure the output samples are within 1 part in 1000 of each other.
This is very important to make sure the algorithm is not affected, and it
is tricky to do with speech signals.
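A comparison like this might look as follows. It is a minimal sketch, not the code I actually used. It assumes both builds write 16-bit signed PCM and interprets "1 part in 1000" relative to full scale (32768) rather than per sample, because a purely relative check fails near zero crossings. The name first_mismatch is hypothetical:

```c
#include <stddef.h>

/* Assumption: 16-bit signed PCM, tolerance measured against full scale. */
#define FULL_SCALE 32768L

/* Returns -1 if every sample of `test` is within tol_ppt/1000 of full
   scale of the matching `ref` sample, otherwise the index of the first
   sample outside that tolerance. */
long first_mismatch(const short *ref, const short *test, size_t n,
                    long tol_ppt)
{
    for (size_t i = 0; i < n; i++) {
        long diff = (long)ref[i] - (long)test[i];
        if (diff < 0)
            diff = -diff;
        /* diff / FULL_SCALE > tol_ppt / 1000, kept in integer arithmetic */
        if (diff * 1000L > FULL_SCALE * tol_ppt)
            return (long)i;
    }
    return -1;
}
```

With tol_ppt = 1 the allowed deviation is about 32 counts, so a difference of 32 passes and 33 is reported.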

If anyone is interested in this work, please contact me.

Thanks,

David

On 20/06/16 18:35, Maxime Guyon wrote:
> Hello,
>
> Good news: after some work, it seems I've got a working solution.
>
> @Steve, note that CMSIS does not provide functions for *atan2()* and
> *floor()*.
>
> First, I tested doing only single-precision operations (no double
> operations allowed, because the M4's hardware FPU cannot handle double
> precision).
> In Codec2 it seems that a lot of operations involve doubles, which does
> not always seem necessary. For example, all the defines such as M_PI,
> and other values that are not defined with the 'f' suffix, are treated
> by ANSI C as double.
> So operations with those defines are double operations, which the FPU
> cannot accelerate.
> The same happens each time you do an operation with a literal without
> the 'f' suffix. For example, 0.5*x or 0.5+x is a double operation, which
> I changed to "0.5f*x" and "0.5f+x".
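To illustrate the literal issue: in C an unsuffixed floating constant such as 0.5 has type double, so 0.5 * x promotes a float x to double. A minimal sketch; the function names and the PI_F define are hypothetical, not taken from Codec2:

```c
/* Hypothetical single-precision constant; M_PI (where available) is double. */
#define PI_F 3.14159265358979f

/* 0.5 is a double literal: x is promoted to double, multiplied in double
   precision (software-emulated on a Cortex-M4F), then converted back. */
float half_double(float x) { return 0.5 * x; }

/* 0.5f keeps the whole expression in float, which the M4 FPU handles. */
float half_single(float x) { return 0.5f * x; }

/* Using a float constant keeps this multiply single precision too. */
float scale_by_pi(float x) { return PI_F * x; }
```

With GCC, -Wdouble-promotion warns wherever a float is implicitly promoted to double, which can help locate such literals; -fsingle-precision-constant treats unsuffixed constants as float globally, at the cost of changing every constant in the build.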
>
> After fixing this in the codec, I got a speedup of at least 10% for
> decoding!!
> I cannot say whether this is a good approach to speeding things up, or
> whether you can live with the loss of precision, but if so, maybe this
> fix can be made in your main repository.
>
> After that, I tried some other compiler options and optimizations (-O2
> and some additional inlining) without success.
>
> Finally, I switched the target's floating-point mode from "strict" to
> "relaxed".
> See the definition from the wiki:
>
>     Relaxed mode prioritizes speed over strict correctness. In relaxed
>     mode, the compiler may perform speed optimizations at the expense of
>     reducing the precision of some calculations, typically a tiny
>     amount. For instance, (X/3) is not precisely equivalent to
>     (X*(1.0/3)), but in relaxed mode, the compiler is allowed to make
>     this transformation anyway, as multiplication is much faster than
>     division.
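The (X/3) versus (X*(1.0/3)) point can be checked directly. A small sketch, assuming IEEE 754 single precision and a compiler in strict mode (for GCC, without -ffast-math); the function names are hypothetical:

```c
/* True division: one correctly rounded result. */
float div_by_3(float x) { return x / 3.0f; }

/* Multiply by the rounded reciprocal: 1.0f/3.0f is already rounded, and
   the product is rounded again, so the result can differ by one ulp. */
float mul_by_recip_3(float x) { return x * (1.0f / 3.0f); }
```

For x = 5.0f the two give about 1.66666663 and 1.66666675 respectively, one unit in the last place apart; for x = 3.0f both give exactly 1.0f. Relaxed mode allows the compiler to emit the second form when the source says the first. In GCC the corresponding permission is -freciprocal-math, which -ffast-math enables.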
>
>
> Changing that gave me a speedup of about 45%!!!!
> Here are the encoding and decoding times after all the fixes:
>
>     - Encoding time *without* modification was between 25ms and 42ms;
>     after modification it is between 18ms and 19ms, so about 55% less
>     time. My processor will be loaded at *48%* for encoding sound at 8kHz.
>     - Decoding time *without* modification was between 39ms and 56ms;
>     after modification it is between 23ms and 27ms, so about 52% less
>     time. My processor will be loaded at *68%* for decoding sound at 8kHz.
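The load figures can be reproduced from the worst-case times, assuming one 40 ms frame is processed per encode or decode call (Codec2's 1200 and 1300 bit/s modes use 40 ms frames, i.e. 320 samples at 8kHz). A small sketch; the helper names are hypothetical:

```c
/* Assumption: one 40 ms speech frame per encode/decode call. */
#define FRAME_PERIOD_MS 40.0

/* Fraction of real time spent processing one frame. */
double cpu_load(double worst_case_ms)
{
    return worst_case_ms / FRAME_PERIOD_MS;
}

/* Fraction of processing time saved by an optimisation. */
double time_reduction(double old_ms, double new_ms)
{
    return (old_ms - new_ms) / old_ms;
}
```

cpu_load(19.0) is about 0.475 (48%) and cpu_load(27.0) is 0.675 (68%), matching the figures above. time_reduction(42.0, 19.0) is about 0.55 and time_reduction(56.0, 27.0) about 0.52, so the quoted percentages appear to compare worst-case times.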
>
> I've played back the streams encoded at 1200bps and 1300bps and
> everything seems okay: I cannot hear any significant difference between
> the versions encoded with and without my modifications.
> I hope this will help other people get it working on their targets.
>
> Regards,
>
> Max
>
> 2016-06-18 9:54 GMT+02:00 glen english <[email protected]>:
>
>     RRR
>     I usually find O3 fractionally faster, but a lot of things break that
>     I don't expect (bad programming habits?). They don't break in O2. Some
>     unexpected assumptions are made...
>
>
>     On 18/06/2016 1:25 PM, Steve wrote:
>      > Another algorithm that seems to suck a lot of CPU is
>      > phase_synth_zero_order() in decoding, and really the only things in
>      > there are atan2() and floor() (you've already changed the sin/cos).
>      > So maybe CMSIS has better versions of those two?
>      >
>      >
>      > I know floor() is really a slow algorithm in gcc.
>      >
>      > http://stackoverflow.com/questions/824118/why-is-floor-so-slow
>      >
>      >
>
>
>
>     
> ------------------------------------------------------------------------------
>     What NetFlow Analyzer can do for you? Monitors network bandwidth and
>     traffic
>     patterns at an interface-level. Reveals which users, apps, and
>     protocols are
>     consuming the most bandwidth. Provides multi-vendor support for NetFlow,
>     J-Flow, sFlow and other flows. Make informed decisions using
>     capacity planning
>     reports. http://sdm.link/zohomanageengine
>     _______________________________________________
>     Freetel-codec2 mailing list
>     [email protected]
>     <mailto:[email protected]>
>     https://lists.sourceforge.net/lists/listinfo/freetel-codec2
>
