On many CPUs, perhaps most, float is slower than double, because the
floating-point hardware is double-only and converts float to double and
back to float on every operation. Don't change the main source to float
types. Use macros, typedefs, or compiler switches. Also, David will
probably not be happy if the code is made less readable.
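
A rough sketch of the typedef/macro approach, so the precision can be
picked per target at build time. The names codec_real, CODEC_LIT and
CODEC_USE_DOUBLE are made up for illustration; nothing like them exists in
Codec2 as far as I know.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical build switch: compile with -DCODEC_USE_DOUBLE for double. */
#ifdef CODEC_USE_DOUBLE
typedef double codec_real;
#define CODEC_LIT(x) (x)        /* 0.5 stays a double literal */
#else
typedef float codec_real;
#define CODEC_LIT(x) (x##f)     /* 0.5 becomes 0.5f */
#endif

/* Scale n samples in place, using the selected type throughout. */
static void codec_scale(codec_real *buf, size_t n, codec_real gain)
{
    assert(buf != NULL);
    for (size_t i = 0; i < n; i++)
        buf[i] *= gain;
}

/* Halve a sample; CODEC_LIT keeps the literal in the selected type. */
static codec_real codec_half(codec_real x)
{
    return CODEC_LIT(0.5) * x;
}
```

The main source then only ever mentions codec_real and CODEC_LIT, and one
header decides what they mean for a given target.
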
On Mon, Jun 20, 2016, 18:05 Maxime Guyon <[email protected]> wrote:
> Hello,
>
> Good news: after some work it seems I've got a working solution.
>
> @Steve, note that CMSIS does not provide functions for *atan2()* and
> *floor()*...
>
> First I tried using only single-precision operations (no double
> operations at all, because the M4's hardware FPU cannot handle double
> precision).
> In Codec2 a lot of operations involve double, which does not always seem
> necessary. For example, M_PI and all the other defines whose values lack
> the 'f' suffix are treated as double by ANSI C, so operations using those
> defines are double operations and are not optimized.
> The same happens each time you do an operation with a literal that lacks
> the 'f' suffix. For example, 0.5*x and 0.5+x are double operations, which
> I changed to "0.5f*x" and "0.5f+x".
>
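
The promotion described here can be checked directly, because sizeof
reports the type of the result without evaluating anything. A minimal
sketch (the function names are made up for illustration):

```c
#include <assert.h>
#include <stddef.h>

/* Size of the result type when a float meets an unsuffixed literal.
 * 0.5 is a double, so x is converted and the product is a double. */
static size_t size_of_unsuffixed_product(void)
{
    float x = 2.0f;
    return sizeof(0.5 * x);
}

/* With the 'f' suffix both operands are float, so the product is a float. */
static size_t size_of_suffixed_product(void)
{
    float x = 2.0f;
    return sizeof(0.5f * x);
}

/* Fires an assertion if the compiler disagrees with the above. */
static void check_literal_promotion(void)
{
    assert(size_of_unsuffixed_product() == sizeof(double));
    assert(size_of_suffixed_product() == sizeof(float));
}
```

If the toolchain happens to be GCC (an assumption, the compiler is not
named in this thread), -Wdouble-promotion warns about each implicit float
to double conversion, which helps when hunting for the remaining ones.
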
> After fixing this in the codec, I got a speedup of at least 10% for
> decoding!
> I cannot say whether this is a good way to speed things up, or whether
> you can live with the loss of precision, but if so, maybe this fix can be
> made in your main repository.
>
> After that I tested some other compiler options and optimizations (O2 and
> some additional inlining) without success.
>
> Finally I tried switching the target's floating-point mode from
> "strict" to "relaxed".
> See the definition in the wiki:
>
>> Relaxed mode prioritizes speed over strict correctness. In relaxed mode,
>> the compiler may perform speed optimizations at the expense of reducing the
>> precision of some calculations, typically a tiny amount. For instance,
>> (X/3) is not precisely equivalent to (X*(1.0/3)), but in relaxed mode, the
>> compiler is allowed to make this transformation anyway, as multiplication
>> is much faster than division.
>
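
To get a feel for how small the difference in the X/3 example is, the two
forms can be compared in single precision. This is only a sketch with
made-up function names. It assumes float arithmetic is really evaluated in
float precision and that the build does not already use a relaxed or
fast-math mode (with GCC that would be -ffast-math or -freciprocal-math).

```c
#include <assert.h>

/* The form the programmer wrote. */
static float div_by_3(float x)
{
    return x / 3.0f;
}

/* The form a relaxed mode is allowed to substitute. */
static float mul_by_third(float x)
{
    const float third = 1.0f / 3.0f;   /* rounded once, then reused */
    return x * third;
}

/* Count the integers 1..n for which the two forms give different floats. */
static int count_mismatches(int n)
{
    assert(n > 0);
    int count = 0;
    for (int i = 1; i <= n; i++) {
        float x = (float)i;
        if (div_by_3(x) != mul_by_third(x))
            count++;
    }
    return count;
}

/* Largest relative difference between the two forms over 1..n. */
static float max_relative_diff(int n)
{
    assert(n > 0);
    float worst = 0.0f;
    for (int i = 1; i <= n; i++) {
        float a = div_by_3((float)i);
        float b = mul_by_third((float)i);
        float d = (a > b) ? (a - b) : (b - a);
        float rel = d / a;             /* a > 0 because i >= 1 */
        if (rel > worst)
            worst = rel;
    }
    return worst;
}
```

Some inputs do come out different, but only in the last bit or so of the
float, which fits with the "typically a tiny amount" wording in the wiki
text.
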
>
> Changing that gave me a speedup of about 45%!
> Here are the timings after all the fixes:
>
> - Encoding time *without* modification was between *25 ms* and *42 ms*;
>   after modification it is between *18 ms* and *19 ms*, a speedup of
>   about 55%. My processor will be loaded at *48%* when encoding sound at
>   8000 Hz.
> - Decoding time *without* modification was between *39 ms* and *56 ms*;
>   after modification it is between *23 ms* and *27 ms*, a speedup of
>   about 52%. My processor will be loaded at *68%* when decoding sound at
>   8000 Hz.
>
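
The load and speedup figures above can be reproduced with two small
formulas. Two things here are my reading of the numbers, not something
stated in the message: that the frame period is 40 ms (which, as far as I
know, is what the 1200 and 1300 bit/s modes use), and that the percentages
compare worst-case times. The function names are made up.

```c
#include <assert.h>

/* Percentage of each frame period spent processing one frame. */
static float cpu_load_percent(float worst_case_ms, float frame_ms)
{
    assert(frame_ms > 0.0f);
    return 100.0f * worst_case_ms / frame_ms;
}

/* Percentage reduction in processing time after a change. */
static float time_reduction_percent(float before_ms, float after_ms)
{
    assert(before_ms > 0.0f);
    return 100.0f * (before_ms - after_ms) / before_ms;
}
```

With those assumptions, cpu_load_percent(19.0f, 40.0f) and
cpu_load_percent(27.0f, 40.0f) land close to the quoted 48% and 68%, and
time_reduction_percent(42.0f, 19.0f) and time_reduction_percent(56.0f,
27.0f) land close to the quoted 55% and 52%.
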
> I've played back the encoded stream at 1200 bps and 1300 bps and
> everything seems okay: I cannot hear any significant difference between
> the version encoded with my modifications and the one without.
> I hope this will help other people get it working on their
> target.
>
> Regards,
>
> Max
>
> 2016-06-18 9:54 GMT+02:00 glen english <[email protected]>:
>
>> RRR
>> I usually find O3 fractionally faster, but a lot of things break that I
>> don't expect (bad programming habits?). They don't break in O2. Some
>> unexpected assumptions are made...
>>
>>
>> On 18/06/2016 1:25 PM, Steve wrote:
>> > Another algorithm that seems to suck a lot of CPU is
>> > phase_synth_zero_order() in decoding, and really the only thing in
>> > there is atan2() and floor(). (you've already changed the sin/cos). So
>> > maybe the CMSIS has a better version for those two?
>> >
>> > I know floor() is really a slow algorithm in gcc.
>> >
>> > http://stackoverflow.com/questions/824118/why-is-floor-so-slow
>> >
>> >
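
On the floor() point: a common workaround is to truncate with an integer
cast and then correct negative non-integer inputs. A minimal sketch,
assuming the input is finite and fits in an int. fast_floorf is a made-up
name, not a CMSIS or Codec2 function.

```c
#include <assert.h>

/* Floor for finite floats whose value fits in a 32-bit int (assumption).
 * NaN, infinities and out-of-range values are rejected by the assert. */
static float fast_floorf(float x)
{
    assert(x > -2147483648.0f && x < 2147483648.0f);
    int i = (int)x;          /* C casts truncate toward zero */
    if ((float)i > x)        /* truncation moved a negative value up */
        i -= 1;
    return (float)i;
}
```

Whether this beats the library floor() on a given target has to be
measured; it also gives up the library's handling of NaN and very large
values.
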
>>
>>
>>
>>
>> ------------------------------------------------------------------------------
>> What NetFlow Analyzer can do for you? Monitors network bandwidth and
>> traffic
>> patterns at an interface-level. Reveals which users, apps, and protocols
>> are
>> consuming the most bandwidth. Provides multi-vendor support for NetFlow,
>> J-Flow, sFlow and other flows. Make informed decisions using capacity
>> planning
>> reports. http://sdm.link/zohomanageengine
>> _______________________________________________
>> Freetel-codec2 mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/freetel-codec2
>>