Personally, it seems pretty fast: at most about 9 us for most ops on floats (doubles are a different story). It can be as low as almost 1 us or as high as 55 us, depending on the operation.
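
A minimal sketch of how per-operation timings like these could be measured, in portable C. It is not the benchmark behind the figures above. The function names are hypothetical, and it uses the standard clock() call as a stand-in timer; on ESP-IDF a higher-resolution timer such as esp_timer_get_time() would likely be a better fit. Because clock() can be coarse, the loop count has to be large enough for the total run time to be measurable.

```c
#include <stdint.h>
#include <time.h>

/* CPU time in microseconds, from the standard C clock() call.
   Assumption: clock() is good enough for a rough estimate; its
   resolution varies a lot between platforms. */
static double now_us(void)
{
    return (double)clock() * 1e6 / (double)CLOCKS_PER_SEC;
}

/* Hypothetical helper: average microseconds per single-precision
   multiply-accumulate, measured over `iterations` loop passes. */
double bench_float_mac_us(uint32_t iterations)
{
    /* volatile keeps the compiler from folding the loop away */
    volatile float acc = 0.0f;
    volatile float a = 1.0001f;
    volatile float b = 0.9999f;

    if (iterations == 0) {
        return 0.0;
    }

    double start = now_us();
    for (uint32_t i = 0; i < iterations; i++) {
        acc = acc + a * b;
    }
    double end = now_us();

    return (end - start) / (double)iterations;
}

/* Same measurement with double operands, for comparison on parts
   whose FPU is single-precision only. */
double bench_double_mac_us(uint32_t iterations)
{
    volatile double acc = 0.0;
    volatile double a = 1.0001;
    volatile double b = 0.9999;

    if (iterations == 0) {
        return 0.0;
    }

    double start = now_us();
    for (uint32_t i = 0; i < iterations; i++) {
        acc = acc + a * b;
    }
    double end = now_us();

    return (end - start) / (double)iterations;
}
```

Calling bench_float_mac_us(1000000) and bench_double_mac_us(1000000) and comparing the two results gives a rough float versus double ratio. The figures include loop and volatile load/store overhead, so they are upper bounds rather than pure instruction timings.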

> On Feb 6, 2020, at 12:45 AM, DANA MYERS <[email protected]> wrote:
>
>>> On February 5, 2020 at 9:25 PM Bruce Perens via Freetel-codec2
>>> <[email protected]> wrote:
>>>
>>> Dana, the only thing you didn't make clear is whether your code is using
>>> the fixed or floating data type. If it's using the floating one, it would
>>> be interesting to isolate why performance is so poor when more conventional
>>> code is generated by the compiler. I can understand float code being
>>> slightly slower than double, if the hardware FPU is implemented in double
>>> size, as it normally would be.
>> Yes, I am using floating types on both the Cortex-M4F and ESP32. My
>> apologies for calling-out M4F without mentioning the significance of the
>> 'F' :-).
>>
>> MCU FPUs are, in my limited experience (Cortex-M4F and ESP32),
>> single-precision. IIRC, higher-end parts (Cortex-M7) may feature
>> double-precision.
>>
>> I don't know why the optimized assembly is 2x faster than compiled code;
>> that would be a question for Espressif/Tensilica, I suppose.
>> The floating performance as previously benchmarked is poor enough that I
>> wondered whether there was really hardware, or whether some of that blobby
>> code was processing float in an exception handler.
> As did I. So I gave it a try.
>
> Cheers,
> Dana
>
>> On Wed, Feb 5, 2020, 8:16 PM Dana Myers < [email protected]> wrote:
>> On 2/5/2020 4:25 PM, Bruce Perens via Freetel-codec2 wrote:
>> > Bill, before you go any farther oh, you should make a floating point
>> > benchmark. I don't believe the necessary performance is there.
>>
>> I used to think that, but then Espressif released their ESP-DSP library. I
>> ported my Bell 202 modem from Cortex-M4F using CMSIS-DSP to ESP32 running
>> at 240MHz using ESP-DSP and see comparable performance per clock, and my
>> actual modem is single-threaded and thus uses only one of the ESP32's two
>> cores. It's conceivable if some additional latency is tolerable and the
>> algorithm divisible, it could be split over the two cores.
>>
>> Espressif offers both "ANSI C" portable functions and ESP32-specific
>> assembly functions - the latter are considerably faster and what I am using.
>>
>> Cheers,
>> Dana K6JQ

_______________________________________________
Freetel-codec2 mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/freetel-codec2
