Hi Danilo

Yes, there is most certainly a penalty for const access from flash, at
least on the M4.

And of course the instruction cache is no help here: it is just that, a
cache for instructions only.

I wonder what the bus matrix penalty is for data fetches from flash.
There is an app note about it somewhere that I once read. It depends on
what else is going on. The processor is pretty smart at interleaving the
accesses so as not to stall the pipeline or bus matrix.
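One way to put a number on that penalty is to time the same strided read loop over a const table (normally linked into flash) and over a RAM copy of it. The following is only a sketch under assumptions: the names (timed_sum, compare_flash_ram, the 16-entry table) are made up for illustration, the register addresses are the standard Cortex-M DWT/DEMCR ones, and on a PC build the clock() fallback yields no meaningful timing, only the sums.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Cycle source: the DWT cycle counter on a Cortex-M4 build, clock() elsewhere. */
#if defined(__arm__) && defined(__ARM_ARCH_7EM__)
#define DEMCR      (*(volatile uint32_t *)0xE000EDFCu)
#define DWT_CTRL   (*(volatile uint32_t *)0xE0001000u)
#define DWT_CYCCNT (*(volatile uint32_t *)0xE0001004u)
static void cycles_init(void)
{
    DEMCR |= (1u << 24);   /* TRCENA: enable the DWT unit */
    DWT_CYCCNT = 0;
    DWT_CTRL |= 1u;        /* CYCCNTENA: start the cycle counter */
}
static uint32_t cycles_now(void) { return DWT_CYCCNT; }
#else
static void cycles_init(void) { }
static uint32_t cycles_now(void) { return (uint32_t)clock(); }
#endif

#define TABLE_LEN 16

/* const with an initializer: normally ends up in .rodata, i.e. flash on an STM32. */
static const uint32_t table_flash[TABLE_LEN] = {
    1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16
};

/* Non-const copy: lives in RAM. */
static uint32_t table_ram[TABLE_LEN];

struct result {
    uint32_t sum_flash, cycles_flash;
    uint32_t sum_ram, cycles_ram;
};

/* Read every element exactly once, in strided order, and time the loop.
 * The volatile qualifier keeps the compiler from optimizing the reads away. */
static uint32_t timed_sum(const volatile uint32_t *t, size_t len,
                          size_t stride, uint32_t *elapsed)
{
    uint32_t sum = 0;
    uint32_t start = cycles_now();
    for (size_t s = 0; s < stride; s++)
        for (size_t i = s; i < len; i += stride)
            sum += t[i];
    *elapsed = cycles_now() - start;
    return sum;
}

/* Time the same read pattern over the flash table and over its RAM copy. */
struct result compare_flash_ram(void)
{
    struct result r;
    memcpy(table_ram, table_flash, sizeof table_ram);
    cycles_init();
    r.sum_flash = timed_sum(table_flash, TABLE_LEN, 4, &r.cycles_flash);
    r.sum_ram   = timed_sum(table_ram,   TABLE_LEN, 4, &r.cycles_ram);
    return r;
}
```

On real hardware the table would need to be much larger than this toy one, and the measurement repeated with other bus traffic (DMA, interrupts) running, since the result depends on what else is going on.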

As I am sure you know, Danilo (pointed out for others): you can force
variables into a particular section (for example a static const table)
by using

__attribute__((section("name")))

to assign it to, say, the data section.
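For those who have not used the attribute, a minimal sketch follows. The section name ".data.twiddle", the PLACE_IN macro and the table contents are illustrative assumptions, not taken from the mcHF or codec2 sources; it also assumes a GCC-style toolchain and a linker script that collects .data.* input sections into RAM, which is common but should be checked for the project in question.

```c
#include <stddef.h>
#include <stdint.h>

/* The section attribute is a GNU extension; fall back to nothing elsewhere. */
#if defined(__GNUC__) && defined(__ELF__)
#define PLACE_IN(sec) __attribute__((section(sec)))
#else
#define PLACE_IN(sec)
#endif

/* Table forced into a RAM data section. It is deliberately not declared
 * const here: GCC may report a section type conflict when const and
 * non-const objects share one named section. The startup code copies the
 * initial values from flash to RAM, as for any other initialized data. */
static int16_t twiddle_ram[4] PLACE_IN(".data.twiddle") = {
    32767, 23170, 0, -23170
};

/* Ordinary const table: the linker normally puts this in .rodata (flash). */
static const int16_t twiddle_flash[4] = {
    32767, 23170, 0, -23170
};

/* Same code path for both tables; only the memory behind the pointer differs. */
int32_t table_sum(const int16_t *t, size_t n)
{
    int32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += t[i];
    return sum;
}
```

The cost is RAM: every table moved this way occupies both its flash image and its run-time copy, so it is worth doing only for tables in the hot loop.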

At some point I will run the code on an M4 and see what kiss-fft does.

I am very, very surprised, and do not really believe, that kissFFT is as
fast as the ARM assembler on the M4. My immediate thought is "you are
doing it wrong", so I will investigate.

g

On 18/09/2016 12:13 PM, Danilo Beuche wrote:
> Hi Glen,
>
> just checked, it seems to me that we have all the caches running in the
> mcHF:
>
> https://github.com/df8oe/mchf-github/blob/670f94a2e69a55a03f099ad25390925c84c09201/mchf-eclipse/cmsis_boot/system_stm32f4xx.c#L424
>
> So even with these caches/buffers enabled, the M4 loses performance on
> data reads from flash. I haven't checked the manual/internet, but maybe
> the flash caching works well for code but not in the same way for data.
>
>
> Danilo
>
> On 18.09.2016 at 03:12, glen english wrote:
>> Hi Danilo
>>
>> Good thoughts and points.
>>
>> While on the RAM subject, required reading for every serious programmer:
>> "Memory"
>> https://lwn.net/Archives/GuestIndex/#Drepper_Ulrich
>>
>> read all 7 parts, 100 pages, but if you only have an hour, just read
>> "part 2 - cache"
>>
>> On the fft:
>>
>> I am not surprised that the ARM lib hand-optimized assembler is that
>> much faster: more than 2x faster, in fact.
>>
>> I don't think kiss-fft is particularly suitable for this sort of
>> platform either. I'll hold back what I really think :-) .
>>
>> The 5 WS on flash (actually 6 WS, as I am running @ 168M) does not
>> really affect the performance too much. In fact I can vary the WS count
>> +/- 2 without much change: the ART and the prefetch and the instruction
>> and data caches are doing their job, so there is very little difference
>> with the const values in RAM or cache.
>>
>> In fact, most FFT implementations are very tough on a machine with cache.
>> Have you read the paper on how FFTW works? It is very cache aware and
>> adaptive to the architecture; that is why it does trial runs and picks
>> the best.
>>
>> The M7 is very impressive. It is certainly impressive work by ARM.
>>
>> However, the M4 is what all of you have to work with so we can stay
>> focussed on that.
>>
>> I think the RAM usage will also be significantly less with the ARM FFT
>> because of the re-entrant kiss-fft behaviour.
>>
>> The M4 is quite a different beast, and having no D-cache can improve
>> performance relative to the M7 for some (ineptly) written applications
>> (not this one, but as a generalization: applications grabbing a byte
>> from memory at random all over a large dataset).
>>
>> Large matrix operations are where cache machines fall over, that is,
>> once the dataset is bigger than the cache.
>>
>> The question is how much optimization is enough. I am tempted NOT to
>> optimize any more, although I feel (just by looking at it) I can get
>> another 2x out of it. Why? Well, there is no real pressing need.
>> Going too far away from the reference code will island the code a bit.
>> However, if you run out of modem cycles / modem RAM, then we can
>> probably get a bit more.
>>
>> cheers
>>
>> ------------------------------------------------------------------------------
>> _______________________________________________
>> Freetel-codec2 mailing list
>> Freetel-codec2@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/freetel-codec2


