Dear List,

I'm new to this list and have some questions.
Looking at the code GCC generates for ARMv8, we noticed some areas where there 
is room for performance improvement.
I assume you may already be aware of some of these items.

For example:

1)      We noticed that when compiling typical DGEMM-like code, GCC emits 
unnecessary DUP instructions.

2)      GCC seems reluctant to use LDP (load pair) instructions.

3)      For optimal FPU performance on some A57 cores, it is necessary to 
interleave instructions working on odd and even registers.

GCC does not seem to support this properly. In some cases a 100% performance 
increase can be reached with a different instruction interleaving.

4)      Some work loops benefit greatly from interleaving FPU instructions 
and loads.

GCC seems to like to rearrange the code so that most or all loads are placed at 
the top of the loop.
This can significantly reduce the performance of a well-written work loop.
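
To make items 1 and 2 more concrete, here is a minimal sketch of the kind of 
DGEMM-like kernel meant. It is illustrative only and is not the code we 
measured; the function name, the 4x4 block size and the panel layout are 
assumptions. Whether a given GCC version emits a DUP for the broadcast scalar, 
or pairs the adjacent loads into LDP, depends on the version and flags used.

```c
#include <stddef.h>

/* Hypothetical 4x4 DGEMM-like micro-kernel computing C += A * B.
 *   a: 4 x k panel, 4 consecutive doubles per k step
 *   b: k x 4 panel, 4 consecutive doubles per k step
 *   c: 4 x 4 block, column-major
 * The name and the data layout are assumptions made for illustration. */
void dgemm_kernel_4x4(size_t k, const double *a, const double *b, double *c)
{
    for (size_t p = 0; p < k; p++) {
        for (size_t j = 0; j < 4; j++) {
            /* One element of B multiplies a whole column of A. When this
             * loop is vectorized, the scalar must be broadcast, which is
             * where a DUP may show up instead of a multiply-by-element. */
            double bj = b[4 * p + j];

            /* a[4*p + 0..3] are adjacent in memory, so these loads are
             * natural candidates for load-pair (LDP) instructions. */
            for (size_t i = 0; i < 4; i++)
                c[4 * j + i] += a[4 * p + i] * bj;
        }
    }
}
```

Compiling this with something like `gcc -O3 -mcpu=cortex-a57 -S` and reading 
the inner loop should show whether the DUP and LDP behaviour is present.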


I have no patches to fix this.
But I can provide C code and ASM output that demonstrate these performance 
issues.
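
As an example of the form such a test case could take for item 4, here is a 
small sketch of a work loop whose source order alternates loads and 
multiply-adds. It is a hypothetical reproducer, not our measured code; the 
function name and the unroll factor of 4 are assumptions. The compiler is free 
to reorder these statements, and that reordering is exactly what the ASM 
output would show.

```c
#include <stddef.h>

/* Hypothetical work loop: y[i] += alpha * x[i], unrolled by 4.
 * Each load pair is written right before the multiply-add that uses it,
 * so the source order interleaves loads and FPU operations. */
void axpy_interleaved(size_t n, double alpha, const double *x, double *y)
{
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        double x0 = x[i],     y0 = y[i];
        y0 += alpha * x0;
        double x1 = x[i + 1], y1 = y[i + 1];
        y1 += alpha * x1;
        double x2 = x[i + 2], y2 = y[i + 2];
        y2 += alpha * x2;
        double x3 = x[i + 3], y3 = y[i + 3];
        y3 += alpha * x3;

        y[i]     = y0;
        y[i + 1] = y1;
        y[i + 2] = y2;
        y[i + 3] = y3;
    }

    /* Remainder loop for n not divisible by 4. */
    for (; i < n; i++)
        y[i] += alpha * x[i];
}
```

Comparing the output of `gcc -O2 -mcpu=cortex-a57 -S` against a hand-scheduled 
version of the same loop would show where the loads end up.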

Please tell me what the recommended next step is.
Are all these items already known, or shall I provide code examples to further 
explain them?


Kind regards
Gunnar von Boehn
_______________________________________________
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
https://lists.linaro.org/mailman/listinfo/linaro-toolchain
