My peephole optimisations mostly save only a handful of cycles each time
they apply, which probably won't add up to much over a relatively short
test. The most significant optimisation I can think of, although I'm not
quite sure when it was merged, is replacing division by a constant with
an equivalent reciprocal multiplication. You'll see the biggest savings
there. There are other difficulties too: processors are intelligent
about caching and out-of-order execution, for example, and that
disguises some inefficiencies. And some optimisations seek only to
reduce code size with no loss of speed.
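For anyone unfamiliar with the reciprocal-multiplication trick mentioned above, here is a minimal sketch in C. It is an illustration only, not FPC's actual code generator: the divisor 10, the function names and the sample values are all chosen for the example. The magic constant 0xCCCCCCCD is ceil(2^35 / 10), which gives the exact quotient for every unsigned 32-bit input.

```c
#include <stddef.h>
#include <stdint.h>

/* Ordinary division, for reference. (A C compiler will usually apply the
   same trick to this itself, because the divisor is a constant.) */
static uint32_t div10_plain(uint32_t x)
{
    return x / 10u;
}

/* x / 10 computed as a widening multiply followed by a right shift.
   0xCCCCCCCD == ceil(2^35 / 10); the 64-bit product cannot overflow. */
static uint32_t div10_reciprocal(uint32_t x)
{
    return (uint32_t)(((uint64_t)x * UINT64_C(0xCCCCCCCD)) >> 35);
}

/* Hypothetical helper: returns 1 if both versions agree on a few edge
   cases, 0 otherwise. */
static int div10_selfcheck(void)
{
    static const uint32_t samples[] = {
        0u, 1u, 9u, 10u, 11u, 99u, 100u, 123456789u,
        0x7FFFFFFFu, 0xFFFFFFF9u, 0xFFFFFFFFu
    };
    size_t i;

    for (i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        if (div10_plain(samples[i]) != div10_reciprocal(samples[i]))
            return 0;
    }
    return 1;
}
```

The saving comes from swapping a hardware divide, typically one of the slowest integer instructions, for a multiply and a shift. It only shows up in code that divides by constants often enough to matter.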
What are your timings like when compiling with COREAVX or COREAVX2? A
couple of recent peephole optimisations make use of BMI1 and BMI2.
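As background on what BMI1 and BMI2 offer, the sketch below (in C, with hypothetical function names) writes out the plain bitwise expressions that a few of these instructions compute in a single step. Which patterns FPC's peephole optimiser actually recognises is not stated here, so treat the selection as an assumption made purely for illustration.

```c
#include <stdint.h>

/* BMI1 ANDN: (NOT a) AND b in one instruction. */
static uint32_t andn_pattern(uint32_t a, uint32_t b)
{
    return (uint32_t)(~a & b);
}

/* BMI1 BLSR: clear the lowest set bit. */
static uint32_t blsr_pattern(uint32_t x)
{
    return x & (x - 1u);
}

/* BMI1 BLSI: isolate the lowest set bit. */
static uint32_t blsi_pattern(uint32_t x)
{
    return x & (0u - x);
}

/* BMI2 BZHI: zero all bits from index n upwards.
   This C expression is only valid for n in 0..31. */
static uint32_t bzhi_pattern(uint32_t x, unsigned n)
{
    return x & ((UINT32_C(1) << n) - 1u);
}
```

Without BMI, each of these takes two or three instructions, so the gain per occurrence is again only a cycle or so, plus a few bytes of code.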
I can't remember the proverb that Florian used, but it essentially boils
down to this: very small changes that individually don't amount to much
accumulate into a noticeable difference when there are enough of them.
Kit
On 01/03/2023 10:32, Martin Frb via fpc-devel wrote:
So for a while now fpc 3.3.1 has been receiving new optimizations =>
which is great / big fan of it.
And hence I thought, let's see how much of an impact they have. And in
my test, they had none :(
Wondering if anyone else has measured them?
My tests:
Win-10 64 bit
3.3.1 905c485ff413cd48f98891e2075c814759d0c6f1
3.2.3 2022-02-04
both compilers with each O2 and O4
Using the testcase for FpDebug (which runs a decent spread of code).
Testcase with O2 and O3
And I got no noticeable difference.
I also tried {$CodeAlign proc=32 loop=32} for O2 (test and fpc), also
no diff.
O2 / fpc: o2 323
TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug : 22.406
TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug : 22.063
TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug : 22.609
O2 / fpc: o2 331
TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug : 22.251
TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug : 22.031
TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug : 21.531
O3 / fpc: o2 323
TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug : 22.687
TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug : 22.281
TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug : 22.469
O3 / fpc: o2 331
TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug : 23.203
TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug : 22.250
TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug : 22.140
O3 / fpc: o4 323
TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug : 23.063
TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug : 22.250
TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug : 22.875
O3 / fpc: o4 331
TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug : 22.577
TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug : 22.094
TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug : 22.235
{$CodeAlign proc=32 loop=32}
O2 / fpc: def 323
TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug : 22.453
TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug : 22.328
TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug : 22.656
O2 / fpc: def 331
TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug : 22.079
TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug : 22.234
TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug : 21.984
_______________________________________________
fpc-devel maillist - fpc-devel@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel