My peephole optimisations mostly save only a handful of cycles each, which probably won't add up to much in a relatively short test.  The biggest optimisation I can think of, although I'm not quite sure when it was merged, is replacing division by a constant with an equivalent reciprocal multiplication.  You'll see the largest savings there.  There are other complications too: processors are clever with caching and out-of-order execution, for example, and that masks some inefficiencies.  And some optimisations seek only to reduce code size with no loss of speed.
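
To illustrate the reciprocal multiplication idea, here is a rough sketch in C rather than compiler output.  It assumes an unsigned 32-bit dividend and the constant divisor 10; the magic constant 0xCCCCCCCD is ceil(2^35 / 10), and the function names are made up for this example.  The constants FPC actually emits depend on the divisor and operand size.

```c
#include <assert.h>
#include <stdint.h>

/* Divide an unsigned 32-bit value by 10 without a DIV instruction.
 * 0xCCCCCCCD is ceil(2^35 / 10); multiplying by it in 64 bits and
 * shifting right by 35 gives floor(x / 10) for every 32-bit x. */
static uint32_t div10_reciprocal(uint32_t x)
{
    return (uint32_t)(((uint64_t)x * 0xCCCCCCCDu) >> 35);
}

/* Hypothetical self-check: compare against ordinary division on a
 * sample of inputs, including both ends of the 32-bit range. */
static void self_check(void)
{
    uint32_t x;

    for (x = 0; x < 1000000u; x++)
        assert(div10_reciprocal(x) == x / 10u);

    for (x = UINT32_MAX; x > UINT32_MAX - 1000000u; x--)
        assert(div10_reciprocal(x) == x / 10u);
}
```

Calling self_check() from a small main is enough to try it.  A multiply plus a shift typically costs a few cycles, whereas a hardware DIV can take tens of cycles, which is why this substitution tends to matter more than most peephole changes.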

What are your timings like when compiling with COREAVX or COREAVX2?  A couple of recent peephole optimisations make use of BMI1 and BMI2.

I can't remember the proverb that Florian used, but it essentially boils down to this: very small changes that individually don't amount to much accumulate into a noticeable difference when there are enough of them.

Kit

On 01/03/2023 10:32, Martin Frb via fpc-devel wrote:
For a while now, fpc 3.3.1 has been receiving new optimizations, which is great; I'm a big fan.

Hence I thought, let's see how much of an impact they have. And in my test, they had none :(
Wondering if anyone else has measured them?

My tests:
Win-10 64 bit
3.3.1  905c485ff413cd48f98891e2075c814759d0c6f1
3.2.3  2022-02-04
both compilers with each O2 and O4

Using the testcase for FpDebug (which runs a decent spread of code).
Testcase with O2 and O3

And I got no noticeable difference.
I also tried {$CodeAlign proc=32 loop=32} for O2 (test and fpc), also no diff.


O2 / fpc: o2 323
TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug       :  22.406
TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug       :  22.063
TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug       :  22.609
O2 / fpc: o2 331
TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug       :  22.251
TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug       :  22.031
TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug       :  21.531


O3 / fpc: o2 323
TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug       :  22.687
TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug       :  22.281
TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug       :  22.469
O3 / fpc: o2 331
TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug       :  23.203
TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug       :  22.250
TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug       :  22.140


O3 / fpc: o4 323
TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug       :  23.063
TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug       :  22.250
TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug       :  22.875
O3 / fpc: o4 331
TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug       :  22.577
TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug       :  22.094
TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug       :  22.235


{$CodeAlign proc=32 loop=32}
O2 / fpc: def 323
TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug       :  22.453
TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug       :  22.328
TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug       :  22.656
O2 / fpc: def 331
TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug       :  22.079
TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug       :  22.234
TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug       :  21.984

_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
