At the moment, I'm experimenting with overhauling the x86_64 optimizer to see if I can reduce the number of passes through a block of code. My hope is to greatly increase the speed of the compiler without sacrificing the optimisations performed under -O1 and -O2. So far I've deliberately left i386 unmodified, because I wish to use it as a control case (i.e. do my changes break other platforms?).
It's probably not worthy of the bounty, but I'm enjoying the challenge of seeing if I can improve the overall speed in places.

Gareth aka. Kit

On Fri 16/11/18 22:58 , "Florian Klämpfl" flor...@freepascal.org sent:

On 16.11.2018 at 23:41, Florian Klämpfl wrote:
> On 16.11.2018 at 23:36, Jonas Maebe wrote:
>> On 16/11/18 22:44, Florian Klämpfl wrote:
>>> With some compiler tuning and a few tricks (two changes to the code and hand-simulated peephole optimizations, but I
>>> think the compiler can also do these tricks):
>>
>> You can improve performance further by devirtualising all method calls using wpo. First compile it with -FWvipri.wpo
>> -OWDEVIRTCALLS,OPTVMTS and next with -Fwvipri.wpo -OwDEVIRTCALLS,OPTVMTS (at least on my machine it gives a small boost,
>> and also makes the results more stable).
>>
>> Since I only have a preliminary llvm version (with Dwarf EH) running on macOS, I can't provide a direct Kylix
>> comparison. The versions below are both x86-64. As mentioned before, a 32 bit FPC/LLVM is still quite a way off.
>>
>> * FPC 3.0.4 -MDelphi -O2 -Fwvipri.wpo -OwDEVIRTCALLS,OPTVMTS:
>>
>> $ time ./vipribenchmemcache_nodeps
>> VipriBenchThreaded - RunningTimeSeconds=5, TestCount=100, StartSeq=0, NumberOfChannels=6, BufferPackets=5000,
>> NumberOfSynchroThreads=4
>> .................................................................................................
>> Time: 5016ms = 9669059 pkts/s = 14680 MB/s
>>
>> real 0m5.137s
>> user 0m5.042s
>> sys 0m0.017s
>>
>> * FPC 3.3.1 + llvm (clang from Xcode 10.1 with -O3 on FPC-generated llvm IR) and -Fwvipri.wpo -OwDEVIRTCALLS,OPTVMTS (no
>> LLVM link-time optimization):
>>
>> $ time ./vipribenchmemcache_nodeps_llvm
>> VipriBenchThreaded - RunningTimeSeconds=5, TestCount=100, StartSeq=0, NumberOfChannels=6, BufferPackets=5000,
>> NumberOfSynchroThreads=4
>> .................................................................................................................
>> Time: 5018ms = 11259466 pkts/s = 17094 MB/s
>>
>> real 0m5.161s
>> user 0m5.060s
>> sys 0m0.017s
>>
>
> Can you test with FPC 3.1.1 native, -O4 and the following patch:
>
>  compiler/nmem.pas | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/compiler/nmem.pas b/compiler/nmem.pas
> index d5c1d85e8f..52add1fd81 100644
> --- a/compiler/nmem.pas
> +++ b/compiler/nmem.pas
> @@ -1176,7 +1176,7 @@ implementation
>        begin
>          include(flags,nf_write);
>          { see comment in tsubscriptnode.mark_write }
> -        if not(is_implicit_pointer_object_type(left.resultdef)) then
> +        if not(is_implicit_array_pointer(left.resultdef)) then
>            left.mark_write;
>        end;
>
> ?

Hmmm, needs a few more of my changes to make it work, though it should work if used only with the benchmark.
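The two-step WPO build described in the quoted message could be scripted roughly as below. This is only a sketch: the source file name vipribenchmemcache_nodeps.pas is an assumption (the thread only shows the binary name), and FPC_RUNNER is a hypothetical variable that makes the script print the two command lines by default instead of running them.

```shell
#!/bin/sh
# Sketch of the two-pass whole-program-optimisation (WPO) build from the thread.
# SRC is an assumed file name; only the binary name appears in the thread.
SRC="${SRC:-vipribenchmemcache_nodeps.pas}"
WPO_FILE="vipri.wpo"
WPO_OPTS="DEVIRTCALLS,OPTVMTS"

# FPC_RUNNER is hypothetical: left unset, the commands are only echoed (dry run);
# set to an empty string, they are actually executed.
FPC_RUNNER="${FPC_RUNNER-echo}"

# Pass 1: -FW writes the feedback file, -OW selects which feedback to collect.
wpo_pass1() {
    $FPC_RUNNER fpc -MDelphi -O2 "-FW$WPO_FILE" "-OW$WPO_OPTS" "$SRC"
}

# Pass 2: -Fw reads the feedback file, -Ow applies the selected optimisations.
wpo_pass2() {
    $FPC_RUNNER fpc -MDelphi -O2 "-Fw$WPO_FILE" "-Ow$WPO_OPTS" "$SRC"
}

wpo_pass1 && wpo_pass2
```

Run as-is, it prints both compiler invocations; run with FPC_RUNNER set to an empty string, it calls fpc twice, the second time using the feedback file written by the first.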
_______________________________________________ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel