On Mon, 2018-06-11 at 21:07 +0100, J. Gareth Moreton wrote:
> Thanks David,
> I'm still learning some of the nuances of the Intel and AMD
> processors, but most of it is just logical analysis. Admittedly my
> main drive has been to shrink down the size of the binary, since
> Delphi and Free Pascal have always been a little bit bloated in
> comparison. Not that it is necessarily a bad thing, but saving space
> without sacrificing performance can only be a good thing, especially
> for those with limited bandwidth or for saving those few precious
> bytes when burning files to a CD or DVD.
> There have been a few instances in the compiled compiler (my main
> test case) where an entire register is freed up due to my deep
> optimisation, and that means the corresponding "push" and "pop" at
> either end of the procedure can be removed (along with the
> corresponding stack unwinding information), although I haven't
> started programming that yet.
Wouldn't it be better to perform this optimization before register
allocation? Then, when a register is freed up, the compiler wouldn't
even emit the corresponding "push" and "pop", because the register
would never need to be saved in the first place.
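
To illustrate the ordering point, here is a rough sketch in Python (not
FPC code). It assumes the prologue and epilogue are generated from the
set of callee-saved registers the procedure body actually touches. The
register list, the helper names and the two example bodies are all
invented for illustration.

```python
import re

# Assumed callee-saved set (Microsoft x64 convention, leaving out
# %rsp and %rbp for brevity). This is an illustration, not FPC's table.
CALLEE_SAVED = ("%rbx", "%rsi", "%rdi", "%r12", "%r13", "%r14", "%r15")


def used_registers(body):
    """Collect every register name mentioned in AT&T-syntax lines."""
    regs = set()
    for line in body:
        regs.update(re.findall(r"%[a-z0-9]+", line))
    return regs


def wrap_with_saves(body):
    """Add push/pop pairs only for callee-saved registers the body uses."""
    used = used_registers(body)
    saved = [r for r in CALLEE_SAVED if r in used]
    prologue = [f"pushq {r}" for r in saved]
    # Pops mirror the pushes in reverse order.
    epilogue = [f"popq {r}" for r in reversed(saved)]
    return prologue + body + epilogue


# Hypothetical procedure body before the optimisation ...
before = ["movq %rcx,%rbx", "addq %rbx,%rax"]
# ... and after it, once %rbx is no longer needed.
after = ["addq %rcx,%rax"]
```

If the optimisation has already run when wrap_with_saves is called, the
"after" body comes back unchanged and no push/pop pair exists to be
removed later, so the unwinding information never has to be patched.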
> I am ready to submit this part of my deep optimiser as a patch. I'm
> just waiting for Florian's acceptance or rejection of my debug strip
> patch - https://bugs.freepascal.org/view.php?id=33798 (the 3rd
> attempt!) - only because it shares some debugging code with said
> patch (it was useful to monitor how the registers inside references
> were changed). If it's rejected, it just means I'll have to change
> some of that debugging code a bit.
> Gareth aka. Kit
> On Mon 11/06/18 20:27 , David Pethes pub...@satd.sk sent:
> > Hi,
> > nice work.
> > On 8. 6. 2018 0:46, J. Gareth Moreton wrote:
> > > The deep optimiser changes this to:
> > >
> > > movq %rcx,%rax
> > > movq %rdx,%rsi
> > > movq %rcx,%rbx
> > >
> > > It determines, for the third MOV, it can
> > > change %rax for %rcx to minimise a
> > > pipeline stall, and then knows that %rbx
> > > and %rcx contain the same value, so can
> > > remove the 4th MOV completely. Given that
> > > modern processors usually have at least 3
> > > ALUs and the interdependencies have been
> > > removed, this will likely give a speed
> > > increase of one cycle over these few
> > > commands.
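
For anyone following along, the rewrite described above amounts to copy
propagation over register-to-register MOVs. Below is a minimal sketch
in Python, not the actual optimiser code. The original four-instruction
sequence is not quoted in this mail, so the input used here is a
made-up one that happens to reduce to the three MOVs shown; the
function name and data layout are also assumptions.

```python
def propagate_copies(movs):
    """Simplify a run of reg-to-reg MOVs given as (src, dst) pairs.

    Each source is replaced by the register that originally held the
    value, and a MOV is dropped when dst already holds that value.
    """
    origin = {}  # register -> register whose original value it holds
    out = []
    for src, dst in movs:
        root = origin.get(src, src)
        if origin.get(dst, dst) == root:
            continue  # dst already contains this value: redundant MOV
        # Overwriting dst invalidates dst's own entry and every entry
        # that was recorded as a copy of dst (conservative but safe).
        origin = {r: o for r, o in origin.items() if r != dst and o != dst}
        origin[dst] = root
        out.append((root, dst))
    return out


# Hypothetical input; the real "before" listing is not in this mail.
example = [
    ("%rcx", "%rax"),
    ("%rdx", "%rsi"),
    ("%rax", "%rbx"),  # source becomes %rcx, breaking the dependency
    ("%rbx", "%rcx"),  # %rcx already holds this value, so it is dropped
]
```

Calling propagate_copies(example) yields the three MOVs quoted above.
Reading from the root register rather than the intermediate copy is
what removes the dependency chain between the instructions.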
> > Note that modern CPUs can use move elimination for reg-to-reg
> > moves, so such a move doesn't cost any execution resources (it's
> > "free"). Despite that, removing it is still a win, because it saves
> > both bytes in the I-cache and decoder bandwidth (which can
> > indirectly save a cycle or more elsewhere).
> > David
> > _______________________________________________
> > fpc-devel maillist - email@example.com
> > http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel