Re: [fpc-devel] Peephole optimizer tai class change proposals

J. Gareth Moreton via fpc-devel Tue, 19 Oct 2021 12:10:57 -0700

I do hope we can find an acceptable proposal. I'm finding other potential uses for such extra information that can't really be replicated in the optimiser any other way, or it will take prohibitively long to do so (e.g. on the order of O(n²)). For example, take this assembly snippet:


    addl    $1,%edi
    movq    -264(%rbp),%rax
    movl    %edi,%edx
    movq    56(%rax,%rdx,8),%rax
    (%rdx deallocated)

At the moment, the optimiser cannot make this any better, but as humans, we can note that the upper 32 bits of %rdi are zero because of the ADD instruction, and so improved code would be:


    addl    $1,%edi
    movq    -264(%rbp),%rax
    movq    56(%rax,%rdi,8),%rax

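(Note that even if %edi wasn't used later, we can't merge the ADD instruction into the reference because of what happens if %edi = $FFFFFFFF before the ADD instruction: the 32-bit ADD wraps around to zero, whereas a 64-bit address calculation with the increment folded in would not.)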
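To make the wrap-around case concrete, here is a small arithmetic sketch in Python. It is not compiler code: the helper names and the sample base address are made up for illustration, and it only models a 32-bit ADD followed by a 64-bit address calculation.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def addl_imm(reg64, imm):
    """Model 'addl $imm,%edi': a 32-bit add whose result is zero-extended to 64 bits."""
    return ((reg64 & MASK32) + imm) & MASK32

def effective_address(base, index, scale, disp):
    """Model disp(base,index,scale) as a 64-bit address calculation."""
    return (base + index * scale + disp) & MASK64

base = 0x1000        # hypothetical value of %rax
rdi = 0xFFFFFFFF     # the problematic input

# Correct sequence: do the 32-bit ADD, then index with the full %rdi.
rdi_after = addl_imm(rdi, 1)                          # wraps to 0
correct = effective_address(base, rdi_after, 8, 56)   # 0x1038

# Invalid merge: fold the +1 into the displacement (56 + 1*8) and index
# with the old %rdi. No 32-bit wrap-around happens here.
merged = effective_address(base, rdi, 8, 56 + 8)      # 0x800001038

print(hex(correct), hex(merged))
```

For any input below $FFFFFFFF the two addresses agree, which is why the merge looks tempting; only the wrap-around case gives it away.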

Currently, the compiler cannot do this optimisation automatically because while it evaluates "movl %edi,%edx", the peephole optimizer doesn't know about the state of %rdi - for all it knows, the upper 32 bits could be non-zero. A solution could be for the ADD instruction to search ahead for the next MOV instruction that uses its output register and store extra information there to indicate that the upper 32 bits are definitely zero. There might be other solutions to this particular optimisation, like extending the capabilities of register-tracking objects, although I don't know how practical this really is.
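The "search ahead and leave a note" idea might look roughly like the following Python sketch. Everything in it is an assumption made for illustration: the Ins record, the notes set (standing in for extra info attached to a tai object), the register table and the note name are hypothetical and do not reflect the actual FPC data structures. The forward scan is also unbounded here for brevity.

```python
from dataclasses import dataclass, field

# Hypothetical subset of 32-bit registers and their 64-bit parents.
PARENT64 = {"edi": "rdi", "edx": "rdx", "eax": "rax"}

@dataclass
class Ins:
    op: str
    src: str
    dst: str
    notes: set = field(default_factory=set)  # stand-in for extra info on a tai object

def overwrites(ins, reg32):
    """True if ins writes the 32-bit register or its 64-bit parent."""
    return ins.dst == reg32 or ins.dst == PARENT64[reg32]

def annotate_zero_upper(code):
    """For each 32-bit ADD, find the next MOV reading its output and note
    that the upper 32 bits of the source register are known to be zero."""
    for i, ins in enumerate(code):
        if ins.op != "addl" or ins.dst not in PARENT64:
            continue
        for later in code[i + 1:]:
            if later.op == "movl" and later.src == ins.dst:
                later.notes.add("src_upper32_zero")
                break
            if overwrites(later, ins.dst):
                break  # the register was redefined; the fact no longer holds

code = [
    Ins("addl", "$1", "edi"),
    Ins("movq", "-264(%rbp)", "rax"),
    Ins("movl", "edi", "edx"),
    Ins("movq", "56(%rax,%rdx,8)", "rax"),
]
annotate_zero_upper(code)
print(code[2].notes)  # {'src_upper32_zero'}
```

A later pass that reaches "movl %edi,%edx" could then check for the note instead of rescanning backwards to find out where %edi came from.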

A deeper example might be to help track dependency chains to help the compiler make more informed long-distance optimisations. Currently, I'm improving some of the optimisations made by OptPass2ADD and OptPass2SUB under -O3 in the name of breaking dependency chains, which can span a fair few instructions. However, there are a few cases where it grows the code size without improving speed. For example:


    addl    $1,%edi
    movq    16(%rbx),%rax
    movq    8(%rax),%rcx
    movl    %edi,%edx

Under my improvements, which are beneficial in most places, this will change to the following:


    leal    1(%edi),%edx
    addl    $1,%edi
    movq    16(%rbx),%rax
    movq    8(%rax),%rcx

Because of the dependency chain between "movq 16(%rbx),%rax" and "movq 8(%rax),%rcx" along with using another AGU, this sequence will still take 2 cycles to execute at best (not counting latency from memory accesses). Better insight into the dependency chain and the execution ports used might tell the compiler not to make the optimisation in this case.
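The "still 2 cycles at best" observation can be checked with a deliberately crude model. The Python sketch below assumes every instruction has a latency of one cycle, that there are unlimited execution ports, and that only read-after-write dependencies matter; these are simplifying assumptions, not a description of any real CPU or of how the compiler reasons. The 32-bit and 64-bit register names are folded together.

```python
def critical_path(code):
    """Length in cycles of the longest read-after-write dependency chain,
    assuming unit latency and unlimited execution ports."""
    ready = {}    # register -> cycle at which its latest value is available
    longest = 0
    for reads, writes in code:
        start = max((ready.get(r, 0) for r in reads), default=0)
        done = start + 1
        for w in writes:
            ready[w] = done
        longest = max(longest, done)
    return longest

# Each entry is (registers read, registers written).
before = [
    (["rdi"], ["rdi"]),   # addl $1,%edi
    (["rbx"], ["rax"]),   # movq 16(%rbx),%rax
    (["rax"], ["rcx"]),   # movq 8(%rax),%rcx
    (["rdi"], ["rdx"]),   # movl %edi,%edx
]
after = [
    (["rdi"], ["rdx"]),   # leal 1(%edi),%edx
    (["rdi"], ["rdi"]),   # addl $1,%edi
    (["rbx"], ["rax"]),   # movq 16(%rbx),%rax
    (["rax"], ["rcx"]),   # movq 8(%rax),%rcx
]

print(critical_path(before), critical_path(after))  # 2 2
```

Under this model the ADD/MOV chain is broken by the LEA, but the two dependent loads already set the critical path, so the rewritten sequence gains nothing while being larger.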

Heh, this is starting to delve beyond simple logic and into the realm of machine learning if I'm not careful!


Gareth aka. Kit

On 17/10/2021 15:24, J. Gareth Moreton via fpc-devel wrote:

That's why I was discussing with Jonas how to handle that, since currently tai objects don't have a clean way to free them themselves, and optinfo is an untyped Pointer. However, Jonas suggested having the extra info objects stored in a linked list, so the solution I have in my showcase is a linked list owned by the TAsmOptimizer object that frees everything when it's destroyed. If an instruction is added or destroyed, the extra info objects associated with it remain in the linked list, just not attached to anything. True, there would be dangling pointers left in them, but they're solely for searching purposes and they generally aren't dereferenced. You need a valid tai object to access the relevant extra info.
Granted, it would be cleaner to simply have the extra info accessed through a new field in the tai declaration, and have the constructor and destructor handle initialisation and cleanup.
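The ownership scheme described above can be sketched loosely in Python. This is only an analogy under stated assumptions: the class names, the optinfo attribute and the attach/lookup/free_all methods are hypothetical, a Python list stands in for the linked list, and Python's garbage collection means the "owner" back-reference here cannot actually dangle the way a raw pointer would.

```python
class Tai:
    """Stand-in for a tai instruction; optinfo mimics the untyped pointer."""
    def __init__(self, text):
        self.text = text
        self.optinfo = None

class ExtraInfoList:
    """Owns every extra-info record so instructions never have to free them."""
    def __init__(self):
        self._records = []

    def attach(self, ins, **info):
        record = {"owner": ins, **info}   # back-reference kept for searching only
        self._records.append(record)
        ins.optinfo = record
        return record

    def lookup(self, ins):
        # Records are only reached through a live instruction object.
        return ins.optinfo

    def free_all(self):
        """Release everything at once, as when the optimizer is destroyed."""
        count = len(self._records)
        self._records.clear()
        return count

store = ExtraInfoList()
jmp = Tai("jmp .Lj5")
store.attach(jmp, target_label=".Lj5")
print(store.lookup(jmp)["target_label"])  # .Lj5

# Removing the instruction from the code leaves its record orphaned in the
# store; nothing looks it up again, and it is released by free_all().
del jmp
print(store.free_all())  # 1
```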
Gareth aka. Kit

On 17/10/2021 15:00, Florian Klämpfl via fpc-devel wrote:
On 11.10.2021 at 10:00, J. Gareth Moreton via fpc-devel <fpc-devel@lists.freepascal.org> wrote:
One for Jonas mainly, but also for Florian. This is a new "extra optimisation information" feature that allows the peephole optimizer to leave 'notes' and other extra information on individual tai objects for later reference. An initial showcase is to store a link to the destination label if it's not available in the lookup table (because it was created later by a peephole optimisation).
https://gitlab.com/freepascal.org/fpc/source/-/merge_requests/74/diffs
Currently the showcase doesn't appear to show any additional optimisations in the x86-64 RTL because the jump optimisation that creates a new label is almost never called.
I'll use this feature more extensively in the future, such as for storing information on the values of registers or making a note of a label that should be removed if possible because it would cause a long-term optimisation (something that a peephole optimisation that removes the label may not be able to determine because its own optimisation is questionable without that information). See previous e-mails in this chain for an example.
I fear a little bit that this extra info is messed up when instructions are added/removed.
_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel

