I do hope we can find an acceptable proposal.  I'm finding other potential uses for such extra information that can't really be replicated in the optimiser any other way, or it will take prohibitively long to do so (e.g. on the order of O(n²)).  For example, take this assembly snippet:

    addl    $1,%edi
    movq    -264(%rbp),%rax
    movl    %edi,%edx
    movq    56(%rax,%rdx,8),%rax
    (%rdx deallocated)

At the moment, the optimiser cannot make this any better, but as humans, we can note that the upper 32 bits of %rdi are zero because of the ADD instruction, and so improved code would be:

    addl    $1,%edi
    movq    -264(%rbp),%rax
    movq    56(%rax,%rdi,8),%rax

(Note that even if %edi wasn't used later, we can't merge the ADD instruction into the reference because of what happens if %edi = $FFFFFFFF before the ADD instruction)

Currently, the compiler cannot do this optimisation automatically because while it evaluates "movl %edi,%edx", the peephole optimizer doesn't know about the state of %rdi - for all it knows, the upper 32 bits could be non-zero.  A solution could be for the ADD instruction to search ahead for the next MOV instruction that uses its output register and store extra information there to indicate that the upper 32 bits are definitely zero.  There might be other solutions to this particular optimisation, like extending the capabilities of register-tracking objects, although i don't know how practical this really is.

A deeper example might be to help track dependency chains to help the compiler make more informed long-distance optimisations. Currently, I'm improving some of the optimisations made by OptPass2ADD and OptPass2SUB under -O3 in the name of breaking dependency chains, which can span a fair few instructions. However, there are a few cases where it grows the code size without improving speed.  For example:

    addl    $1,%edi
    movq    16(%rbx),%rax
    movq    8(%rax),%rcx
    movl    %edi,%edx

Under my improvements, which are beneficial in most places, this will change to the following:

    leal    1(%edi),%edx
    addl    $1,%edi
    movq    16(%rbx),%rax
    movq    8(%rax),%rcx

Because of the dependency chain between "movq 16(%rbx),%rax" and "movq 8(%rax),%rcx" along with using another AGU, this sequence will still take 2 cycles to execute at best (not counting latency from memory accesses).  Better insight into the dependency chain and the execution ports used might tell the compiler not to make the optimisation in this case.

Heh, this is starting to delve beyond simple logic and into the realm of machine learning if I'm not careful!

Gareth aka. Kit

On 17/10/2021 15:24, J. Gareth Moreton via fpc-devel wrote:
That's why I was discussing with Jonas in how to handle that, since currently tai objects don't have a clean way to free them themselves, and optinfo is an untyped Pointer.  However, Jonas suggested to have the extra info objects stored in a linked list, so the solution I have in my showcase is a linked list owned by the TAsmOptimizer object that frees everything when it's destroyed.  If an instruction is added or destroyed, the extra info objects associated with remain in the linked list, just not attached to anything.  True, there would be dangling pointers left in them, but they're solely for searching purposes and they generally aren't dereferenced.  You need a valid tai object to access the relevant extra info.

Granted, it would be cleaner to simply have the extra opt accessed through a new field in the tai declaration, and the constructor and destructor handle initialisation and cleanup.

Gareth aka. Kit

On 17/10/2021 15:00, Florian Klämpfl via fpc-devel wrote:

Am 11.10.2021 um 10:00 schrieb J. Gareth Moreton via fpc-devel <fpc-devel@lists.freepascal.org>:

One for Jonas mainly, but also for Florian.  This is a new "extra optimisation information" feature that allows the peephole optimizer to leave 'notes' and other extra information on individual tai objects for later reference.  An initial showcase is to store a link to the destination label if it's not available in the lookup table (becuase it was created later by a peephole optimisation).

https://gitlab.com/freepascal.org/fpc/source/-/merge_requests/74/diffs

Currently the showcase doesn't appear to show any additional optimisations in the x86-64 RTL because the jump optimisation that creates a new label is almost never called.

I'll use this feature more extensively in the future, such as for storing information on the values of registers or making a note of a label that should be removed if possible because it would cause a long-term optimisation (something that a peephole optimisation that removes the label may not be able to determine because its own optimisation is questionable without that information).  See previous e-mails in this chain for an example.
I fear a little bit that this extra info is messed up when instructions are added/removed.

_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel



--
This email has been checked for viruses by Avast antivirus software.
https://www.avast.com/antivirus

_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel

Reply via email to