Re: [fpc-devel] New deep optimisation

J. Gareth Moreton via fpc-devel Fri, 01 Oct 2021 11:27:42 -0700

Currently, there's an optimisation that tries to relocate MOV instructions so they appear before CMP and TEST instructions (you can see it occurring in the code sample). This is usually generated by the "J(c)Mov0JmpMov1 -> Set(c)" optimisation if the destination is not an 8-bit register, in which case, moving the MOV instruction so it appears before CMP (so long as it doesn't share any registers with it) aids optimisation if it's something like "movl $0,%eax", which can't be encoded as "xorl %eax,%eax" if the FLAGS register is in use.

Macrofusion is an interesting point. I'll have to look into that one. The only instructions that are moved in this "common instruction" optimisation are ones that don't touch the FLAGS register, but only MOV instructions are currently moved to appear before CMP and TEST instructions if possible. In truth, any instruction that doesn't modify the flags and doesn't share a register with the CMP/TEST instruction can be moved, and can usually be executed in parallel with the comparison (using another ALU, for example).

It might be that I have to add an extra Pass 2 optimisation that detects "CMP/MOV/Jcc" triplets that remain and "unoptimises" the MOV/Jcc pair in order to aid macrofusion.
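
The detection half of that idea can be sketched roughly as follows. This is a minimal illustration in Python, not compiler code: the (mnemonic, operands) tuple model and the names has_base, is_cond_jump and find_fusion_breakers are all hypothetical, and treating every j* mnemonic other than jmp as a Jcc is a simplification.

```python
from typing import List, Tuple

# Hypothetical instruction model: (mnemonic, operands) in AT&T syntax.
Instr = Tuple[str, Tuple[str, ...]]

SUFFIXES = ("b", "w", "l", "q")  # AT&T operand-size suffixes


def has_base(op: str, names: Tuple[str, ...]) -> bool:
    """True if op is one of names, optionally followed by one size suffix."""
    return op in names or (op[-1:] in SUFFIXES and op[:-1] in names)


def is_cond_jump(op: str) -> bool:
    """Simplification: every j* mnemonic except jmp counts as a Jcc."""
    return op.startswith("j") and not op.startswith("jmp")


def find_fusion_breakers(code: List[Instr]) -> List[int]:
    """Return the index of each CMP/TEST that is followed by MOV, then Jcc."""
    hits = []
    for i in range(len(code) - 2):
        if (has_base(code[i][0], ("cmp", "test"))
                and has_base(code[i + 1][0], ("mov",))
                and is_cond_jump(code[i + 2][0])):
            hits.append(i)
    return hits


sample = [
    ("testb", ("%al", "%al")),
    ("movq", ("%rdi", "%rdx")),
    ("je", (".Lj198",)),
]
print(find_fusion_breakers(sample))  # prints [0]
```

What to do with each hit (for example, moving the MOV elsewhere so the compare and the jump are adjacent again) is deliberately left out, since that depends on register usage the sketch does not model.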


Thanks for the insight, Stefan.

Gareth aka. Kit

On 01/10/2021 19:00, Stefan Glienke via fpc-devel wrote:

Keep in mind that test/cmp and jcc instructions are usually macrofused, but only if they are directly adjacent.
On 01.10.2021 at 18:10, J. Gareth Moreton via fpc-devel wrote:
Hi everyone,
I've started playing around with an optimisation on x86 platforms that looks for common instructions that appear on both branches of a Jcc instruction (i.e. after the label it jumps to and after the jump itself), and so far I'm having a lot of success. For example, in the Math unit - before:
    ...
# Peephole Optimization: %rdx = %rdi; removed unnecessary instruction (MovMov2MovNop 6b}
    call    fpc_do_is
    testb    %al,%al
    je    .Lj196
    movq    %rdi,%rdx
    movq    %rsi,%rcx
    call    CLASSES$_$TBITS_$__$$_EQUALS$TBITS$$BOOLEAN
    movb    %al,%bl
    jmp    .Lj197
    .p2align 4,,10
    .p2align 3
.Lj196:
    movq    %rdi,%rdx
    movq    %rsi,%rcx
    call    SYSTEM$_$TOBJECT_$__$$_EQUALS$TOBJECT$$BOOLEAN
    movb    %al,%bl
.Lj197:
    movb    %bl,%al
    ...

After:

    ...
# Peephole Optimization: %rdx = %rdi; removed unnecessary instruction (MovMov2MovNop 6b}
    call    fpc_do_is
# Peephole Optimization: Swapped test and mov instructions to improve optimisation potential
    movq    %rdi,%rdx
# Peephole Optimization: Swapped test and mov instructions to improve optimisation potential
    movq    %rsi,%rcx
    testb    %al,%al
# Peephole Optimization: Moved mov instruction common to both branches to before jump
# Peephole Optimization: Moved mov instruction common to both branches to before jump
# Peephole Optimization: Moved destination label ahead of common instructions
    je    .Lj198
    call    CLASSES$_$TBITS_$__$$_EQUALS$TBITS$$BOOLEAN
    movb    %al,%bl
    jmp    .Lj197
    .p2align 4,,10
    .p2align 3
.Lj198:
    call    SYSTEM$_$TOBJECT_$__$$_EQUALS$TOBJECT$$BOOLEAN
    movb    %al,%bl
.Lj197:
    movb    %bl,%al
    ...
In the above example, the parameter configuration prior to the two CALL instructions is identical, so the optimisation can move those instructions to before the branching jump.
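
As a rough illustration of that step, the sketch below splits off the leading instructions that both branches share. It is written in Python rather than compiler code, the (mnemonic, operands) tuple model and the name split_common_prefix are hypothetical, and it assumes only plain MOV instructions (which leave FLAGS untouched) are candidates.

```python
from typing import List, Tuple

# Hypothetical instruction model: (mnemonic, operands) in AT&T syntax.
Instr = Tuple[str, Tuple[str, ...]]

SUFFIXES = ("b", "w", "l", "q")  # AT&T operand-size suffixes


def is_plain_mov(op: str) -> bool:
    """True for mov with an optional size suffix (movb, movl, movq, ...)."""
    return op == "mov" or (op[-1:] in SUFFIXES and op[:-1] == "mov")


def split_common_prefix(fallthrough: List[Instr], target: List[Instr]):
    """Split off the leading MOVs that are identical on both branches.

    Returns (hoisted, fallthrough_rest, target_rest).
    """
    n = 0
    while (n < len(fallthrough) and n < len(target)
           and fallthrough[n] == target[n]
           and is_plain_mov(fallthrough[n][0])):
        n += 1
    return fallthrough[:n], fallthrough[n:], target[n:]


# The two branches from the "before" listing above.
after_jump = [
    ("movq", ("%rdi", "%rdx")),
    ("movq", ("%rsi", "%rcx")),
    ("call", ("CLASSES$_$TBITS_$__$$_EQUALS$TBITS$$BOOLEAN",)),
    ("movb", ("%al", "%bl")),
]
after_label = [
    ("movq", ("%rdi", "%rdx")),
    ("movq", ("%rsi", "%rcx")),
    ("call", ("SYSTEM$_$TOBJECT_$__$$_EQUALS$TOBJECT$$BOOLEAN",)),
    ("movb", ("%al", "%bl")),
]
hoisted, rest_jump, rest_label = split_common_prefix(after_jump, after_label)
print(len(hoisted), len(rest_jump), len(rest_label))  # prints 2 2 2
```

The trailing "movb %al,%bl" is also common to both branches, but the sketch only looks at the start of each branch, so it is left in place.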
However, some optimisations are not triggering because they expect a jump or SETcc instruction to appear directly after a TEST instruction, for example, and I can't just track the FLAGS register because it has to check the condition that's being used too (e.g. "MovAndTest2Test" requires the condition be C_E or C_NE).
There are a couple of solutions to this:
- Some instructions like those in the post-peephole stage could be adapted to look ahead further for an appropriate instruction, stopping if it finds one or if it finds another instruction that modifies the flags. This will produce more complicated compiler code though.
- Have a flag that tells the compiler to run pass 1 again after pass 2 (and have my common instruction optimisations in pass 2). This would allow deeper optimisations but may cause significant slowdown in the compiler, so I would only recommend this flag be honoured under -O3 and -O4.
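
The look-ahead in the first option might be sketched like this. It is a simplified Python illustration, not compiler code: the tuple model, the FLAG_WRITERS and BARRIERS lists and the name find_flag_consumer are all hypothetical, the flag-writer list is far from complete, and stopping at call/jmp/ret is an extra conservative assumption.

```python
from typing import List, Optional, Tuple

# Hypothetical instruction model: (mnemonic, operands) in AT&T syntax.
Instr = Tuple[str, Tuple[str, ...]]

SUFFIXES = ("b", "w", "l", "q")  # AT&T operand-size suffixes
# Deliberately incomplete list of instructions that modify FLAGS.
FLAG_WRITERS = ("cmp", "test", "add", "sub", "and", "or", "xor",
                "inc", "dec", "neg", "shl", "shr", "sar")
# Assumption: never look past a call or an unconditional transfer.
BARRIERS = ("call", "jmp", "ret")


def has_base(op: str, names: Tuple[str, ...]) -> bool:
    """True if op is one of names, optionally followed by one size suffix."""
    return op in names or (op[-1:] in SUFFIXES and op[:-1] in names)


def is_flag_consumer(op: str) -> bool:
    """Simplification: only Jcc and SETcc are treated as reading FLAGS."""
    if op.startswith("set"):
        return True
    return op.startswith("j") and not op.startswith("jmp")


def find_flag_consumer(code: List[Instr], start: int) -> Optional[int]:
    """Scan forward from code[start] for the first Jcc/SETcc.

    Gives up (returns None) at a flag writer, a barrier, or end of code.
    """
    for i in range(start + 1, len(code)):
        op = code[i][0]
        if is_flag_consumer(op):
            return i
        if has_base(op, FLAG_WRITERS) or has_base(op, BARRIERS):
            return None
    return None


code = [
    ("testb", ("%al", "%al")),
    ("movq", ("%rdi", "%rdx")),
    ("movq", ("%rsi", "%rcx")),
    ("je", (".Lj198",)),
]
print(find_flag_consumer(code, 0))  # prints 3
```

A real version would still have to inspect the condition on the instruction it finds, since an optimisation such as "MovAndTest2Test" only applies for particular conditions.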
I'm trying to weigh the pros and cons of each, not least because in some cases, my common instruction optimisations aren't as efficient in pass 2 because other pass 1 optimisations ensure the instructions either side of the branch are no longer identical.
Currently I'm seeing if I can avoid rerunning pass 1 and instead improve the problematic optimisations to be more flexible with the location of their SETcc and Jcc instructions.
Gareth aka. Kit
_______________________________________________
fpc-devel maillist  -  [email protected]
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel

