[fpc-devel] Optimisation challenge!

J. Gareth Moreton via fpc-devel Wed, 08 Dec 2021 16:43:59 -0800

Hi everyone,

I've noticed a bit of a deep potential optimisation for faster code:


    jne     .Lj1806
    movb    $13,-40(%rbp)
    cmpq    $0,-32(%rbp)
    je      .Lj1798
    ...
.Lj1806:
    movb    $1,-40(%rbp)
.Lj1798:
    cmpb    $1,-40(%rbp)
    jne     .Lj1812

If you analyse the jumps and stack values carefully (which could beconsidered registers for sake of simplicity), you can make an assumptionand shortcut a jump in the program flow:


- Prior to "je .Lj1798", the value of -40(%rbp) is set to 13.

- If the program flow jumps here, "cmp $1,-40(%rbp)" sets flags suchthat "jne .Lj1812" will definitely branch, because -40(%rbp) is equal to13 and not 1.- Therefore, the original "je .Lj1798" branch can be redirected to "je.Lj1812"


In other words:

    jne     .Lj1806
    movb    $13,-40(%rbp)
    cmpq    $0,-32(%rbp)
    je      .Lj1812
    ...
.Lj1806:
    movb    $1,-40(%rbp)
.Lj1798:
    cmpb    $1,-40(%rbp)
    jne     .Lj1812

This block behaves exactly the same way, but we've removed a comparisonand a conditional jump from the program flow, which will almostcertainly result in a speed increase due to fewer instructions beingexecuted and less pressure on the branch predictor.

The question and challenge here though... how could one implement thiskind of assembly-level data-flow analysis? Stopping at "je .Lj1812"jumping to the label and then looking ahead to see "cmpb $1,-40(%rbp);jne .Lj1812" is expensive enough, but then also looking backwards tofind a matching MOV instruction is even more expensive. Similarly,looking forward from "movb $13,-40(%rbp)" and analysing everyconditional jump you come across is rather prohibitive too. If that ishard enough, look at the block of code from aasmcpu.s with the ...filled in:


    movb    $13,-40(%rbp)
    cmpq    $0,-32(%rbp)
    je      .Lj1798
    movq    -16(%rbp),%rcx
    call    AASMCPU$_$TAICPU_$__$$_INSEND$$LONGINT
    movslq  %eax,%rax
    movq    -8(%rbp),%rdx
    movq    24(%rdx),%rdx
    subq    56(%rdx),%rax
    movzbl  -63(%rbp),%edx
    subq    %rdx,%rax
    subq    %rax,-24(%rbp)
    jmp     .Lj1798
    .p2align 4,,10
    .p2align 3
.Lj1806:
    movb    $1,-40(%rbp)
.Lj1798:
    cmpb    $1,-40(%rbp)
    jne     .Lj1812

(Note that the "cmpb $1,-40(%rbp)" instruction is caused by anoptimisation that I'm currently developing and won't appear as such yet- it currently appears as "movzbl -40(%rbp),%eax; cmpl $1,%eax")

The same logic applies to "jmp .Lj1798", in that -40(%rbp) is stillequal to 13 and can be redirected to "jmp .Lj1812" (assumingAASMCPU$_$TAICPU_$__$$_INSEND$$LONGINT isn't malformed). In fact, forthis particular procedure, every jump to .Lj1798 sets -40(%rbp) tosomething other than 1 almost immediately prior; it only gets set to 1immediately after label .Lj1806, so logically, .Lj1798 could become adead label, and then "movb $1,-40(%rbp); cmpb $1,-40(%rbp)" are nowadjacent and deterministic, meaning the "jne .Lj1812" will never branchand can be removed. But putting that aside for the moment, how can onetrack the value of -40(%rbp) through all those instructions to the jump,especially through a call?

Of course, it could probably be brute-forced, but that could easily beon the order of O(n²) if not worse. Some conservative assumptions canbe made to terminate a search early, like if %rbp or %rsp (whichever isapplicable) changes value or a write to general memory occurs (unlessit's the stack), since the register that's dereferenced could easily bemade to equal the stack pointer.


Let theorycrafting begin!

Gareth aka. Kit


--
This email has been checked for viruses by Avast antivirus software.
https://www.avast.com/antivirus

_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel

[fpc-devel] Optimisation challenge!

Reply via email to