[fpc-devel] Potential whole program optimization

J. Gareth Moreton via fpc-devel Sun, 18 Jul 2021 16:25:00 -0700

Hi everyone,

I've been playing around with the peephole optimizer on x86_64 a lot lately, and I'm starting to notice that a lot of procedures, both in the RTL and the compiler itself, produce the same assembly language when fully optimized (or sometimes even before this point). As an example, here is the assembly for three TStream methods in the classes unit:


.section .text.n_classes$_$tstream_$__$$_readdata$char$$nativeint,"ax"
    .balign 16,0x90
.globl    CLASSES$_$TSTREAM_$__$$_READDATA$CHAR$$NATIVEINT
CLASSES$_$TSTREAM_$__$$_READDATA$CHAR$$NATIVEINT:
.seh_proc CLASSES$_$TSTREAM_$__$$_READDATA$CHAR$$NATIVEINT
    leaq    -40(%rsp),%rsp
.seh_stackalloc 40
.seh_endprologue
# Peephole Optimization: Mov2Nop 3b done
    movl    $1,%r8d

# Peephole Optimization: %rcx = %rax; removed unnecessary instruction (MovMov2MovNop 6b}
# Peephole Optimization: %rax = %rcx; changed to minimise pipeline stall (MovXXX2MovXXX)

    movq    (%rcx),%rax
    call    *256(%rax)
    movslq  %eax,%rax
    nop
    leaq    40(%rsp),%rsp
    ret
.seh_endproc

.section .text.n_classes$_$tstream_$__$$_readdata$shortint$$nativeint,"ax"
    .balign 16,0x90
.globl    CLASSES$_$TSTREAM_$__$$_READDATA$SHORTINT$$NATIVEINT
CLASSES$_$TSTREAM_$__$$_READDATA$SHORTINT$$NATIVEINT:
.seh_proc CLASSES$_$TSTREAM_$__$$_READDATA$SHORTINT$$NATIVEINT
    leaq    -40(%rsp),%rsp
.seh_stackalloc 40
.seh_endprologue
# Peephole Optimization: Mov2Nop 3b done
    movl    $1,%r8d

# Peephole Optimization: %rcx = %rax; removed unnecessary instruction (MovMov2MovNop 6b}
# Peephole Optimization: %rax = %rcx; changed to minimise pipeline stall (MovXXX2MovXXX)

    movq    (%rcx),%rax
    call    *256(%rax)
    movslq  %eax,%rax
    nop
    leaq    40(%rsp),%rsp
    ret
.seh_endproc

.section .text.n_classes$_$tstream_$__$$_readdata$byte$$nativeint,"ax"
    .balign 16,0x90
.globl    CLASSES$_$TSTREAM_$__$$_READDATA$BYTE$$NATIVEINT
CLASSES$_$TSTREAM_$__$$_READDATA$BYTE$$NATIVEINT:
.seh_proc CLASSES$_$TSTREAM_$__$$_READDATA$BYTE$$NATIVEINT
    leaq    -40(%rsp),%rsp
.seh_stackalloc 40
.seh_endprologue
# Peephole Optimization: Mov2Nop 3b done
    movl    $1,%r8d

# Peephole Optimization: %rcx = %rax; removed unnecessary instruction (MovMov2MovNop 6b}
# Peephole Optimization: %rax = %rcx; changed to minimise pipeline stall (MovXXX2MovXXX)

    movq    (%rcx),%rax
    call    *256(%rax)
    movslq  %eax,%rax
    nop
    leaq    40(%rsp),%rsp
    ret
.seh_endproc

The final assembly language of each method is identical.

(Note that trunk is not this efficient just yet: it still leaves a "movq %rcx,%rax" instruction prior to "movl $1,%r8d" and then emits "movq (%rax),%rax" instead of "movq (%rcx),%rax". The three methods are still identical to one another, though.)

Would it be plausible to calculate and store a message digest (hash) of the final form of the tai entries or machine code, and use collisions to identify potential duplicate procedures for whole-program optimization? Granted, I don't know anything about WPO yet, so I don't know how plausible this is. This wouldn't be something done on quick or debug builds, because you need to be able to produce proper stack traces there, and having identical procedures merged into one might cause confusion.
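
As a rough illustration of the idea only, the sketch below hashes each procedure's instruction sequence and groups procedures whose digests match. It is written in Python and works on textual assembly, not on tai entries, and it says nothing about how FPC's WPO is actually structured. The names normalise and group_duplicates are hypothetical. Dropping every label is a simplification that would be wrong for procedures containing local jump targets.

```python
import hashlib
from collections import defaultdict


def normalise(body):
    """Return the instruction lines of one procedure as a tuple.

    Assumption: blank lines, '#' comments, assembler directives (leading '.')
    and labels (trailing ':') do not affect the generated machine code.
    """
    kept = []
    for line in body.splitlines():
        text = line.split("#", 1)[0].strip()
        if not text or text.startswith(".") or text.endswith(":"):
            continue
        # Collapse runs of whitespace so formatting differences do not matter.
        kept.append(" ".join(text.split()))
    return tuple(kept)


def group_duplicates(procs):
    """Group procedure names whose normalised bodies are identical.

    procs maps a procedure name to its assembly text. The digest is a cheap
    first pass; the full comparison afterwards guards against hash collisions.
    """
    buckets = defaultdict(list)
    for name, body in procs.items():
        instrs = normalise(body)
        digest = hashlib.sha256("\n".join(instrs).encode("utf-8")).hexdigest()
        buckets[digest].append((name, instrs))

    groups = []
    for candidates in buckets.values():
        by_body = defaultdict(list)
        for name, instrs in candidates:
            by_body[instrs].append(name)
        groups.extend(sorted(names) for names in by_body.values() if len(names) > 1)
    return groups
```

Fed the three READDATA listings above, this should report them as one group, since they differ only in their labels and section names. A real implementation would also have to decide which symbol survives the merge and how debug information refers to it.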


Gareth aka. Kit



