Yes, it appeared under -O4. However, specifying -Oodeadstore caused both instructions to be removed, which makes sense: a call follows those mov instructions, and under x86_64-win64 the register they set (%rax) is not used to pass a parameter, so its value is simply discarded when the subroutine is called.

Thanks for pointing out where peephole optimisation would be wasted effort on a non-issue. I need to study nodes more! *scratches off mov/cmp checks!*

Just to note: the last optimisation over at #36687, which has been giving me hassle until now, deals mostly with constants that get sign-extended or zero-extended. For example, in the same test there are sequences such as this:

    movb    $-63,%al
    movsbl    %al,%eax

...the patch now (correctly) changes that to "movl $-63,%eax". Dead-store elimination and the lack of constant propagation aren't affected.
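A minimal sketch of the arithmetic behind that fold, in Python purely for illustration (the function name is hypothetical and this is not the FPC peephole code): the byte that "movb" stores in %al is extended to 32 bits exactly as movsbl (sign) or movzbl (zero) would do it, giving the single immediate for "movl".

```python
def fold_byte_load_and_extend(imm8: int, signed: bool) -> int:
    """Hypothetical helper: return the immediate for 'movl $imm,%eax' that
    replaces 'movb $imm8,%al' followed by movsbl (signed=True) or
    movzbl (signed=False) from %al to %eax."""
    if not -128 <= imm8 <= 255:
        raise ValueError("immediate does not fit in a byte")
    byte = imm8 & 0xFF  # bit pattern actually stored in %al
    if signed:
        # movsbl copies bit 7 into bits 8..31 (sign extension)
        return byte - 0x100 if byte & 0x80 else byte
    # movzbl clears bits 8..31 (zero extension)
    return byte


# The sequence from the test: movb $-63,%al ; movsbl %al,%eax
print(fold_byte_load_and_extend(-63, signed=True))   # -63 -> movl $-63,%eax
print(fold_byte_load_and_extend(-63, signed=False))  # 193 -> movl $193,%eax
```

The zero-extended case is shown only for contrast: the same byte pattern (0xC1) yields a different 32-bit constant, which is why the fold has to know which extension instruction follows.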

Gareth aka. Kit


On 20/02/2020 21:05, Florian Klämpfl wrote:
On 20.02.20 at 21:50, J. Gareth Moreton wrote:
Oh, sorry, I made a slight error.  The sequences only appear if you specify -Oonoconstprop (and -a). So that sequence is produced with "\pp\bin\x86_64-win64\ppcx64 -O4 -Oonoconstprop -a test\cg\tcnvint3b.pp"

Then this is a non-issue.


There are still some inefficient combinations in the assembly, though; for example:

     movl    $61441,%eax
     movw    $61441,%ax
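Why the second instruction is wasted comes down to a little arithmetic. A hypothetical Python sketch (not FPC code): after "movl", %ax already holds the low 16 bits of the 32-bit immediate, so a following "movw" that writes the same 16-bit value leaves the register unchanged.

```python
def movw_is_redundant(movl_imm: int, movw_imm: int) -> bool:
    """Hypothetical check: True if 'movw $movw_imm,%ax' directly after
    'movl $movl_imm,%eax' does not change the register."""
    # movl writes all 32 bits of %eax; movw rewrites only the low 16 bits (%ax).
    return (movl_imm & 0xFFFF) == (movw_imm & 0xFFFF)


print(movw_is_redundant(61441, 61441))  # True: 61441 = 0xF001 fits in 16 bits
```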

Is this with full -O3? Did you try adding -Oodeadstore?


Gareth aka. Kit

On 20/02/2020 20:45, J. Gareth Moreton wrote:
On 20/02/2020 20:34, Florian Klämpfl wrote:
On 20.02.20 at 21:25, J. Gareth Moreton wrote:
but if you run all of the "test/cg/tcnvint3" tests with the "-a" option, you will notice such sequences in some of the ".s" files.

With full -O3?

Indeed so, even with full -O4. When compiling "/test/cg/tcnvint3.pp" (a test that already exists) with -O4, we get things like this in the assembler dump (command line: "\pp\bin\x86_64-win64\ppcx64 -O4 test\cg\tcnvint3b.pp"):

# Peephole Optimization: movq $16711680,%rax -> movl $16711680,%eax (immediate can be represented with just 32 bits)
    movl    $16711680,%eax
    cmpl    $16711680,%eax
    je    .Lj29
    call    P$TCNVINT3_$$_FAIL
    jmp    .Lj30
    .p2align 4,,10
    .p2align 3
.Lj29:
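The condition in that peephole comment ("immediate can be represented with just 32 bits") follows from the x86_64 rule that writing a 32-bit register zero-extends into the full 64-bit register. A hypothetical Python sketch of the check, assuming the immediate is given as a signed 64-bit value (this is an illustration, not the FPC implementation):

```python
def movq_imm_fits_movl(imm64: int) -> bool:
    """Hypothetical check: True if 'movq $imm64,%rax' can become
    'movl $imm64,%eax' without changing the final value of %rax."""
    # Two's-complement bit pattern of the 64-bit immediate.
    value = imm64 & 0xFFFFFFFFFFFFFFFF
    # movl zero-extends, so the upper 32 bits of the original must be zero.
    return value <= 0xFFFFFFFF


print(movq_imm_fits_movl(16711680))  # True: 0x00FF0000
print(movq_imm_fits_movl(-1))        # False: movl $-1,%eax gives 0x00000000FFFFFFFF
```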

In this case, unless there's a freak CPU error, everything between "je .Lj29" and its destination label will never execute, because %eax is loaded with the same constant it is then compared against, so the branch is always taken. (If "je .Lj29" were changed to "jmp .Lj29", everything between them would be stripped by pass 1 of the peephole optimiser.)
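Deciding that such a "je" is always taken amounts to evaluating the compare at compile time. A hypothetical Python sketch, assuming the optimiser already knows the constant held in %eax (the function name is illustrative only): "cmpl" computes %eax minus the immediate in 32 bits and sets ZF when the result is zero, and "je" branches when ZF is set.

```python
def je_always_taken(known_eax: int, cmp_imm: int) -> bool:
    """Hypothetical check: given that %eax is known to hold known_eax,
    return True if 'cmpl $cmp_imm,%eax; je label' always branches,
    i.e. the je could be rewritten as an unconditional jmp."""
    mask = 0xFFFFFFFF
    # cmpl sets ZF when the 32-bit difference is zero
    zf = ((known_eax - cmp_imm) & mask) == 0
    return zf


print(je_always_taken(16711680, 16711680))  # True: je .Lj29 behaves like jmp .Lj29
```

If the function returned False instead, the branch would never be taken and the "je" itself could be dropped; the sketch only covers the constant-versus-constant case seen in the dump.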

Gareth aka. Kit

_______________________________________________
fpc-devel maillist  -  [email protected]
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
