Yes, it appeared under -O4. However, specifying -Oodeadstore caused
both instructions to be removed. That makes sense: the mov instructions
were followed by a call, which sets %rax, and under x86_64-win64 %rax
is not taken as a parameter (i.e. the value of %rax is discarded upon
calling a subroutine).
Thanks for pointing out where a peephole optimisation would be wasted
effort on a non-issue. I need to study nodes more! *scratches off
mov/cmp checks!*
Just to note, the last optimisation over at #36687, which has been
giving me hassle until now, deals mostly with constants that get
sign-extended or zero-extended. For example, in the same test, there
are sequences such as this:
movb $-63,%al
movsbl %al,%eax
... the patch now (correctly) changes that to "movl $-63,%eax".
Deadstore and the lack of constant propagation aren't affected.
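To illustrate why that fold is safe, here is a small Python model of the two sequences. It is only a sketch, not code from the patch; the helper name sign_extend is made up for illustration, and registers are modelled as plain unsigned bit patterns.

```python
def sign_extend(value, from_bits, to_bits):
    """Treat the low from_bits of value as signed and return the
    resulting to_bits-wide unsigned bit pattern (hypothetical helper)."""
    low = value & ((1 << from_bits) - 1)
    if low & (1 << (from_bits - 1)):   # sign bit set: value is negative
        low -= 1 << from_bits
    return low & ((1 << to_bits) - 1)

# movb $-63,%al   followed by   movsbl %al,%eax
al = -63 & 0xFF
eax_two_instructions = sign_extend(al, 8, 32)

# movl $-63,%eax  (the folded form)
eax_folded = -63 & 0xFFFFFFFF

print(hex(eax_two_instructions), hex(eax_folded))
```

Both paths leave the same 32-bit pattern in %eax, which is what allows the pair to collapse into one instruction.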
Gareth aka. Kit
On 20/02/2020 21:05, Florian Klämpfl wrote:
On 20.02.20 at 21:50, J. Gareth Moreton wrote:
Oh, sorry, I made a slight error. The sequences only appear if you
specify -Oonoconstprop (and -a). So that sequence is produced with
"\pp\bin\x86_64-win64\ppcx64 -O4 -Oonoconstprop -a test\cg\tcnvint3b.pp"
Then this is a non-issue.
There are still some inefficient combinations though in the assembly
- for example:
movl $61441,%eax
movw $61441,%ax
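A quick way to see that the second instruction is redundant is to model the two writes in Python. This is only an illustrative sketch: mov_l and mov_w are hypothetical helpers, and it assumes the usual x86-64 behaviour that a 32-bit register write zeroes the upper half of the 64-bit register while a 16-bit write leaves all other bits untouched.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF

def mov_l(rax, imm):
    """movl $imm,%eax: writes the low 32 bits and zeroes the upper 32."""
    return imm & 0xFFFFFFFF

def mov_w(rax, imm):
    """movw $imm,%ax: replaces only the low 16 bits of the register."""
    return (rax & (MASK64 ^ 0xFFFF)) | (imm & 0xFFFF)

rax = 0xDEADBEEFCAFEBABE          # arbitrary previous contents
after_movl = mov_l(rax, 61441)
after_movw = mov_w(after_movl, 61441)

print(hex(after_movl), hex(after_movw))
```

The movw does not change the register, because the low 16 bits written by the movl already hold the same constant.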
This is with full -O3? Did you try to add -Oodeadstore?
Gareth aka. Kit
On 20/02/2020 20:45, J. Gareth Moreton wrote:
On 20/02/2020 20:34, Florian Klämpfl wrote:
On 20.02.20 at 21:25, J. Gareth Moreton wrote:
but if you run all of the "test/cg/tcnvint3" tests with the "-a"
option, you will notice such sequences in some of the ".s" files.
With full -O3?
Indeed so, with full -O4 even. When compiling
"/test/cg/tcnvint3.pp" (a test that already exists) with -O4, we get
things like this in the assembler dump - command line =
"\pp\bin\x86_64-win64\ppcx64 -O4 test\cg\tcnvint3b.pp":
# Peephole Optimization: movq $16711680,%rax -> movl $16711680,%eax
(immediate can be represented with just 32 bits)
movl $16711680,%eax
cmpl $16711680,%eax
je .Lj29
call P$TCNVINT3_$$_FAIL
jmp .Lj30
.p2align 4,,10
.p2align 3
.Lj29:
In this case, unless there's a freak CPU error, everything between
"je .Lj29" and its destination label will never execute (if "je
.Lj29" is changed to "jmp .Lj29", everything between them will be
stripped by pass 1 of the peephole optimiser).
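As a rough sketch of the kind of check I have in mind (plain Python over made-up instruction tuples, not the compiler's actual data structures or peephole code), a mov of a constant followed by a cmp against a constant on the same register has a known outcome, so the conditional jump can become an unconditional one or disappear. The sketch is deliberately conservative and keeps the mov and cmp in place, since the register and flags might still be needed later.

```python
def fold_const_branch(instrs):
    """Rewrite mov imm,reg / cmp imm,reg / je|jne label when the
    comparison result is known at compile time (illustrative only)."""
    out = []
    i = 0
    while i < len(instrs):
        window = instrs[i:i + 3]
        if (len(window) == 3
                and window[0][0] == "mov" and window[1][0] == "cmp"
                and window[2][0] in ("je", "jne")
                and isinstance(window[0][1], int)
                and isinstance(window[1][1], int)
                and window[0][2] == window[1][2]):
            mov, cmp, jcc = window
            equal = mov[1] == cmp[1]
            taken = equal if jcc[0] == "je" else not equal
            out += [mov, cmp]
            if taken:
                out.append(("jmp", jcc[1]))   # always taken
            # a never-taken branch is simply dropped
            i += 3
        else:
            out.append(instrs[i])
            i += 1
    return out

code = [
    ("mov", 16711680, "%eax"),
    ("cmp", 16711680, "%eax"),
    ("je", ".Lj29"),
    ("call", "P$TCNVINT3_$$_FAIL"),
    ("jmp", ".Lj30"),
]
folded = fold_const_branch(code)
for ins in folded:
    print(ins)
```

Once the je has become a jmp, the call and the second jmp are unreachable and would be left for the existing dead-code stripping to remove.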
Gareth aka. Kit
_______________________________________________
fpc-devel maillist - [email protected]
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel