-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello!

I just tested the prerelease of gcc 4.0 (to see whether my programs still
compile and work), and I must say: Congratulations, no real problems so
far. But I noticed some smaller optimization issues on x86, and one of
them is a regression relative to gcc 3.3, so I'm reporting it here. Accept
my apologies if this is already known, but I think it's worth noting.

So, for the real stuff, take this C program:

=============================== example1.c ===================
#include <stdio.h>

void test() {
    int i;
    for (i=10; i>=0; i--) {
        printf("%d\n", i);
    }
}

int main() {
    test();
    return 0;
}
===============================

Everything was compiled using:

    gcc -S -O3 -fomit-frame-pointer -o output input

gcc 3.3 compiles the test() function to the following x86 assembler:

===============================
test:
        pushl   %ebx
        subl    $8, %esp
        movl    $10, %ebx
        .p2align 4,,15
.L30:
        movl    %ebx, 4(%esp)
        movl    $.LC0, (%esp)
        call    printf
        decl    %ebx
        jns     .L30
        addl    $8, %esp
        popl    %ebx
        ret
===============================

I guess that can't be improved. But gcc 4.0 thinks it can! It compiles the
very same code to

===============================
test:
        pushl   %esi
        movl    $-1, %esi               [1]
        pushl   %ebx
        movl    $10, %ebx
        subl    $20, %esp               [2]
        .p2align 4,,15
.L2:
        movl    %ebx, 4(%esp)
        decl    %ebx
        movl    $.LC0, (%esp)
        call    printf
        cmpl    %ebx, %esi              [3]
        jne     .L2
        addl    $20, %esp
        popl    %ebx
        popl    %esi
        ret
===============================

[1] Why keep the -1 constant in %esi? The cmpl with a constant is only
1 byte longer; that doesn't justify tying up a register.

[2] It's allocating 5 words on the stack while 2 would be enough. I know
that gcc isn't very smart at optimizing the stack slots, but this is a
regression.

[3] Why use the cmpl at all?
gcc 3.3 did this right. I don't think the cmpl is faster than a decl (and
even then, the cmpl could be replaced by a "subl $1, %ebx").

NB: When gcc inlines this function, it will be compiled to

===============================
main:
        pushl   %ebp
        movl    %esp, %ebp
        pushl   %ebx
        movl    $10, %ebx
        subl    $20, %esp
        andl    $-16, %esp
        subl    $16, %esp
        .p2align 4,,15
.L9:
        movl    %ebx, 4(%esp)
        decl    %ebx
        movl    $.LC0, (%esp)
        call    printf
        cmpl    $-1, %ebx               <-----
        jne     .L9
        movl    -4(%ebp), %ebx
        xorl    %eax, %eax
        leave
        ret
==============================

(test() is inlined in main())

As you can see, gcc now doesn't use a register for the -1 constant. Quite
odd, I think.

**********************************************************************

Now for a second example:

============================== example2.c ===================
#include <stdio.h>

int i;

void test() {
    for (i=10; i>=0; i--) {
        printf("%d\n", i);
    }
}

int main() {
    test();
    return 0;
}
==============================

This is roughly the same as example 1, but "i" is now a global variable.
We can go straight to how gcc-4.0 compiles this, because gcc-3.3 does
almost the same:

==============================
test:
        movl    $10, %eax               [2]
        subl    $12, %esp               [1]
        movl    %eax, i
        movl    $10, %eax               [2]
        .p2align 4,,15
.L2:
        movl    %eax, 4(%esp)
        movl    $.LC0, (%esp)
        call    printf
        movl    i, %eax
        decl    %eax                    [3]
        testl   %eax, %eax              [3]
        movl    %eax, i
        jns     .L2
        addl    $12, %esp
        ret
==============================

[1] Again, the wasted stack. gcc-3.3 doesn't get this right either.

[2] Even a peephole optimizer could optimize this :)

[3] The testl is unneeded; the flags are already set by the decl.

Is this a hard optimization to accomplish? It's quite obvious for a human,
but I don't know how this looks from a compiler's perspective...

Sebastian
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iQEVAwUBQme5/f81M8QtvOSJAQLRGggAnpufAt1xuImGpsw0aTk/gCD+FmGUq2LR
3mPPX+E0zCbJCVfyuJl45j0fbyjhrEpqKdQ+rkpUhvBpC/BN2kO3clDZktHczMuq
WjjPQxbcBGX1jSvGQVS5bfgXIaeYRF5V9quzm3N4c0hXSsPHlwHCa4jbAQxCqdly
8XH9wzCUyjpfxDKG4zSzAS5DUg/hdAbBCekLBAjTSZhCqr1XmZJ5SmNIu9ZH0anU
rMDYaZPFJ4Cq291xON4R1g5enSnwkdlxh6zGmtvsXwY+KbJW1Tpq5q80lSjx7RUF
P5IZsvoqOzdV6PvUBhqft/w1xCRWn/11bgyuAfJ3Wna8j3IXeJHoiA==
=5WkM
-----END PGP SIGNATURE-----