Some small optimization issues with gcc 4.0 20050418

Sebastian Biallas Thu, 21 Apr 2005 07:35:11 -0700

Hello!


I just tested the prerelease of gcc 4.0 (to see whether my programs
still compile and work), and I must say: Congratulations, no real
problems so far.

But I noticed some small optimization issues on x86, and one of them is
a regression from gcc 3.3, so I'm reporting this here. Accept my apologies
if this is already known, but I think it's worth noting.

So, for the real stuff, take this C program:

=============================== example1.c ===================
#include <stdio.h>

void test()
{
        int i;
        for (i=10; i>=0; i--) {
                printf("%d\n", i);
        }
}

int main()
{
        test();
        return 0;
}
===============================


Everything was compiled using:
gcc -S -O3 -fomit-frame-pointer -o output input

gcc 3.3 compiles the test() function to the following x86 assembly:

===============================
test:
        pushl   %ebx
        subl    $8, %esp
        movl    $10, %ebx
        .p2align 4,,15
.L30:
        movl    %ebx, 4(%esp)
        movl    $.LC0, (%esp)
        call    printf
        decl    %ebx
        jns     .L30
        addl    $8, %esp
        popl    %ebx
        ret
===============================

I guess that can't be improved.
But gcc 4.0 thinks otherwise! It compiles the very same code to

===============================
test:
        pushl   %esi
        movl    $-1, %esi    [1]
        pushl   %ebx
        movl    $10, %ebx
        subl    $20, %esp    [2]
        .p2align 4,,15
.L2:
        movl    %ebx, 4(%esp)
        decl    %ebx
        movl    $.LC0, (%esp)
        call    printf
        cmpl    %ebx, %esi   [3]
        jne     .L2
        addl    $20, %esp
        popl    %ebx
        popl    %esi
        ret
===============================

[1] Why keep the -1 constant in %esi? A cmpl against the immediate $-1
is only 1 byte longer, which doesn't justify tying up a register,
especially since %esi then has to be saved and restored (pushl/popl).
[2] It's allocating 5 words (20 bytes) on the stack while 2 would be
enough. I know that gcc isn't very smart at optimizing the stack slots,
but this is a regression.
[3] Why use the cmpl at all? gcc 3.3 did this right by branching on the
flags set by decl (jns). I don't think the cmpl is faster than a decl
(and even then, the cmpl could be replaced by a "subl $1, %ebx").
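To make the difference concrete, here is a C-level sketch of the two loop shapes as I read them from the listings (my interpretation of the assembly, not gcc's internal representation). gcc 3.3 decrements and branches on the sign (decl/jns); gcc 4.0 decrements and compares against -1 (cmpl/jne). Both forms rely on i starting at 10, so the body runs at least once and the first negative value reached is -1. The function names are made up for illustration, and an array stands in for printf so the two sequences can be compared.

```c
#include <assert.h>

/* Shape of the gcc 3.3 loop: decl, then jns.
 * Keep looping while the decremented value is still >= 0. */
static int count_down_sign_test(int *out, int cap)
{
    int n = 0;
    int i = 10;
    do {
        assert(n < cap);
        out[n++] = i;          /* stands in for printf("%d\n", i) */
    } while (--i >= 0);
    return n;
}

/* Shape of the gcc 4.0 loop: decl, then cmpl against -1 and jne.
 * Keep looping while the decremented value is not -1. */
static int count_down_compare_minus_one(int *out, int cap)
{
    int n = 0;
    int i = 10;
    do {
        assert(n < cap);
        out[n++] = i;          /* stands in for printf("%d\n", i) */
    } while (--i != -1);
    return n;
}
```

Both functions record 10, 9, ..., 0 (11 values), so the two exit tests are equivalent for this loop; the only question is which one is cheaper, and the sign test needs neither an extra register nor a compare.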

NB: When gcc inlines this function, it will be compiled to
===============================
main:
        pushl   %ebp
        movl    %esp, %ebp
        pushl   %ebx
        movl    $10, %ebx
        subl    $20, %esp
        andl    $-16, %esp
        subl    $16, %esp
        .p2align 4,,15
.L9:
        movl    %ebx, 4(%esp)
        decl    %ebx
        movl    $.LC0, (%esp)
        call    printf
        cmpl    $-1, %ebx          <-----
        jne     .L9
        movl    -4(%ebp), %ebx
        xorl    %eax, %eax
        leave
        ret
==============================
(test() is inlined in main())

As you can see, gcc now compares against the -1 immediate directly
instead of keeping it in a register. Quite odd, I think.


**********************************************************************

Now for a second example:

============================== example2.c ===================
#include <stdio.h>

int i;
void test()
{
        for (i=10; i>=0; i--) {
                printf("%d\n", i);
        }
}

int main()
{
        test();
        return 0;
}
==============================

This is roughly the same as example 1, but "i" is now a global variable.

We can go straight to how gcc 4.0 compiles this, because gcc 3.3
produces almost the same code:

==============================
test:
        movl    $10, %eax           [2]
        subl    $12, %esp           [1]
        movl    %eax, i
        movl    $10, %eax           [2]
        .p2align 4,,15
.L2:
        movl    %eax, 4(%esp)
        movl    $.LC0, (%esp)
        call    printf
        movl    i, %eax
        decl    %eax                [3]
        testl   %eax, %eax          [3]
        movl    %eax, i
        jns     .L2
        addl    $12, %esp
        ret
==============================

[1] Again, the wasted stack. gcc 3.3 doesn't get this right either.
[2] %eax still holds 10 after the store to i, so the second
"movl $10, %eax" is redundant. Even a peephole optimizer could remove it :)
[3] The testl is unneeded; the flags are already set by the decl, and the
following movl doesn't touch them. Is this a hard optimization to
accomplish? It's quite obvious for a human, but I don't know how this
looks from a compiler's perspective...
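For contrast, the store to i before each call and the reload after it (movl i, %eax) are not wasted: printf is an external function, and as far as the compiler knows it may read or modify any global, so i must be in memory before the call and re-read afterwards. A small sketch of why that matters, using a hypothetical function observe() in place of printf that does read and write the global i (the names observe, calls and run_loop are made up for illustration):

```c
#include <assert.h>

int i;               /* global loop counter, as in example2.c */
static int calls;    /* how many times observe() ran */

/* Hypothetical stand-in for printf(): unlike printf, it really does
 * read and write the global i. */
static void observe(int value)
{
    assert(value == i);   /* only holds if i was stored before the call */
    calls++;
    if (calls == 3)
        i = 0;            /* changes the loop counter behind the loop's back */
}

/* Same loop as test() in example2.c, calling observe() instead of printf(). */
static void run_loop(void)
{
    for (i = 10; i >= 0; i--)
        observe(i);
}
```

run_loop() stops after three calls with i == -1; code that kept i only in a register across the call would have made eleven calls. Since gcc can't see inside printf, it has to allow for the same thing happening there.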

Sebastian
