James E Wilson wrote:
> Sebastian Biallas wrote:
> 
>> But I noticed a few smaller optimization issues on x86, and one of them is
>> a regression relative to gcc 3.3, so I'm reporting it here. Accept my
>> apologies if this is already known, but I think it's worth noting.
> 
> You can submit optimization regressions to our Bugzilla bug database.
> gcc-4 has a bunch of new and/or rewritten optimization passes, and
> occasionally minor problems with them will be missed.  They are likely
> to be fixed if we get bug reports for them, though.

Ok, I'm going to file PRs for the remaining issues once I've checked
what's still open.

> 
>> [2] It's allocating 5 words on the stack while 2 would be enough. I know
>> that gcc isn't very smart at optimizing stack slots, but this is a
>> regression.
> 
> There is one word for the return address, two words for registers being
> saved, and two words for the printf arguments.

You don't need to reserve a stack slot for the return address on x86.
That slot is allocated implicitly by the "call" instruction.
In fact, the explicit reservation of stack slots for the printf
parameters is an optimization by gcc; this is usually done with "push"
instructions, but since the reservation is loop-invariant, gcc moved it
out of the loop.

If you look at the output from gcc 3.3 you'll see that 2 slots are
indeed optimal.
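
For reference, a loop of roughly this shape is enough to look at the
frame layout. It is a hypothetical reduction, not the original test
case: the function name print_countdown and its return value are
illustrative assumptions. Compiling it with `gcc -O2 -S` for i386 and
reading the `subl $N, %esp` in the prologue should show how many words
are reserved on top of the return address that "call" pushes (newer
compilers may also pad the frame for stack alignment).

```c
#include <stdio.h>

/* Hypothetical reduction: every iteration calls printf with two
   argument words (the format string and i). The outgoing-argument
   area is the same on every iteration, so it can be reserved once in
   the prologue instead of being pushed inside the loop. */
int print_countdown(int n)
{
    int written = 0;  /* total characters printed, returned for testing */
    for (int i = n; i > 0; i--)
        written += printf("%d\n", i);
    return written;
}
```

Only the two printf argument words need explicit space here; the
return-address word comes from "call" itself.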

> 
> There does appear to be a problem here, as we are using pushes in the
> prologue to save registers, which means we should not be allocating
> space for them when we decrement the stack pointer.  The other 3 slots
> appear to be necessary.
> 
>> [3] Why use the cmpl at all? gcc 3.3 did this right. I don't think the
>> cmpl is faster than a decl (and even if it were, the cmpl could be
>> replaced by a "subl $1, %ebx").
> 
> This looks like another ivopts issue.  In gcc-3.3, we get a >= branch,
> which can use the result of the decrement.  In gcc-4.0, ivopts
> canonicalizes the branch to use !=, which cannot use the result of the
> decrement because the condition code flags are set wrong for that.
> 
> This still happens on mainline, and should probably be looked into.
> 
>> [1] Again, the wasted stack. gcc-3.3 doesn't get this right either.
> 
> I don't believe so.  We have the return address and the two printf
> arguments, so all 3 slots are needed.

No, 2 slots were enough.

> 
>> [2] Even a peephole optimizer could optimize this :)
> 
> Yes, this is embarrassing.  I had to use -march=i686 to reproduce this.
> 
> We have a peephole2 pattern that converts
>     movl $10, i
> into
>     movl $10, %eax
>     movl %eax, i
> because it is faster, except that this happens so late that there is no
> chance to perform CSE on the result, so we can't delete the duplicate
> constant immediate loads.  So while this is bad, it isn't as bad as it
> might appear at first.

I see.
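
A minimal C input that should exercise the peephole2 case described
above might look like the following. It is a sketch under assumptions:
the globals i and j and the function set_both are hypothetical, and
whether the split actually happens depends on the target options
(James needed -march=i686 to reproduce it).

```c
/* Hypothetical globals, named after the `movl $10, i` in the quoted asm. */
int i, j;

/* Two stores of the same immediate. With -march=i686 the peephole2
   may rewrite each `movl $10, mem` as `movl $10, %eax; movl %eax, mem`.
   Because that rewrite runs after CSE, the second `movl $10, %eax`
   is not removed. */
void set_both(void)
{
    i = 10;
    j = 10;
}
```

Comparing `gcc -O2 -march=i686 -S` output with the same command
without `-march=i686` is one way to check whether the duplicate
constant load appears.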

> 
>> [3] The testl is unneeded; the flags are already set by the decl.
>> Is this a hard optimization to accomplish? It's quite obvious to a
>> human, but I don't know how this looks from a compiler's perspective...
> 
> This is the same as above: we need the testl because we have the wrong
> kind of branch condition.

Maybe it's the same as above. But while this one is also a problem with
gcc 3.3, the example above is only an issue with gcc 4.0.
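
To make the >= versus != point from the quoted ivopts explanation
concrete, here are two source-level countdown loops that compute the
same result for n >= 0. This is only an analogy: ivopts rewrites the
exit test internally rather than in the source, and the function names
are hypothetical. The idea is that a decrement leaves the flags
describing the result relative to zero, so an `i >= 0` test can reuse
them, while a test against a different bound such as `i != -1` is not
answered by the flags a decl leaves behind, so an extra compare tends
to appear.

```c
/* Exit test written as ">= 0": after the decrement, the flags already
   describe the new value relative to zero. */
unsigned sum_down_ge(int n)
{
    unsigned sum = 0;
    for (int i = n - 1; i >= 0; i--)
        sum += (unsigned)i;
    return sum;
}

/* Same loop with the exit test written as "!= -1". Equivalent only
   when n >= 0; for negative n this version would keep decrementing
   until signed overflow (undefined behavior), so the two forms are
   not interchangeable in general. */
unsigned sum_down_ne(int n)
{
    unsigned sum = 0;
    for (int i = n - 1; i != -1; i--)
        sum += (unsigned)i;
    return sum;
}
```

Compiling both with `gcc -O2 -S` and comparing the loop tails is one
way to see whether an extra cmpl or testl shows up.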

Sebastian