[EMAIL PROTECTED] wrote:
Hi
I've found a case that looks like it should be possible to optimise, but
gcc (very recent trunk) isn't doing so. Handling it could give improvements
in many cases, and certainly in one I've come across:
#ifdef NEW
unsigned int fn(unsigned int n, unsigned int dmax) throw()
{
    for (unsigned int d = 0; d < dmax; ++d) {
        n += d ? d : 1;
    }
    return n;
}
#else
unsigned int fn(unsigned int n, unsigned int dmax) throw()
{
    unsigned int add = 1;
    for (unsigned int d = 0; d < dmax; add = ++d) {
        n += add;
    }
    return n;
}
#endif
When compiled with -O3 -DOLD (NEW is not defined, so the #else version is used) I get:
        .p2align 4,,15
        .globl _Z2fnjj
        .type _Z2fnjj, @function
_Z2fnjj:
.LFB2:
        testl %esi, %esi
        je .L2
        movl $1, %edx
        xorl %eax, %eax
        .p2align 4,,10
        .p2align 3
.L3:
        addl $1, %eax
        addl %edx, %edi
        cmpl %esi, %eax
        movl %eax, %edx
        jne .L3
.L2:
        movl %edi, %eax
        ret
.LFE2:
        .size _Z2fnjj, .-_Z2fnjj
but with -DNEW I get:
        .p2align 4,,15
        .globl _Z2fnjj
        .type _Z2fnjj, @function
_Z2fnjj:
.LFB2:
        testl %esi, %esi
        je .L2
        movl $1, %edx
        xorl %eax, %eax
        movl $1, %ecx
        jmp .L7
        .p2align 4,,10
        .p2align 3
.L5:
        testl %eax, %eax
        movl %ecx, %edx
        cmovne %eax, %edx
.L7:
        addl $1, %eax
        addl %edx, %edi
        cmpl %esi, %eax
        jne .L5
.L2:
        movl %edi, %eax
        ret
.LFE2:
        .size _Z2fnjj, .-_Z2fnjj
The performance difference is about 50%: -DNEW takes 1.5 times as long as
-DOLD (measured with dmax == 1000000000).
Unfortunately the loop can't always be written as in -DOLD. An iterator
adapter might use ?: to special-case the first element of a sequence, and
a generic algorithm that uses it contains only the simple loop of -DNEW.
Once inlining has occurred, that loop ought to be optimised as well as
the -DOLD one.
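To make the iterator adapter scenario concrete, here is a minimal sketch of how the ?: can end up inside a generic loop. The names first_special_iterator, sum_range and fn_generic are hypothetical and not taken from any real code; the adapter is only an assumption about what such code might look like. The exception specification is left off so the sketch compiles under any language standard.

```cpp
// Hypothetical adapter: dereferencing yields 1 at position 0 and the
// position itself everywhere else, i.e. the "d ? d : 1" special case.
struct first_special_iterator {
    unsigned int d;
    unsigned int operator*() const { return d ? d : 1; }
    first_special_iterator& operator++() { ++d; return *this; }
    bool operator!=(const first_special_iterator& other) const { return d != other.d; }
};

// Generic algorithm that knows nothing about the special case; it only
// contains the plain loop.
template <typename Iter>
unsigned int sum_range(Iter first, Iter last, unsigned int n)
{
    for (; first != last; ++first)
        n += *first;
    return n;
}

// After inlining, this has the same loop body as the -DNEW version of fn.
unsigned int fn_generic(unsigned int n, unsigned int dmax)
{
    first_special_iterator first = { 0 };
    first_special_iterator last = { dmax };
    return sum_range(first, last, n);
}
```

Here the author of sum_range cannot hoist the first iteration by hand, because the special case lives in the adapter rather than in the loop.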
I don't see why you would special-case the first iteration of a loop with ?:
inside the loop. Simply write the first iteration separately, and begin
the loop with the next iteration. That should make your intention a lot
clearer, both to us and to the compiler.
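As a rough sketch of what that looks like for the function in question (the name fn_peeled is made up for illustration, and the exception specification is left off so it compiles under any language standard):

```cpp
// Hand-peeled variant: the d == 0 iteration is handled before the loop,
// so the loop body needs no conditional.
unsigned int fn_peeled(unsigned int n, unsigned int dmax)
{
    if (dmax == 0)
        return n;           // the original loop would not execute at all

    n += 1;                 // first iteration: d == 0 contributes 1

    for (unsigned int d = 1; d < dmax; ++d)
        n += d;             // remaining iterations contribute d

    return n;
}
```

This should return the same values as both versions of fn above for any n and dmax, including the usual unsigned wraparound.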
Doesn't this belong on gcc-help? A better peeling optimization needs more
justification than this.