[EMAIL PROTECTED] wrote:
Hi

I've found a case that looks like it should be possible to optimise,
but gcc (very recent trunk) isn't doing so. Handling it could give
improvements in many cases, certainly in one I've come across:

    #ifdef NEW
    unsigned int fn(unsigned int n, unsigned int dmax) throw()
    {
      for (unsigned int d = 0; d < dmax; ++d) {
        n += d?d:1;
      }
      return n;
    }
    #else
    unsigned int fn(unsigned int n, unsigned int dmax) throw()
    {
      unsigned int add = 1;
      for (unsigned int d = 0; d < dmax; add = ++d) {
        n += add;
      }
      return n;
    }
    #endif

When compiled with -O3 -DOLD I get:

        .p2align 4,,15
    .globl _Z2fnjj
        .type   _Z2fnjj, @function
    _Z2fnjj:
    .LFB2:
        testl   %esi, %esi
        je  .L2
        movl    $1, %edx
        xorl    %eax, %eax
        .p2align 4,,10
        .p2align 3
    .L3:
        addl    $1, %eax
        addl    %edx, %edi
        cmpl    %esi, %eax
        movl    %eax, %edx
        jne .L3
    .L2:
        movl    %edi, %eax
        ret
    .LFE2:
        .size   _Z2fnjj, .-_Z2fnjj

but with -DNEW I get:

        .p2align 4,,15
    .globl _Z2fnjj
        .type   _Z2fnjj, @function
    _Z2fnjj:
    .LFB2:
        testl   %esi, %esi
        je  .L2
        movl    $1, %edx
        xorl    %eax, %eax
        movl    $1, %ecx
        jmp .L7
        .p2align 4,,10
        .p2align 3
    .L5:
        testl   %eax, %eax
        movl    %ecx, %edx
        cmovne  %eax, %edx
    .L7:
        addl    $1, %eax
        addl    %edx, %edi
        cmpl    %esi, %eax
        jne .L5
    .L2:
        movl    %edi, %eax
        ret
    .LFE2:
        .size   _Z2fnjj, .-_Z2fnjj

The performance difference is about 50%, with -DNEW taking 1.5 times as
long as -DOLD (measured with dmax == 1000000000).

Unfortunately the loop can't always be written as in -DOLD: the
implementation of an iterator adapter might use ?: to special-case the
first element of a sequence. When such an adapter is used in a generic
algorithm that just has the simple loop of -DNEW, the loop ought to be
optimised like -DOLD if inlining occurs.
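
As a rough illustration of the iterator-adapter situation described above, here is a minimal sketch. The names FirstSpecialIter, accumulate_into, fn_adapter and fn_plain are all hypothetical and not taken from any real code base; the point is only that, after inlining, the generic loop has the same shape as the -DNEW loop.

```cpp
#include <cassert>

// Hypothetical adapter: dereferencing yields 1 at position 0 and the
// position itself everywhere else, i.e. the same d ? d : 1 expression.
struct FirstSpecialIter {
  unsigned int pos;
  unsigned int operator*() const { return pos ? pos : 1; }
  FirstSpecialIter& operator++() { ++pos; return *this; }
  bool operator!=(const FirstSpecialIter& other) const { return pos != other.pos; }
};

// Hypothetical generic algorithm: it knows nothing about the special case.
template <typename It>
unsigned int accumulate_into(unsigned int n, It first, It last)
{
  for (; first != last; ++first) {
    n += *first;
  }
  return n;
}

// Once inlined, this is effectively the -DNEW loop.
unsigned int fn_adapter(unsigned int n, unsigned int dmax)
{
  return accumulate_into(n, FirstSpecialIter{0}, FirstSpecialIter{dmax});
}

// The -DNEW loop written out by hand, for comparison.
unsigned int fn_plain(unsigned int n, unsigned int dmax)
{
  for (unsigned int d = 0; d < dmax; ++d) {
    n += d ? d : 1;
  }
  return n;
}

// Small self-check: the adapter version agrees with the plain loop.
void check_adapter_matches_plain()
{
  for (unsigned int dmax = 0; dmax < 8; ++dmax) {
    assert(fn_adapter(3, dmax) == fn_plain(3, dmax));
  }
}
```

In this sketch the author of the generic algorithm cannot hoist the first iteration by hand, because the ?: lives inside the adapter's operator*.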

I don't see why you special-case the first iteration of a loop with ?: inside the loop. Simply write the first iteration separately, and begin the loop with the next iteration. That should make your intention a lot clearer, both to us and to the compiler. Doesn't this belong on gcc-help? A better peeling optimization needs more justification than this.
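
A minimal sketch of what writing the first iteration separately could look like for the posted function. The names fn_new and fn_peeled are hypothetical, and noexcept stands in for the original throw(); this is one possible hand-peeled form, not necessarily what a compiler's peeling pass would generate.

```cpp
#include <cassert>

// The -DNEW loop from the original post, kept for comparison.
unsigned int fn_new(unsigned int n, unsigned int dmax) noexcept
{
  for (unsigned int d = 0; d < dmax; ++d) {
    n += d ? d : 1;
  }
  return n;
}

// Hand-peeled variant: the d == 0 iteration is done before the loop,
// so the loop body no longer needs the ?: test.
unsigned int fn_peeled(unsigned int n, unsigned int dmax) noexcept
{
  if (dmax == 0) {
    return n;             // the loop would not have run at all
  }
  n += 1;                 // iteration d == 0 contributes 1
  for (unsigned int d = 1; d < dmax; ++d) {
    n += d;               // every later iteration contributes d
  }
  return n;
}

// Small self-check: both variants agree for a few small inputs.
void check_peeled_matches_original()
{
  for (unsigned int dmax = 0; dmax < 8; ++dmax) {
    assert(fn_new(3, dmax) == fn_peeled(3, dmax));
  }
}
```

The explicit dmax == 0 check is needed because the peeled iteration must not run when the original loop would have executed zero times.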
