Re: [avr-gcc-list] Re: C vs. assembly performance

Nicholas Vinen Sat, 28 Feb 2009 05:43:53 -0800

Georg-Johann Lay wrote:
> If you are sure it is really some performance issue/regression and not
> due to some language standard implication, you can add a report to
>     http://sourceforge.net/tracker/?group_id=68108
>
> so that the subject won't be forgotten. Also mind
>     http://gcc.gnu.org/bugs.html
>
> And of course, you can ask questions here. In that case it is helpful
> if you can manage to simplify the source to a small piece of code that
> triggers the problem and allows others to reproduce the problem. (i.e.
> no #include in the code, no ... (except for varargs), a.s.o).
>
> Snippets of .s may point to the problem when you add -dp -fverbose-asm
>
> And there are lots of places where avr-gcc produces suboptimal or even
> bad code, so feedback is welcome.
>
> But note that just a few guys are working on the AVR part of gcc.
> I would do more if I had the time (and the support of some gurus to
> ask questions on internals then and when...)
OK, I only spent a few minutes looking at old code and I found some
obviously sub-optimal results. It distills down to this:


#include <avr/io.h>

int main(void) {
  unsigned long packet = 0;

  while(1) {
    if( !(PINC & _BV(PC2)) ) {
      packet = (packet<<1)|(((unsigned char)PINC>>1)&1);
    }
    PORTB = packet;
  }
}


avr-gcc is: avr-gcc (Gentoo 4.3.3 p1.0, pie-10.1.5) 4.3.3

The avr/io stuff is just so that it won't optimise the code away to nothing.

I tried compiling it with both -Os and -O2:

avr-gcc -g -dp -fverbose-asm -Os -S -mmcu=atmega48 -o test test.c

The result includes this:

        lsl r18  ;  packet       ;  50  *ashlsi3_const/2        [length = 4]
        rol r19  ;  packet
        rol r20  ;  packet
        rol r21  ;  packet
        in r24,38-0x20   ;  D.1214,      ;  16  *movqi/4        [length = 1]
        lsr r24  ;  D.1214       ;  17  lshrqi3/3       [length = 1]
        ldi r25,lo8(0)   ; ,     ;  48  *movqi/2        [length = 1]
        ldi r26,lo8(0)   ; ,     ;  46  *movhi/4        [length = 2]
        ldi r27,hi8(0)   ; ,
        andi r24,lo8(1)  ;  tmp52,       ;  19  andsi3/2        [length = 4]
        andi r25,hi8(1)  ;  tmp52,
        andi r26,hlo8(1)         ;  tmp52,
        andi r27,hhi8(1)         ;  tmp52,
        or r18,r24       ;  packet, tmp52        ;  20  iorsi3/1        [length = 4]
        or r19,r25       ;  packet, tmp52
        or r20,r26       ;  packet, tmp52
        or r21,r27       ;  packet, tmp52



The problem, it seems, is that the compiler doesn't realize that the
right hand side of the expression can only be non-zero in the bottom 8
bits, since it's derived from an unsigned char which is implicitly
widened to 32 bits for the or operation. In fact, only the bottom bit
can ever be non-zero. As a result it spends a number of cycles and
registers on useless work: it zeroes r25-r27, ANDs them with constant
zero, and then ORs them into packet. I'll send a report to the locations
you mention in your e-mail.
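
To make the point concrete, here is a minimal host-side sketch (plain C,
nothing AVR-specific). The function names are made up for illustration,
and the port value is passed in as `pin` instead of being read from PINC.
It only shows that ORing the low byte alone gives the same result as the
full 32-bit OR; it says nothing about what the compiler will emit.

```c
#include <stdint.h>

/* The expression from the test case: shift left, then OR in bit 1 of pin. */
static uint32_t shift_in_full(uint32_t packet, uint8_t pin)
{
    return (packet << 1) | (uint32_t)((pin >> 1) & 1);
}

/* Same result, but the OR only ever touches the low byte.
 * The upper three bytes of the right hand side are always zero,
 * so ORing them in cannot change anything. */
static uint32_t shift_in_low_byte(uint32_t packet, uint8_t pin)
{
    uint32_t shifted = packet << 1;
    uint8_t low = (uint8_t)shifted | (uint8_t)((pin >> 1) & 1);
    return (shifted & 0xFFFFFF00u) | low;
}
```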


There are probably ways to work around this, such as making "packet" a
union of an unsigned char and a long, then shifting the long and only
ORing in the unsigned char. I'll note that there's also an optimization
to be had with the right hand side of the expression. I would write the
assembly something like this:

        lsl r18
        rol r19
        rol r20
        rol r21
        in r24,38-0x20
        bst r24, 1
        bld r18, 0


I'm sure I can find other examples of poor code generation in this
particular file, since I remember coming across many cases where I
replaced the generated code with inline assembly when I was originally
working on it, but that will have to wait for later.


Thanks for your help, I appreciate it. As I said, avr-gcc is pretty good,
but I would love it if it could get even better :)



Nicholas

_______________________________________________
AVR-GCC-list mailing list
AVR-GCC-list@nongnu.org
http://lists.nongnu.org/mailman/listinfo/avr-gcc-list
