Georg-Johann Lay wrote: > If you are sure it is really some performance issue/regression and not > due to some language standard implication, you can add a report to > http://sourceforge.net/tracker/?group_id=68108 > > so that the subject won't be forgotten. Also mind > http://gcc.gnu.org/bugs.html > > And of course, you can ask questions here. In that case it is helpful > if you can manage to simplify the source to a small piece of code that > triggers the problem and allows others to reproduce the problem. (i.e. > no #include in the code, no ... (except for varargs), a.s.o). > > Snippets of .s may point to the problem when you add -dp -fverbose-asm > > And there are lots of places where avr-gcc produces suboptimal or even > bad code, so feedback is welcome. > > But note that just a few guys are working on the AVR part of gcc. > I would do more if I had the time (and the support of some gurus to > ask questions on internals then and when...) OK, I only spent a few minutes looking at old code and I found some obviously sub-optimal results. It distills down to this:
#include <avr/io.h> int main(void) { unsigned long packet = 0; while(1) { if( !(PINC & _BV(PC2)) ) { packet = (packet<<1)|(((unsigned char)PINC>>1)&1); } PORTB = packet; } } avr-gcc is: avr-gcc (Gentoo 4.3.3 p1.0, pie-10.1.5) 4.3.3 The avr/io stuff is just so that it won't optimise the code away to nothing. I tried compiling it with both -Os and -O2: avr-gcc -g -dp -fverbose-asm -Os -S -mmcu=atmega48 -o test test.c The result includes this: lsl r18 ; packet ; 50 *ashlsi3_const/2 [length = 4] rol r19 ; packet rol r20 ; packet rol r21 ; packet in r24,38-0x20 ; D.1214, ; 16 *movqi/4 [length = 1] lsr r24 ; D.1214 ; 17 lshrqi3/3 [length = 1] ldi r25,lo8(0) ; , ; 48 *movqi/2 [length = 1] ldi r26,lo8(0) ; , ; 46 *movhi/4 [length = 2] ldi r27,hi8(0) ; , andi r24,lo8(1) ; tmp52, ; 19 andsi3/2 [length = 4] andi r25,hi8(1) ; tmp52, andi r26,hlo8(1) ; tmp52, andi r27,hhi8(1) ; tmp52, or r18,r24 ; packet, tmp52 ; 20 iorsi3/1 [length = 4] or r19,r25 ; packet, tmp52 or r20,r26 ; packet, tmp52 or r21,r27 ; packet, tmp52 The problem, it seems, is that the compiler doesn't realize that the right hand side of the expression can only have any non-zero values in the bottom 8 bits, since it's an unsigned char which is being implicitly expanded to 32 bits for the or operation. In fact, it's only the bottom bit that's ever non-zero. As a result it's spending a number of cycles and registers doing useless things. I'll copy a report to the locations you mention in your e-mail. There are probably ways to work around this, such as making "packet" a union of an unsigned char and a long, then shifting the long and only ORing in the unsigned char. I'll note that there's also an optimization to be had with the right hand side of the expression. I would write the assembly something like this: lsl r18 rol r19 rol r20 rol r21 in r24,38-0x20 bst r24, 1 bld r18, 0 I'm sure I can find other examples of poor code generation in this particular file, since I remember coming across many cases where I replaced the generated code with inline assembly when I was originally working on it, but that will have to wait for later. Thanks for your help, I appreciate it. As I said avr-gcc is pretty good, but I would love it if it could get even better :) Nicholas
_______________________________________________ AVR-GCC-list mailing list AVR-GCC-list@nongnu.org http://lists.nongnu.org/mailman/listinfo/avr-gcc-list