Hello all,

I'm not sure whether this has been posted before, but GCC generates slightly inefficient code for 64-bit (unsigned long long) integers in several cases:

    unsigned long long val;
    void example1() { val += 0x800000000000ULL; }

On x86 this results in the following assembly:

    addl  $0, val
    adcl  $32768, val+4
    ret

The first add is unnecessary: adding zero to the low word cannot change val or set the carry flag. This isn't too bad on x86. Compiling for something like AVR is worse: it produces eight single-byte loads, then three additions (of the high bytes), then eight single-byte stores. The compiler doesn't recognize that five of those loads and five of those stores are unnecessary. Replacing the addition with a bitwise OR/XOR also produces an unnecessary instruction on x86, but yields optimal code on AVR.

Here is another inefficiency on x86:

    unsigned long long val = 0;
    unsigned long small = 0;
    unsigned long long example1() { return val | small; }
    unsigned long long example2() { return val & small; }

For example1 this produces (bad):

    movl  small, %ecx
    movl  val, %eax
    movl  val+4, %edx
    pushl %ebx
    xorl  %ebx, %ebx
    orl   %ecx, %eax
    orl   %ebx, %edx
    popl  %ebx
    ret

For example2 (good):

    movl  small, %eax
    xorl  %edx, %edx
    andl  val, %eax
    ret

The RTL generated for example1 and example2 is very similar until the fwprop1 pass. Since the widest general-purpose register on 32-bit x86 is 4 bytes, each 64-bit operation is split into two 32-bit operations. The forward propagator correctly realizes that ANDing the upper 4 bytes yields zero (the high word of the zero-extended small is 0). However, it doesn't seem to recognize that ORing the upper 4 bytes should simply return val's high word. The same problem occurs with XOR, and also when subtracting (val - small).

All programs were compiled with "-O2 -Wall"; I also tried -O3 and -Os with the same result. Thanks for any help.
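To make the word-level reasoning above concrete, here is a minimal C sketch. It assumes a 32-bit target that keeps an unsigned long long as a low/high pair of 32-bit words. The names (word_pair, split64, join64, add_pair, sub_pair, check_word_identities) are hypothetical and only model the arithmetic; they are not GCC internals or RTL. The checks confirm four things. The constant 0x800000000000 has a zero low word, so the low-word add can neither change val nor produce a carry. OR and XOR with a zero-extended 32-bit value leave the high word equal to val's. AND zeroes the high word. Subtraction is a slightly different case: its high word cannot simply be copied, because it must still absorb the borrow from the low word.

```c
#include <assert.h>
#include <stdint.h>

/* A 64-bit value viewed as the two 32-bit words a 32-bit x86 target
   keeps it in (e.g. %eax = low word, %edx = high word). */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} word_pair;

word_pair split64(uint64_t v) {
    word_pair p = { (uint32_t)v, (uint32_t)(v >> 32) };
    return p;
}

uint64_t join64(word_pair p) {
    return ((uint64_t)p.hi << 32) | p.lo;
}

/* Models addl/adcl: add the low words, then add the high words plus
   the carry out of the low-word addition. */
word_pair add_pair(word_pair a, word_pair b) {
    word_pair r;
    r.lo = a.lo + b.lo;
    uint32_t carry = r.lo < a.lo;   /* unsigned wraparound <=> carry out */
    r.hi = a.hi + b.hi + carry;
    return r;
}

/* Models subl/sbbl: subtract the low words, then subtract the high
   words and the borrow out of the low-word subtraction. */
word_pair sub_pair(word_pair a, word_pair b) {
    word_pair r;
    r.lo = a.lo - b.lo;
    uint32_t borrow = a.lo < b.lo;
    r.hi = a.hi - b.hi - borrow;
    return r;
}

/* Asserts the word-level facts for one (val, small) pair. */
void check_word_identities(uint64_t val, uint32_t small) {
    word_pair v = split64(val);
    word_pair k = split64(0x800000000000ULL);  /* constant from the first example */
    word_pair s = split64(small);              /* zero-extended, so s.hi == 0 */

    /* Low word of the constant is 0: val's low word is unchanged and no
       carry is produced, so only the high-word add of 0x8000 matters. */
    assert(k.lo == 0 && k.hi == 0x8000u);
    word_pair sum = add_pair(v, k);
    assert(sum.lo == v.lo && sum.hi == (uint32_t)(v.hi + 0x8000u));
    assert(join64(sum) == val + 0x800000000000ULL);

    /* OR and XOR with a zero-extended 32-bit value keep val's high word;
       AND forces the high word to zero. */
    assert((uint32_t)((val | small) >> 32) == v.hi);
    assert((uint32_t)((val ^ small) >> 32) == v.hi);
    assert((uint32_t)((val & small) >> 32) == 0);

    /* Subtraction: the high word still has to absorb a borrow. */
    word_pair diff = sub_pair(v, s);
    assert(join64(diff) == val - small);
    assert(diff.hi == v.hi - (uint32_t)(v.lo < small));
}
```

For example, check_word_identities(0x100000000ULL, 1) exercises the borrow case: the low word wraps to 0xFFFFFFFF and the high word drops from 1 to 0.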