Hi, All this talk of super-optimisers, etc. brings this issue I've had, to the fore again.
Perhaps there are suggestions on how to encourage the compiler to do the optimisations rather than having to hand-code to get the best result. Take this code: volatile uint32_t status; static inline void set_status(uint32_t flag, uint8_t set) __attribute__((always_inline)); void set_status(uint32_t flag, uint8_t set) { if (set) status |= flag; else status &= ~flag; } Which when used in an interrupt generates: ISR (INT7_vect) { 4e9a: 1f 92 push r1 4e9c: 0f 92 push r0 4e9e: 0f b6 in r0, 0x3f ; 63 4ea0: 0f 92 push r0 4ea2: 11 24 eor r1, r1 4ea4: 8f 93 push r24 4ea6: 9f 93 push r25 4ea8: af 93 push r26 4eaa: bf 93 push r27 4eac: 80 91 1a 18 lds r24, 0x181A 4eb0: 90 91 1b 18 lds r25, 0x181B 4eb4: a0 91 1c 18 lds r26, 0x181C 4eb8: b0 91 1d 18 lds r27, 0x181D 4ebc: a0 64 ori r26, 0x40 ; 64 4ebe: 80 93 1a 18 sts 0x181A, r24 4ec2: 90 93 1b 18 sts 0x181B, r25 4ec6: a0 93 1c 18 sts 0x181C, r26 4eca: b0 93 1d 18 sts 0x181D, r27 set_status(0x00400000UL, 1); 4ed0: bf 91 pop r27 4ed2: af 91 pop r26 4ed4: 9f 91 pop r25 4ed6: 8f 91 pop r24 4ed8: 0f 90 pop r0 4eda: 0f be out 0x3f, r0 ; 63 4edc: 0f 90 pop r0 4ede: 1f 90 pop r1 4ee0: 18 95 reti That's way too much code... Its fairly obvious where the optimisations should be, although I can see that some have been done already. If I recode the routine as follows, I get what I want: void set_status(uint32_t flag, uint8_t set) { volatile uint8_t *pstatus = (uint8_t *)&status; if (set) { if (flag & 0x000000ff) pstatus[0] |= (uint8_t)(flag & 0xff); if (flag & 0x0000ff00) pstatus[1] |= (uint8_t)(flag >> 8); if (flag & 0x00ff0000) pstatus[2] |= (uint8_t)(flag >> 16); if (flag & 0xff000000) pstatus[3] |= (uint8_t)(flag >> 24); } else { if (flag & 0x000000ff) pstatus[0] &= ~(uint8_t)(flag & 0xff); if (flag & 0x0000ff00) pstatus[1] &= ~(uint8_t)(flag >> 8); if (flag & 0x00ff0000) pstatus[2] &= ~(uint8_t)(flag >> 16); if (flag & 0xff000000) pstatus[3] &= ~(uint8_t)(flag >> 24); } } Which is: ISR (INT7_vect) { 6b80: 1f 92 push r1 6b82: 0f 92 push r0 6b84: 0f b6 in r0, 0x3f ; 63 6b86: 0f 92 push r0 6b88: 11 24 eor r1, r1 6b8a: 8f 93 push r24 6b8c: 80 91 4e 18 lds r24, 0x184E 6b90: 80 64 ori r24, 0x40 ; 64 6b92: 80 93 4e 18 sts 0x184E, r24 set_status(0x00400000UL, 1); 6b98: 8f 91 pop r24 6b9a: 0f 90 pop r0 6b9c: 0f be out 0x3f, r0 ; 63 6b9e: 0f 90 pop r0 6ba0: 1f 90 pop r1 6ba2: 18 95 reti Compiler options are: avr-gcc -mmcu=atmega2560 -Wall -gdwarf-2 -std=gnu99 -Os -funsigned-char -funsigned-bitfields -fpack-struct -fshort-enums -finline-limit=10 -DF_CPU=7372800UL version 4.3.3 In a 75k byte application this reduces code size by 1632 bytes. Dale. _______________________________________________ AVR-GCC-list mailing list AVR-GCC-list@nongnu.org http://lists.nongnu.org/mailman/listinfo/avr-gcc-list