Consider the following C code:
typedef __UINT8_TYPE__ uint8_t;
typedef struct {
    uint8_t volatile vout;
    uint8_t out;
} port_t;
#define PORTB (*(port_t*) 0x0420)
void portb_vout (void) {
    PORTB.vout &= 0x80;
}
void portb_out (void) {
    PORTB.out &= 0x80;
}
compiled with avr-gcc from current trunk:
$ avr-gcc -Os -S -dp -fdump-rtl-late_combine1-details
The generated assembly has:
portb_vout:
ldi r30,lo8(32) ; 21 [c=4 l=1] movqi_insn/1
ldi r31,lo8(4) ; 22 [c=4 l=1] movqi_insn/1
ld r24,Z ; 14 [c=4 l=1] movqi_insn/3
andi r24,lo8(-128) ; 15 [c=4 l=1] *andqi3/1
st Z,r24 ; 16 [c=4 l=1] movqi_insn/2
ret ; 19 [c=0 l=1] return
portb_out:
lds r24,1057 ; 13 [c=4 l=2] movqi_insn/3
andi r24,lo8(-128) ; 14 [c=4 l=1] *andqi3/1
sts 1057,r24 ; 15 [c=4 l=2] movqi_insn/2
ret ; 18 [c=0 l=1] return
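For readers matching the constants in the listing against the source, here is a small host-side sketch of where they come from. The helper names (PORTB_BASE, vout_addr, out_addr, lo8, hi8) are made up for illustration and are not part of the test case; the sketch only assumes that both members are one byte wide, so that they sit at offsets 0 and 1.

```c
#include <stddef.h>
#include <stdint.h>

/* Same layout as in the test case; uint8_t comes from <stdint.h> here. */
typedef struct {
    uint8_t volatile vout;
    uint8_t out;
} port_t;

/* Base address as used by the PORTB macro in the test case. */
enum { PORTB_BASE = 0x0420 };

/* Hypothetical helpers: absolute address of each member. */
static unsigned vout_addr (void)
{
    return PORTB_BASE + (unsigned) offsetof (port_t, vout);
}

static unsigned out_addr (void)
{
    return PORTB_BASE + (unsigned) offsetof (port_t, out);
}

/* Low and high byte of a 16-bit address, as loaded into r30/r31 (Z). */
static unsigned lo8 (unsigned addr) { return addr & 0xff; }
static unsigned hi8 (unsigned addr) { return (addr >> 8) & 0xff; }
```

So PORTB.vout lives at 1056 (0x420), which is what the two LDI instructions build in Z as 32 and 4, and PORTB.out lives at 1057 (0x421), which is the operand of LDS/STS.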
AVR supports direct addressing of all RAM with LDS/STS, as you can see in
portb_out. In the volatile version, however, the address is loaded into a
register first. The .late_combine1 dump has:
;; Function portb_vout (portb_vout, ...
trying to combine definition of r45 in:
5: r45:HI=0x420
into:
6: r43:QI=[r45:HI]
failed to match this instruction:
(set (reg:QI 43 [ _1 ])
(mem/v:QI (const_int 1056 [0x420]) [0 MEM[(struct port_t *)1056B].vout+0 S1 A8]))
;; Function portb_out (portb_out, ...
successfully matched this instruction to movqi_insn_split:
(set (reg:QI 48 [ MEM[(struct port_t *)1056B].out ])
(mem:QI (const_int 1057 [0x421]) [0 MEM[(struct port_t *)1056B].out+0 S1 A8]))
9: [r45:HI+0x1]=r47:QI
REG_DEAD r47:QI
REG_DEAD r45:HI
successfully matched this instruction to movqi_insn_split:
(set (mem:QI (const_int 1057 [0x421]) [0 MEM[(struct port_t *)1056B].out+0 S1 A8])
(reg:QI 47 [ _2 ]))
original cost = 4 + 4 + 4, replacement cost = 4 + 4; keeping replacement
So late_combine shies away from propagating an address operand just because
the access is volatile. The rejection comes from
late_combine::optimizable_set():
  if (!insn->can_be_optimized ()
      || insn->is_asm ()
      || insn->is_call ()
      || insn->has_volatile_refs ()
      || insn->has_pre_post_modify ()
      || !can_move_insn_p (insn))
    return NULL_RTX;
What helps is compiling with -ffuse-ops-with-volatile-access; however, turning
that on by default in the backend seems a bit too intrusive.
Is there a reason why this transformation is not performed in all cases
(given that the addressing mode is supported and the costs allow it)?
IMHO the semantics of the optimized version are the same, as opposed to what
-ffuse-ops-with-volatile-access is allowed to do.
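To make the "same semantics" argument concrete, here is a minimal host-side sketch, not AVR code and not anything taken from GCC. It models data memory as an array and counts every access to the volatile byte. All names (ram, load8, store8, rmw_indirect, rmw_direct) are hypothetical, and the LDS/STS form in the comments is only what I would expect the propagated code to look like.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: 'ram' stands in for AVR data memory, and the
   counters record each modeled load and store. */
enum { VOUT_ADDR = 0x0420, RAM_SIZE = 0x0800 };
static uint8_t ram[RAM_SIZE];
static unsigned n_loads, n_stores;

static uint8_t load8 (uint16_t addr)
{
    assert (addr < RAM_SIZE);
    n_loads++;
    return ram[addr];
}

static void store8 (uint16_t addr, uint8_t val)
{
    assert (addr < RAM_SIZE);
    n_stores++;
    ram[addr] = val;
}

/* Shape of the code generated today: the address goes through Z. */
static void rmw_indirect (void)
{
    uint16_t z = VOUT_ADDR;             /* ldi r30 / ldi r31 */
    uint8_t r24 = load8 (z);            /* ld r24,Z */
    r24 &= 0x80;                        /* andi r24,lo8(-128) */
    store8 (z, r24);                    /* st Z,r24 */
}

/* Assumed shape after propagating the constant address. */
static void rmw_direct (void)
{
    uint8_t r24 = load8 (VOUT_ADDR);    /* lds r24,1056 */
    r24 &= 0x80;                        /* andi r24,lo8(-128) */
    store8 (VOUT_ADDR, r24);            /* sts 1056,r24 */
}
```

In this model both variants perform exactly one load and one store of the same byte, in the same order, and leave the same value behind. Only the way the address is formed differs, and that is not an access to the volatile object.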
Johann