Consider the following C code:

typedef __UINT8_TYPE__ uint8_t;

typedef struct {
    uint8_t volatile vout;
    uint8_t out;
} port_t;

#define PORTB (*(port_t*) 0x0420)

void portb_vout (void) {
    PORTB.vout &= 0x80;
}

void portb_out (void) {
    PORTB.out &= 0x80;
}

compiled with avr-gcc from current trunk:

$ avr-gcc -Os -S -dp -fdump-rtl-late_combine1-details

The generated assembly has:

portb_vout:
        ldi r30,lo8(32)  ;  21  [c=4 l=1]  movqi_insn/1
        ldi r31,lo8(4)   ;  22  [c=4 l=1]  movqi_insn/1
        ld r24,Z                 ;  14  [c=4 l=1]  movqi_insn/3
        andi r24,lo8(-128)       ;  15  [c=4 l=1]  *andqi3/1
        st Z,r24                 ;  16  [c=4 l=1]  movqi_insn/2
        ret              ;  19  [c=0 l=1]  return

portb_out:
        lds r24,1057     ;  13  [c=4 l=2]  movqi_insn/3
        andi r24,lo8(-128)       ;  14  [c=4 l=1]  *andqi3/1
        sts 1057,r24     ;  15  [c=4 l=2]  movqi_insn/2
        ret              ;  18  [c=0 l=1]  return

AVR supports directly addressing all of RAM with LDS/STS, as you can see in 
portb_out (1057 = 0x421, the address of PORTB.out).  In the volatile version, 
however, the address is first loaded into a register and the access goes 
through Z.  The .late_combine1 dump has:

;; Function portb_vout (portb_vout, ...

trying to combine definition of r45 in:
    5: r45:HI=0x420
into:
    6: r43:QI=[r45:HI]
failed to match this instruction:
(set (reg:QI 43 [ _1 ])
    (mem/v:QI (const_int 1056 [0x420]) [0 MEM[(struct port_t *)1056B].vout+0 S1 A8]))


;; Function portb_out (portb_out, ...

successfully matched this instruction to movqi_insn_split:
(set (reg:QI 48 [ MEM[(struct port_t *)1056B].out ])
    (mem:QI (const_int 1057 [0x421]) [0 MEM[(struct port_t *)1056B].out+0 S1 A8]))
    9: [r45:HI+0x1]=r47:QI
      REG_DEAD r47:QI
      REG_DEAD r45:HI
successfully matched this instruction to movqi_insn_split:
(set (mem:QI (const_int 1057 [0x421]) [0 MEM[(struct port_t *)1056B].out+0 S1 A8])
    (reg:QI 47 [ _2 ]))
original cost = 4 + 4 + 4, replacement cost = 4 + 4; keeping replacement

So late_combine shies away from propagating an address operand merely because 
the access is volatile.  The reason for the rejection is in 
late_combine::optimizable_set():

  if (!insn->can_be_optimized ()
      || insn->is_asm ()
      || insn->is_call ()
      || insn->has_volatile_refs ()
      || insn->has_pre_post_modify ()
      || !can_move_insn_p (insn))
    return NULL_RTX;

What helps is compiling with -ffuse-ops-with-volatile-access; however, turning 
that on by default in the backend seems a bit too intrusive.

Is there a reason why this transformation is not performed in all cases 
(provided the addressing mode is supported and the costs are in favor)?

IMHO the semantics of the optimized version are the same, as opposed to what 
-ffuse-ops-with-volatile-access is allowed to do.

Johann
