http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27663

Georg-Johann Lay <gjl at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |gjl at gcc dot gnu.org
      Known to fail|                            |

--- Comment #7 from Georg-Johann Lay <gjl at gcc dot gnu.org> 2011-05-16 15:05:41 UTC ---
The patch tries to fix the middle-end flaw in the BE by introducing some
combine patterns that recognize byte-insert.

Wouldn't it be possible to recognize such situations in the middle-end and map
them to something like (set (zero_extract:QI (reg:SI) ...)) or
(set (subreg:QI (reg:SI) ...)?

Even if the bytes inserted do not come from consecutive memory locations, such
a recognition would help.

The patch does not lead to optimal code, there is still room for improvement:

With -Os -mmcu=atmega8:

f:
    push r16
    push r17
/* prologue: function */
/* frame size = 0 */
/* stack size = 2 */
.L__stack_usage = 2
    movw r30,r24
    ldd r24,Z+1
    ldd r16,Z+2
    ldi r17,lo8(0)
    ldi r18,lo8(0)
    ldi r19,hi8(0)
    movw r18,r16
    clr r17
    clr r16
    or r19,r24
    ldd r24,Z+4
    or r16,r24
    ldd r24,Z+3
    or r17,r24
    movw r22,r16
    movw r24,r18
/* epilogue start */
    pop r17
    pop r16
    ret

The usage of r16/r17 might be an artifact of IRA because only half of a SI reg
is call-saved, the other half is call-used. There is the following comment in
ira-color.c:

      /* We need to save/restore the hard register in
         epilogue/prologue.  Therefore we increase the cost.  */
      {
        /* ??? If only part is call clobbered.  */

Despite subreg lowering, the call-used r26/r27 are not used.

Maybe you should also try to disable subreg lowering by means of
-fno-split-wide-types.
For the code in question that gives:

With -Os -mmcu=atmega8 -fno-split-wide-types:

f:
/* prologue: function */
/* frame size = 0 */
/* stack size = 0 */
.L__stack_usage = 0
    movw r30,r24
    ldd r18,Z+1
    ldd r22,Z+2
    mov r24,r22
    ldi r25,lo8(0)
    ldi r26,lo8(0)
    ldi r27,hi8(0)
    clr r23
    clr r22
    or r25,r18
    ldd r18,Z+4
    or r22,r18
    ldd r18,Z+3
    or r23,r18
/* epilogue start */
    ret

What I do not understand are the insns clearing r26/r27 because they are dead
(which is not detected). It is an HI insn that looks like that:

; (insn 32 34 42 (set (reg:HI 26 r26 [ MEM[(unsigned char *)P_1(D) + 2B]+2 ])
;         (const_int 0 [0])) insert-byte.c:5 10 {*movhi}
;      (nil))
    ldi r26,lo8(0)   ;  32  *movhi/1  [length = 2]
    ldi r27,hi8(0)
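
For readers without the test case at hand, a byte-insert function of the kind discussed here might look like the sketch below. It is reconstructed from the assembly listings (loads from Z+1 through Z+4, result returned in r22..r25 with r25 as the most significant byte) and is an assumption, not the actual contents of insert-byte.c. uint32_t stands in for the 32-bit unsigned long on AVR so the sketch behaves the same on a host compiler.

```c
#include <stdint.h>

/* Hypothetical reconstruction of the byte-insert test function.
   Four bytes from non-zero offsets of P are assembled into one
   32-bit value; each shift-and-or is a candidate for a plain
   byte move into the matching register of the SI result.  */
uint32_t f (const unsigned char *P)
{
  uint32_t C;

  C  = (uint32_t) P[1] << 24;   /* most significant byte, r25 above */
  C |= (uint32_t) P[2] << 16;   /* r24 */
  C |= (uint32_t) P[3] << 8;    /* r23 */
  C |= (uint32_t) P[4];         /* least significant byte, r22 */

  return C;
}
```

Ideally each of the four loads would target its final register directly, so that neither the ldi/clr instructions that zero registers nor the or instructions would be needed.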