------- Comment #3 from hubicka at gcc dot gnu dot org 2009-01-14 20:20 -------
It might be an IRA change. Chips generally prefer a separate load instruction followed by an execute instruction, as in the old loop, over a combined load+execute instruction, since the separate instructions are easier to retire.
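To make the two forms concrete, here is a minimal Python sketch of rewriting a combined load+execute instruction into a separate load followed by a register-only execute. It is a toy model only: the tuple representation and the names `split_load_execute`, `is_memory` and `ARITH_OR_LOGICAL` are assumptions made up for illustration and are not GCC code.

```python
# Toy model only: instructions are (opcode, source, destination) tuples in
# AT&T operand order. This is not how GCC represents instructions.
ARITH_OR_LOGICAL = {"add", "sub", "and", "or", "xor"}


def is_memory(operand):
    # Hypothetical convention: memory operands are written like "(%rdx)"
    # or "8(%rdx,%rcx,4)", registers like "%eax".
    return operand.endswith(")")


def split_load_execute(insn, scratch):
    """Split `op mem, reg` into `mov mem, scratch` plus `op scratch, reg`.

    `scratch` is the name of a free register, or None when no register is
    available, in which case the instruction is returned unchanged.
    """
    opcode, src, dst = insn
    splittable = (
        scratch is not None
        and opcode in ARITH_OR_LOGICAL
        and is_memory(src)
        and not is_memory(dst)
    )
    if not splittable:
        return [insn]
    # Separate load first, then the arithmetic on registers only.
    return [("mov", src, scratch), (opcode, scratch, dst)]
```

For example, under these assumptions `split_load_execute(("add", "(%rdx)", "%eax"), "%ecx")` yields a `mov` into `%ecx` followed by an `add` from `%ecx`, while an instruction with no memory source, or a call with no free scratch register, is left alone.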
Splitting the instruction post reload probably won't do much good, since there is an extra move already. If just splitting the instruction would help, we can macroize the following peephole for vector modes too:

(define_peephole2
  [(match_scratch:SI 2 "r")
   (parallel [(set (match_operand:SI 0 "register_operand" "")
                   (match_operator:SI 3 "arith_or_logical_operator"
                     [(match_dup 0)
                      (match_operand:SI 1 "memory_operand" "")]))
              (clobber (reg:CC FLAGS_REG))])]
  "optimize_insn_for_speed_p () && ! TARGET_READ_MODIFY"
  [(set (match_dup 2) (match_dup 1))
   (parallel [(set (match_dup 0)
                   (match_op_dup 3 [(match_dup 0) (match_dup 2)]))
              (clobber (reg:CC FLAGS_REG))])]
  "")

Vladimir, perhaps IRA can be tweaked here somehow?

--

hubicka at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |vmakarov at redhat dot com,
                   |                            |hubicka at gcc dot gnu dot
                   |                            |org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38824