[Bug target/38824] [4.4 regression] performance regression of sse code from 4.2/4.3

hubicka at gcc dot gnu dot org Wed, 14 Jan 2009 12:20:31 -0800


------- Comment #3 from hubicka at gcc dot gnu dot org  2009-01-14 20:20 -------
It might be the IRA change. Chips generally prefer a separate load and execute
instruction, as in the old loop, over a combined load+execute instruction, since
they are easier to retire.


Splitting the instruction after reload probably won't do much good, since there
is already an extra move. If just splitting the instruction would help, we can
macroize the following
(define_peephole2
  [(match_scratch:SI 2 "r")
   (parallel [(set (match_operand:SI 0 "register_operand" "")
                   (match_operator:SI 3 "arith_or_logical_operator"
                     [(match_dup 0)
                      (match_operand:SI 1 "memory_operand" "")]))
              (clobber (reg:CC FLAGS_REG))])]
  "optimize_insn_for_speed_p () && ! TARGET_READ_MODIFY"
  [(set (match_dup 2) (match_dup 1))
   (parallel [(set (match_dup 0)
                   (match_op_dup 3 [(match_dup 0) (match_dup 2)]))
              (clobber (reg:CC FLAGS_REG))])]
  "") 

peephole for vector modes too.
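The following toy C sketch applies the same rewrite as the peephole above to a hypothetical two-operand instruction list. Each `reg = reg OP mem` becomes a separate load into a scratch register, followed by a register-only operation.

The `Insn`/`Operand` types, the `split_load_execute` function and its flag parameters are invented for illustration. They only loosely mirror `optimize_insn_for_speed_p ()`, `TARGET_READ_MODIFY` and `match_scratch`. The sketch ignores the flags clobber and assumes the given scratch register is free at every rewritten instruction.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical toy IR for illustration only; this is NOT GCC's RTL. */
typedef enum { OPND_REG, OPND_MEM } OpndKind;

typedef struct {
    OpndKind kind;
    int id;            /* register number or memory slot number */
} Operand;

/* Two-operand form, as in the peephole: dst = dst OP src. */
typedef struct {
    const char *op;    /* "mov" is a plain move; anything else is arithmetic */
    Operand dst;
    Operand src;
} Insn;

/* Rewrites each "reg = reg OP mem" into "tmp = mem; reg = reg OP tmp",
   loosely mirroring the peephole's conditions: optimizing for speed,
   the target does not prefer read-modify forms, and a free scratch
   register exists (scratch_reg < 0 means none is available).
   OUT must have room for 2 * N instructions. Returns the new count. */
static int split_load_execute(const Insn *in, int n, Insn *out,
                              int scratch_reg, int optimize_for_speed,
                              int target_read_modify)
{
    assert(in != NULL && out != NULL && n >= 0);
    int m = 0;
    for (int i = 0; i < n; i++) {
        const Insn *insn = &in[i];
        int can_split = optimize_for_speed && !target_read_modify
                        && scratch_reg >= 0
                        && strcmp(insn->op, "mov") != 0
                        && insn->dst.kind == OPND_REG
                        && insn->dst.id != scratch_reg
                        && insn->src.kind == OPND_MEM;
        if (can_split) {
            Operand tmp = { OPND_REG, scratch_reg };
            out[m++] = (Insn){ "mov", tmp, insn->src };     /* separate load */
            out[m++] = (Insn){ insn->op, insn->dst, tmp };  /* register-only op */
        } else {
            out[m++] = *insn;   /* leave everything else unchanged */
        }
    }
    return m;
}
```

For example, with scratch register 2 and the speed flag set, an add of memory slot 100 into register 0 becomes two instructions. The first loads slot 100 into register 2, and the second adds register 2 into register 0. Plain moves pass through unchanged.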
Vladimir, perhaps IRA can be tweaked here somehow?


-- 

hubicka at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |vmakarov at redhat dot com,
                   |                            |hubicka at gcc dot gnu dot
                   |                            |org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38824
