Redundant loads and stores are generated with the new -mtune=bdver1 target.
BDVER1 tuning generates packed single moves instead of packed double/integer
moves to save 1 byte of space.
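
As a rough illustration of why the substitution is legal at all: a full-width
register move copies 16 raw bytes and does not interpret the element type. The
sketch below is plain portable C, not compiler output. The function name
move_as_packed_single is hypothetical, and the float buffer only stands in for
an XMM register viewed as V4SF.

```c
#include <string.h>

/* Hypothetical helper: copy a 16-byte "vector" of two doubles through a
   four-float buffer, mimicking a packed-single move applied to
   packed-double data. Assumes sizeof(float[4]) == sizeof(double[2]). */
void move_as_packed_single(const double src[2], double dst[2])
{
    float lanes[4];                    /* stand-in for an XMM register as V4SF */

    memcpy(lanes, src, sizeof lanes);  /* raw 16-byte copy in, no conversion  */
    memcpy(dst, lanes, sizeof lanes);  /* raw 16-byte copy out, no conversion */
}
```

The doubles come back bit-for-bit unchanged, which is why the element type of
the move instruction does not matter for correctness.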

Here is an excerpt from the asm dump for the ac.f90 benchmark in the Polyhedron
test suite. The complete asm dump, generated with -dP, is also attached.

        vmovaps %xmm15, 304(%rsp)       # 4985  *avx_movv4sf_internal/3 [length = 9]
#(insn 4987 4985 2838 ac.f90:503 (set (reg:V2DF 52 xmm15)
#        (mem/c:V2DF (plus:DI (reg/f:DI 7 sp)
#                (const_int 304 [0x130])) [16 %sfp+-37872 S16 A128])) 1031 {*avx_movv2df_internal} (nil))
        vmovaps 304(%rsp), %xmm15       # 4987  *avx_movv2df_internal/2 [length = 9]
#(insn 2838 4987 4986 ac.f90:503 (set (reg:V2DF 52 xmm15)
#        (div:V2DF (reg:V2DF 52 xmm15)
#            (mem:V2DF (plus:DI (reg/f:DI 7 sp)
#                    (const_int 32432 [0x7eb0])) [2 *vect_pdclroo.541_5123+0 S16 A128]))) 1100 {avx_divv2df3} (nil))

Comments from Uros:
You are changing V4SFmode to V2DFmode. Since this combination is not
allowed by MODES_TIEABLE_P (and/or CANNOT_CHANGE_MODE_CLASS), the value
gets reloaded through memory. You can perhaps experiment with
these two macros a bit.
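
To make the comment above concrete, here is a toy model of the effect it
describes. It is a sketch under stated assumptions, not GCC code: the enum and
all toy_* functions are hypothetical, and the cost of one store plus one load
simply mirrors the store/load pair in the excerpt rather than GCC's actual
reload logic.

```c
#include <stdbool.h>

/* Hypothetical, simplified 128-bit vector modes (not GCC's machine_mode). */
enum toy_mode { TOY_V4SF, TOY_V2DF, TOY_V2DI };

/* Hypothetical strict predicate: a mode is tieable only with itself. */
bool toy_modes_tieable_strict(enum toy_mode a, enum toy_mode b)
{
    return a == b;
}

/* Hypothetical relaxed predicate: every 128-bit vector mode in this toy
   enum may be accessed in any other without copying. */
bool toy_modes_tieable_relaxed(enum toy_mode a, enum toy_mode b)
{
    (void)a;
    (void)b;
    return true;
}

/* Memory operations needed to reinterpret a register value in another
   mode in this toy model: none if the modes are tieable, otherwise one
   store plus one load through a stack slot. */
int toy_mode_change_memory_ops(bool tieable)
{
    return tieable ? 0 : 2;
}
```

In this model the V4SF to V2DF change costs two memory operations under the
strict predicate and none under the relaxed one, which is the kind of
difference the suggested experiment with the two macros would look for.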


-- 
           Summary: Redundant loads and stores generated for AMD bdver1
                    target
           Product: gcc
           Version: 4.6.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: harsha dot jagasia at amd dot com
 GCC build triplet: x86_64-unknown-linux-gnu
  GCC host triplet: x86_64-unknown-linux-gnu
GCC target triplet: x86_64-unknown-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44142
