https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89049

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |segher at gcc dot gnu.org

--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
So combine can see

(insn 11 10 13 3 (set (reg:V8SF 105)
        (vec_concat:V8SF (reg:V4SF 106 [ MEM[base: _2, offset: 0B] ])
            (mem:V4SF (plus:DI (reg:DI 85 [ ivtmp.11 ])
                    (const_int 16 [0x10])) [1 MEM[base: _2, offset: 0B]+16 S16 A32]))) "t.c":1:72 5046 {avx_vec_concatv8sf}
     (nil))

with its uses

(insn 13 11 14 3 (set (reg:V4SF 107)
        (vec_select:V4SF (reg:V8SF 105)
            (parallel [
                    (const_int 0 [0])
                    (const_int 1 [0x1])
                    (const_int 2 [0x2])
                    (const_int 3 [0x3])
                ]))) 2702 {vec_extract_lo_v8sf}
     (nil))


(insn 25 24 26 3 (set (reg:V4SF 111)
        (vec_select:V4SF (reg:V8SF 105)
            (parallel [
                    (const_int 4 [0x4])
                    (const_int 5 [0x5])
                    (const_int 6 [0x6])
                    (const_int 7 [0x7])
                ]))) 2711 {vec_extract_hi_v8sf}
     (expr_list:REG_DEAD (reg:V8SF 105)
        (nil)))

but somehow it only tries 11 -> 13:

Trying 11 -> 13:
   11: r105:V8SF=vec_concat(r106:V4SF,[r85:DI+0x10])
      REG_DEAD r106:V4SF
   13: r107:V4SF=vec_select(r105:V8SF,parallel)
...
Successfully matched this instruction:
(set (reg:V8SF 105)
    (vec_concat:V8SF (reg:V4SF 106 [ MEM[base: _2, offset: 0B] ])
        (mem:V4SF (plus:DI (reg:DI 85 [ ivtmp.11 ])
                (const_int 16 [0x10])) [1 MEM[base: _2, offset: 0B]+16 S16 A32])))
Successfully matched this instruction:
(set (reg:V4SF 107)
    (reg:V4SF 106 [ MEM[base: _2, offset: 0B] ]))
allowing combination of insns 11 and 13
original costs 4 + 4 = 8
replacement costs 4 + 4 = 8
modifying insn i2    11: r105:V8SF=vec_concat(r106:V4SF,[r85:DI+0x10])
deferring rescan insn with uid = 11.
modifying insn i3    13: r107:V4SF=r106:V4SF
      REG_DEAD r106:V4SF

then it continues:

Trying 11 -> 25:
   11: r105:V8SF=vec_concat(r106:V4SF,[r85:DI+0x10])
   25: r111:V4SF=vec_select(r105:V8SF,parallel)
      REG_DEAD r105:V8SF
Successfully matched this instruction:
(set (reg:V4SF 111)
    (mem:V4SF (plus:DI (reg:DI 85 [ ivtmp.11 ])
            (const_int 16 [0x10])) [1 MEM[base: _2, offset: 0B]+16 S16 A32]))
rejecting combination of insns 11 and 25
original costs 4 + 4 = 8
replacement cost 12

where it rejects the combination because the replacement cost (12) exceeds
the original cost (8).  I think the cost of 4 assigned to insn 11 is bogus
here (maybe combine uses the wrong costs, not accounting for embedded MEMs?)
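
To make the two cost decisions in the dumps above concrete, here is a minimal
sketch of the comparison they appear to reflect.  It is a simplified model,
not GCC's actual code: the function name combination_allowed is hypothetical,
and the rule "reject only when the replacement is strictly more expensive" is
an assumption inferred from the two log excerpts (8 vs. 8 allowed, 12 vs. 8
rejected).

```c
#include <stdbool.h>

/* Hypothetical helper modelling the profitability check seen in the
   combine dump.  Assumption: a combination is kept when the summed cost
   of the replacement insns does not exceed the summed cost of the
   original insns.  */
static bool
combination_allowed (int old_cost, int new_cost)
{
  return new_cost <= old_cost;
}

/* Costs copied from the log:
     11 -> 13: original 4 + 4 = 8, replacement 4 + 4 = 8  -> allowed
     11 -> 25: original 4 + 4 = 8, replacement 12         -> rejected  */
static bool
try_11_13 (void)
{
  return combination_allowed (4 + 4, 4 + 4);
}

static bool
try_11_25 (void)
{
  return combination_allowed (4 + 4, 12);
}
```

Under this model, if insn 11 were costed higher than 4 (for example, to
account for its embedded MEM), the original cost for 11 -> 25 would rise and
the combination might no longer be rejected.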
