https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106069

--- Comment #15 from luoxhu at gcc dot gnu.org ---
In combine, the vec_select(vec_concat(...)) and the following vec_select are
combined into a single extract instruction, which seems reasonable for both LE
and BE?

R146:   0 1 2 3
R141:   4 5 6 7
R150:   2 6 3 7    // vec_select(vec_concat(r146:V4SI,r141:V4SI),[2 6 3 7])
R151:   R150[3]    // vec_select(r150:V4SI,3)

=> 

R151:   R141[3]   //  vec_select(r141:V4SI,3)
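
The element arithmetic above can be checked with a small sketch. It models
vectors as plain Python lists indexed in RTL element order, which is an
assumption made for illustration only; it says nothing about register layout
or endianness. The helper names mirror the RTL codes and are not GCC APIs.

```python
def vec_concat(a, b):
    # Model of RTL vec_concat: elements of a followed by elements of b.
    return a + b

def vec_select(v, idx):
    # Model of RTL vec_select: pick the elements of v listed in idx.
    return [v[i] for i in idx]

r146 = [0, 1, 2, 3]
r141 = [4, 5, 6, 7]

# insn 21: r150 = vec_select(vec_concat(r146, r141), [2 6 3 7])
r150 = vec_select(vec_concat(r146, r141), [2, 6, 3, 7])
# insn 24: r151 = vec_select(r150, [3])
r151 = vec_select(r150, [3])

# What combine produces: r151 = vec_select(r141, [3])
combined = vec_select(r141, [3])
```

In this model both forms select the same element, so the combination itself
looks endian-neutral; any LE/BE difference would have to come from how the
single extract is later split into machine instructions.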



Trying 21 -> 24:
   21: r150:V4SI=vec_select(vec_concat(r146:V4SI,r141:V4SI),parallel)
      REG_DEAD r146:V4SI
      REG_DEAD r141:V4SI
   24: {r151:SI=vec_select(r150:V4SI,parallel);clobber scratch;}
Failed to match this instruction:
(parallel [
        (set (reg:SI 151)
            (vec_select:SI (reg:V4SI 141)
                (parallel [
                        (const_int 3 [0x3])
                    ])))
        (clobber (scratch:SI))
        (set (reg:V4SI 150)
            (vec_select:V4SI (vec_concat:V8SI (reg:V4SI 146)
                    (reg:V4SI 141))
                (parallel [
                        (const_int 2 [0x2])
                        (const_int 6 [0x6])
                        (const_int 3 [0x3])
                        (const_int 7 [0x7])
                    ])))
    ])
Failed to match this instruction:
(parallel [
        (set (reg:SI 151)
            (vec_select:SI (reg:V4SI 141)
                (parallel [
                        (const_int 3 [0x3])
                    ])))
        (set (reg:V4SI 150)
            (vec_select:V4SI (vec_concat:V8SI (reg:V4SI 146)
                    (reg:V4SI 141))
                (parallel [
                        (const_int 2 [0x2])
                        (const_int 6 [0x6])
                        (const_int 3 [0x3])
                        (const_int 7 [0x7])
                    ])))
    ])
Successfully matched this instruction:
(set (reg:V4SI 150)
    (vec_select:V4SI (vec_concat:V8SI (reg:V4SI 146)
            (reg:V4SI 141))
        (parallel [
                (const_int 2 [0x2])
                (const_int 6 [0x6])
                (const_int 3 [0x3])
                (const_int 7 [0x7])
            ])))
Successfully matched this instruction:
(set (reg:SI 151)
    (vec_select:SI (reg:V4SI 141)
        (parallel [
                (const_int 3 [0x3])
            ])))
allowing combination of insns 21 and 24
original costs 4 + 4 = 8
replacement costs 4 + 4 = 8
modifying insn i2    21: r150:V4SI=vec_select(vec_concat(r146:V4SI,r141:V4SI),parallel)
      REG_DEAD r146:V4SI
deferring rescan insn with uid = 21.
modifying insn i3    24: {r151:SI=vec_select(r141:V4SI,parallel);clobber scratch;}
      REG_DEAD r141:V4SI
deferring rescan insn with uid = 24.


I guess the previous unspec implementation bypassed the LE + LE swap check, so
now in split2, we should generate vextuwlx instead of vextuwrx on little
endian?
