https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106038

Hongtao.liu <crazylht at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |crazylht at gmail dot com

--- Comment #3 from Hongtao.liu <crazylht at gmail dot com> ---
The vectorizer compares 2 scalar loads + 2 bit_ops + 2 scalar stores against
1 unaligned_load + 1 bit_op + 1 unaligned_store, so scaling only the cost of
the bit_op doesn't help.
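The comparison above can be illustrated with a toy cost model. This is a minimal sketch, not GCC's cost model: the struct `op_costs`, the helper functions, and the unit costs used below are all hypothetical. It shows why scaling only the vector bit_op barely moves the balance. The vector body saves one load, one store and one bit_op, so with equal unit costs the bit_op has to be weighted 4x before the vector body stops being strictly cheaper.

```c
#include <stdbool.h>

/* Hypothetical per-operation costs; NOT taken from any GCC target cost table. */
struct op_costs {
    int scalar_load;
    int scalar_store;
    int unaligned_load;
    int unaligned_store;
    int bit_op;
};

/* Scalar body: 2 scalar loads + 2 bit_ops + 2 scalar stores. */
static int scalar_body_cost(const struct op_costs *c)
{
    return 2 * c->scalar_load + 2 * c->bit_op + 2 * c->scalar_store;
}

/* Vector body: 1 unaligned_load + 1 bit_op + 1 unaligned_store,
   with only the bit_op cost multiplied by bit_op_scale. */
static int vector_body_cost(const struct op_costs *c, int bit_op_scale)
{
    return c->unaligned_load + bit_op_scale * c->bit_op + c->unaligned_store;
}

/* In this toy model, vectorizing pays off only if the vector body is
   strictly cheaper than the scalar body. */
static bool vectorization_looks_profitable(const struct op_costs *c,
                                           int bit_op_scale)
{
    return vector_body_cost(c, bit_op_scale) < scalar_body_cost(c);
}
```

With all costs set to 1, the scalar body costs 6 and the vector body costs 2 + scale, so scales 1 through 3 still favour vectorization.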

At the RTL level, we have:

(note 3 14 4 2 NOTE_INSN_DELETED)
(note 4 3 7 2 NOTE_INSN_FUNCTION_BEG)
(insn 7 4 8 2 (set (reg:V2QI 87 [ vect__20.19 ])
        (mem:V2QI (reg:DI 91) [0 MEM <const vector(2) unsigned char> [(const uint8_t *)b_11(D)]+0 S2 A8])) "test.c":31:1 1414 {*movv2qi_internal}
     (expr_list:REG_DEAD (reg:DI 91)
        (nil)))
(insn 8 7 9 2 (set (reg:V2QI 88 [ vect__18.16 ])
        (mem:V2QI (reg/v/f:DI 85 [ a ]) [0 MEM <vector(2) unsigned char> [(uint8_t *)a_10(D)]+0 S2 A8])) "test.c":31:1 1414 {*movv2qi_internal}
     (expr_list:REG_EQUIV (mem:V2QI (reg/v/f:DI 85 [ a ]) [0 MEM <vector(2) unsigned char> [(uint8_t *)a_10(D)]+0 S2 A8])
        (nil)))
(insn 9 8 10 2 (parallel [
            (set (reg:V2QI 89 [ vect__21.20 ])
                (xor:V2QI (reg:V2QI 87 [ vect__20.19 ])
                    (reg:V2QI 88 [ vect__18.16 ])))
            (clobber (reg:CC 17 flags))
        ]) "test.c":31:1 1627 {xorv2qi3}
     (expr_list:REG_DEAD (reg:V2QI 88 [ vect__18.16 ])
        (expr_list:REG_DEAD (reg:V2QI 87 [ vect__20.19 ])
            (expr_list:REG_UNUSED (reg:CC 17 flags)
                (expr_list:REG_EQUIV (mem:V2QI (reg/v/f:DI 85 [ a ]) [0 MEM <vector(2) unsigned char> [(uint8_t *)a_10(D)]+0 S2 A8])
                    (nil))))))
(insn 10 9 0 2 (set (mem:V2QI (reg/v/f:DI 85 [ a ]) [0 MEM <vector(2) unsigned char> [(uint8_t *)a_10(D)]+0 S2 A8])
        (reg:V2QI 89 [ vect__21.20 ])) "test.c":31:1 1414 {*movv2qi_internal}
     (expr_list:REG_DEAD (reg:V2QI 89 [ vect__21.20 ])

If the RA could allocate pseudos 87/88/89 to GPRs, the result would be the
same as the non-vectorized version.
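To illustrate why a GPR is enough here: a V2QI value is just 16 bits, and XOR works bit by bit, so the vector XOR in insn 9 produces the same bytes as a single 16-bit integer XOR. The sketch below assumes the source at test.c:31 has the shape `a[0] ^= b[0]; a[1] ^= b[1];` (inferred from the dump, not confirmed). The function names are hypothetical. `memcpy` stands in for the unaligned (A8) 2-byte accesses.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical reconstruction of the scalar source: two byte-wise XORs. */
void xor2_scalar(uint8_t *a, const uint8_t *b)
{
    a[0] ^= b[0];
    a[1] ^= b[1];
}

/* What keeping the V2QI pseudos in a GPR amounts to: one 16-bit load per
   operand, one 16-bit XOR, one 16-bit store. memcpy avoids alignment and
   aliasing problems; XOR is per-bit, so byte order does not matter. */
void xor2_as_u16(uint8_t *a, const uint8_t *b)
{
    uint16_t va, vb;
    memcpy(&va, a, sizeof va);
    memcpy(&vb, b, sizeof vb);
    va ^= vb;
    memcpy(a, &va, sizeof va);
}
```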
