[Bug target/118480] Power9 target generates poor code for vector char splat immediate.

jeevitha at gcc dot gnu.org via Gcc-bugs Thu, 31 Jul 2025 07:47:48 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118480


--- Comment #7 from Jeevitha <jeevitha at gcc dot gnu.org> ---
Now, in the following case, we run into issues whenever we use a splat constant
with vec_slo or vec_sll:

vui128_t
test_slqi_char_15_V1 (vui128_t vra)
{
  vui8_t result;
  vui8_t tmp = vec_splats((unsigned char)15);
  result = vec_slo ((vui8_t)vra, tmp);
  return (vui128_t) vec_vsl (result, tmp);
}

Generated assembly:

test_slqi_char_15_V1:
    vspltisb 0,15
    vslo 2,2,0
    vsl 2,2,0
    blr

If you look at the final RTL, you’ll notice that the constant loaded by
vspltisb has mode V4SI instead of V16QI, even though it is a char splat:

(insn:TI 6 9 7 (set (reg:V4SI 64 %v0 [122])
      (const_vector:V4SI [
          (const_int 252645135 [0xf0f0f0f]) repeated x4
      ])) "t4.c":14:12 1194 {vsx_movv4si_64bit}
   (expr_list:REG_EQUIV (const_vector:V4SI [
          (const_int 252645135 [0xf0f0f0f]) repeated x4
      ])
      (nil)))
vspltisb %v0,15   # 6  [c=20 l=20]  vsx_movv4si_64bit/15

This happens because the built-ins behind vec_slo/vec_sll expect operands of
type vsi (vector signed int). So before calling __builtin_altivec_vslo, tmp is
implicitly converted to an integer vector type.
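
To illustrate why the char splat of 15 shows up as 0x0f0f0f0f in a V4SI
constant, here is a minimal sketch in portable C. It assumes the conversion is
a pure bit reinterpretation (no value change). The helper names
splat_byte_to_word and view_bytes_as_words are made up for this example and do
not exist in GCC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: replicate one byte into all four bytes of a word. */
static uint32_t splat_byte_to_word(uint8_t b)
{
    return (uint32_t)b * 0x01010101u;
}

/* Hypothetical helper: reinterpret 16 bytes as four 32-bit words.
   This is a plain bit copy, no per-element value conversion.  When all
   16 bytes are equal the result does not depend on endianness. */
static void view_bytes_as_words(const uint8_t bytes[16], uint32_t words[4])
{
    assert(bytes != NULL && words != NULL);
    memcpy(words, bytes, 16);
}
```

With all 16 bytes set to 15, each of the four words is 0x0f0f0f0f, which is
the 252645135 seen in the RTL above.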

Here is the GIMPLE dump:


vui8_t tmp;
vui32_t result;
vector(4) int _1;
vector(4) int _2;
vector(4) int _3;
vector(4) int _4;
vector(4) int _5;
vector(4) int _6;
vui128_t _10;

<bb 2> :
tmp_7 = { 15, 15, ..., 15 };  // 16 elements
_1 = VIEW_CONVERT_EXPR<__vector signed int>(vra_8(D));
_2 = VIEW_CONVERT_EXPR<__vector signed int>(tmp_7);
_3 = __builtin_altivec_vslo(_1, _2);
result_9 = VIEW_CONVERT_EXPR<vui32_t>(_3);
_4 = VIEW_CONVERT_EXPR<__vector signed int>(result_9);
_5 = VIEW_CONVERT_EXPR<__vector signed int>(tmp_7);
_6 = __builtin_altivec_vsl(_4, _5);
_10 = VIEW_CONVERT_EXPR<vui128_t>(_6);
return _10;

This behavior is due to how we define the built-in function. The
definition for __builtin_altivec_vslo in rs6000-builtins.def is:

const vsi __builtin_altivec_vslo (vsi, vsi);
  VSLO altivec_vslo {}

And the corresponding instruction definition in altivec.md:

(define_insn "altivec_vslo"
  [(set (match_operand:V4SI 0 "register_operand" "=v")
        (unspec:V4SI [(match_operand:V4SI 1 "register_operand" "v")
                      (match_operand:V4SI 2 "register_operand" "v")]
                     UNSPEC_VSLO))]
  "TARGET_ALTIVEC"
  "vslo %0,%1,%2"
  [(set_attr "type" "vecperm")])

There are, however, overloaded built-in forms that allow vec_slo to accept
vui8_t or other types:

vsc __builtin_vec_slo(vsc, vsc);   // VSLO_VSCS
vsc __builtin_vec_slo(vsc, vuc);   // VSLO_VSCU
...
vsi __builtin_vec_slo(vsi, vsc);   // VSLO_VSIS
vui __builtin_vec_slo(vui, vuc);   // VSLO_VUIU
...

Points to note:

Given that we have many vector types, why do these built-ins default to V4SI?

Even though the constant is being lowered to V4SI, how is vspltis* still
getting generated? I found that output_vec_const_move has special handling
for this, but not for xxsplti*.
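
As a rough illustration of the kind of check that lets a V4SI constant still
be emitted as vspltisb, here is a sketch in portable C. It is not the code in
output_vec_const_move. The function word_splat_as_vspltisb is a hypothetical
helper, and the only fact it relies on is that vspltisb takes a 5-bit signed
immediate (-16 to 15).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not GCC code: decide whether a 32-bit element that
   is repeated across a V4SI constant could be loaded with vspltisb.
   On success, store the signed immediate in *imm and return true. */
static bool word_splat_as_vspltisb(uint32_t word, int *imm)
{
    assert(imm != NULL);

    /* All four bytes of the word must be identical. */
    uint8_t b = (uint8_t)(word & 0xffu);
    if (word != (uint32_t)b * 0x01010101u)
        return false;

    /* Sign-extend the byte and check the 5-bit signed immediate range. */
    int value = (b < 128) ? (int)b : (int)b - 256;
    if (value < -16 || value > 15)
        return false;

    *imm = value;
    return true;
}
```

For the constant in this report, word_splat_as_vspltisb(0x0f0f0f0f, &imm)
succeeds with imm set to 15, while a word such as 0x10101010 is rejected
because 16 does not fit the immediate field.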
