On 07/11/2025 13:35, Richard Biener wrote:
On Wed, 5 Nov 2025, Christopher Bazley wrote:

On 28/10/2025 13:51, Richard Biener wrote:
On Tue, 28 Oct 2025, Christopher Bazley wrote:

vect_create_constant_vectors is updated to pad with zeros
between the end of a group and the end of a vector of the type
chosen for the SLP node, when used for BB SLP. This function
calls gimple_build_vector, which also has to be updated for
SVE vector types (by using the lower bound as the number of
elements, e.g., 16 for VNx16QI).
---
   gcc/gimple-fold.cc   |  2 +-
   gcc/tree-vect-slp.cc | 43 +++++++++++++++++++++++++++++++++++--------
   2 files changed, 36 insertions(+), 9 deletions(-)

diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
index edcc04adc08..e5fe0ea12a7 100644
--- a/gcc/gimple-fold.cc
+++ b/gcc/gimple-fold.cc
@@ -11275,7 +11275,7 @@ gimple_build_vector (gimple_stmt_iterator *gsi,
          {
    gimple_seq seq = NULL;
    tree type = builder->type ();
-       unsigned int nelts = TYPE_VECTOR_SUBPARTS (type).to_constant ();
+       unsigned int nelts = constant_lower_bound (TYPE_VECTOR_SUBPARTS
(type));
I don't think this is desirable in a generic helper?  How does
the 'builder' class deal with the all-constant case?  It seems
handling for constant vs. non-constant will now differ semantically
(instead of ICEing in one case previously).
This was the most minimal change I could make to get the feature working
(whilst debugging many other issues) and it seemed harmless to me, so I didn't
spend much time thinking about it.

I know very little about the builder, but my understanding is that it would
behave as though elements beyond the lower bound do not exist. e.g., if the
vector type is VNx16QI then TREE_CONSTANT would return true for the
CONSTRUCTOR node created by build_constructor if elements 0..15 are constant.

This is presumably not safe general-purpose behaviour, because it would leave
any other elements uninitialised (which does not matter for my use-case). I
have no objection to trying to solve this elsewhere (probably in
vect_create_constant_vectors) but I'll first need to revert this change and
remind myself what breaks.
Fixing this upthread would be definitely better.  Not sure exactly how.
Alternatively the change could be done in a way to assert that
the tree_vector_builder has less than or exactly the same number
of elements as constant_lower_bound of nelts.  I don't exactly
remember what the builder tracks here and what constraints for
initialization of VLA vectors are.

I've done some further investigation.

One of the tests that failed without my change to gimple_build_vector was gcc.target/aarch64/sve/slp_6.c. I made that change to enable building of the following constant (among others):

_70 = {_85, _21, _55, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0};

That constant has only 16 elements although the type of _70 is vector([16,16]) unsigned char:

void vec_slp_int8_t (int8_t * restrict a, int8_t * restrict b, int n)
{
...

  vector([16,16]) signed char vect_x0_43.58;
  vector([16,16]) signed char vect__90.57;
  vector([16,16]) unsigned char vect__89.56;
  vector([16,16]) unsigned char vect__87.55;
  vector([16,16]) signed char vect_x0_26.54;
  vector([16,16]) signed char vect_x0_34.47;

...

  vector([16,16]) signed char vect_x1_35.41;

...

  vector([16,16]) signed char vect_x2_36.35;

...

  void * _8;
  vector([16,16]) signed char[3] * _9;

...

  unsigned char _21;
  vector([16,16]) unsigned char _22;
  unsigned char _55;
  vector([16,16]) unsigned char _56;

...

  vector([16,16]) unsigned char _70;
  vector([16,16]) unsigned char _84;
  unsigned char _85;

...

  <bb 5> [local count: 105119324]:
  _84 = (vector([16,16]) unsigned char) vect_x0_34.47_82;
  _85 = .REDUC_PLUS (_84);
  _22 = (vector([16,16]) unsigned char) vect_x1_35.41_38;
  _21 = .REDUC_PLUS (_22);
  _56 = (vector([16,16]) unsigned char) vect_x2_36.35_58;
  _55 = .REDUC_PLUS (_56);
  _70 = {_85, _21, _55, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0};
  vect__89.56_3 = vect__87.55_2 + _70;
  vect__90.57_4 = VIEW_CONVERT_EXPR<vector([16,16]) signed char>(vect__89.56_3);

  <bb 6> [local count: 118111600]:
  # vect_x0_43.58_5 = PHI <vect__90.57_4(5), vect_x0_26.54_1(8)>
  .MASK_STORE (b_25(D), 8B, { -1, -1, -1, 0, 0, 0, 0, 0, ... }, vect_x0_43.58_5); [tail call]
  return;

The compiled code looks correct, although movi d0,#0 only zeros the first 16 bytes of the variable-length vector constant:

    addvl    x0, sp, #2
    movi    d0, #0
    st1b    z0.b, p6, [sp, #2, mul vl]
    uaddv    d27, p6, z27.b
    uaddv    d26, p6, z26.b
    uaddv    d25, p6, z25.b
    str    b27, [x0]
    addvl    x0, sp, #1
    add    x0, x0, 1
    ptrue    p7.b, vl3
    ld1b    z0.b, p6/z, [sp, #2, mul vl]
    st1b    z0.b, p6, [sp, #1, mul vl]
    str    b26, [x0]
    ld1b    z0.b, p6/z, [sp, #1, mul vl]
    st1b    z0.b, p6, [sp]
    str    b25, [sp, 2]
    ld1b    z0.b, p6/z, [sp]
    add    z28.b, z0.b, z28.b
    st1b    z28.b, p7, [x1]
    addvl    sp, sp, #3
    .cfi_def_cfa_offset 0
    ret

(This code has already been noted to be inefficient, which I plan to address separately.)

The decision about how many bytes to zero is made in the calling function, vect_create_constant_vectors (which also uses constant_lower_bound), rather than in gimple_build_vector:

  unsigned int elt_count = group_size;
  if (is_a<bb_vec_info> (vinfo))
    {
      /* We don't use duplicate_and_interleave for basic block vectorization.
    We know that the group size fits within a single vector, so all we need
    to do for VLA is to pad the constant to the minimum vector length.  */
      nunits = constant_lower_bound (TYPE_VECTOR_SUBPARTS (vector_type));
      elt_count = MAX (nunits, group_size);
    }

My current understanding is that you don't object to this part of my change. Whatever happens in gimple_build_vector won’t alter the fact that only the minimum number of bytes are zeroed, and in most cases that’s the desirable outcome.

I therefore plan to keep my modification to gimple_build_vector, but add an assertion that builder->encoded_nelts () <= constant_lower_bound (TYPE_VECTOR_SUBPARTS (builder->type ())) so that the modified function never builds fewer elements than expected when one of them is non-constant. Would that be OK?

--
Christopher Bazley
Staff Software Engineer, GNU Tools Team.
Arm Ltd, 110 Fulbourn Road, Cambridge, CB1 9NJ, UK.
http://www.arm.com/

Reply via email to