Hi Richard,

Thanks for reviewing my patch again.

On 09/06/2026 10:33, Richard Biener wrote:
On Wed, Jun 3, 2026 at 5:20 PM Christopher Bazley <[email protected]> wrote:

Created a new function, gimple_build_vector_from_elems,
for use when creating vectorized definitions for basic block
vectorization in vect_create_constant_vectors.

The existing gimple_build_vector function cannot be used
for SVE vector types because it relies on the type
associated with the tree_vector_builder having a constant
number of subparts. Even if that limitation were lifted, the
possibility of tree_vector_builder patterns being used is
inappropriate.

The new function takes a vector type and vec of tree nodes
giving the element values to put into the built vector, instead of an
instance of tree_vector_builder. If the number of values is zero then
a zero constant is built. If all values are constant then a vector
constant is built. Otherwise, a new constructor node is created.

gcc/ChangeLog:

         * gimple-fold.cc (gimple_build_vector_from_elems): Define a
         new function to build a vector from a list of elements that need
         not be complete.
         * gimple-fold.h (gimple_build_vector_from_elems): Declare a new
         function and a simpler overloaded version with fewer parameters.
         * tree-vect-slp.cc (vect_create_constant_vectors):
         Use gimple_build_vector_from_elems instead of
         duplicate_and_interleave to create non-uniform constant
         vectors for BB SLP vectorization.
---
  gcc/gimple-fold.cc   | 55 ++++++++++++++++++++++++++++++++++++++++++++
  gcc/gimple-fold.h    | 14 +++++++++++
  gcc/tree-vect-slp.cc | 40 +++++++++++++++++++++++++-------
  3 files changed, 101 insertions(+), 8 deletions(-)

diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
index 1ceb5aa5fba..3462c5acb6e 100644
--- a/gcc/gimple-fold.cc
+++ b/gcc/gimple-fold.cc
@@ -11425,6 +11425,61 @@ gimple_build_vector (gimple_stmt_iterator *gsi,
    return builder->build ();
  }

+/* Build a vector of type VECTYPE from a partial list of ELTS, handling the case
+   in which some elements are non-constant.  The list of values may be shorter
+   than the minimum number of subparts implied by VECTYPE. (When the vector
+   type is variable-length, the actual number of subparts may not be known.)
+   Omitted elements are implicitly zero.
+
+   Return a gimple value for the result, inserting any new instructions
+   to GSI honoring BEFORE and UPDATE.  */
+
+tree
+gimple_build_vector_from_elems (gimple_stmt_iterator *gsi, bool before,
+                               gsi_iterator_update update, location_t loc,
+                               tree vectype, const vec<tree> &elts)

The name is not distinctive enough to answer why it's used over
gimple_build_vector.

Sorry about that. I agree that the name is not distinctive enough. The original name of this function was gimple_build_vector_with_zero_padding. I created it because of Richard Sandiford's email here:
https://inbox.sourceware.org/gcc-patches/[email protected]/

Specifically, "..add a new gimple_build interface that explicitly fills with zeros, using a normal array (instead of a tree_vector_builder) for the explicitly-initialised element."

I later renamed the function to gimple_build_vector_from_elems in v4 as a result of your subsequent email:
https://inbox.sourceware.org/gcc-patches/[email protected]/

Specifically, "...In GIMPLE a CONSTRUCTOR node has not mentioned elements zero-filled auto-magically. So iff you assume that the target can create a VLA vector with a n-element prefix (with n <= lower_bound (nunits)) then you shouldn't need to do anything special."

By that point, I had run out of inventive names.

Richard Sandiford wrote "If you want to do something different for BB SLP then I think it makes sense that there is some difference in the way that the constant is constructed."

Do we want to do something different for BB SLP? I was happy enough with the original version, which did use gimple_build_vector:
https://inbox.sourceware.org/gcc-patches/[email protected]/

That original version padded the vector of elements with zeros but had nelts_per_pattern == 1, so I suppose that the patterns were repeated in the upper lanes of each SVE register. Based on Richard Sandiford's email, he does not consider any situation to be valid other than npatterns == nelts_per_pattern == encoded_nelts == 1, or nelts_per_pattern == 2 and multiple_p (TYPE_VECTOR_SUBPARTS (type), npatterns). Since the latest version of my patch uses build_vector_from_ctor, it satisfies those constraints.

The only practical thing that my gimple_build_vector_from_elems function does differently from gimple_build_vector for non-constant vectors is that it does not rely on the vector type having a fixed length: instead, it relies on implicit zero-filling of not-mentioned elements of a CONSTRUCTOR node. Is that a behaviour you would be willing to adopt in gimple_build_vector?

There is a greater practical difference in the encoding of VECTOR_CST:

The tree_vector_builder in vect_create_constant_vectors has one element per pattern, therefore encoded_nelts is simply the number of patterns (i.e. element values) in gimple_build_vector. If the vector is constant then builder->build () uses that encoding. However, builder->build () requires the number of patterns to be an integral power of 2, which may not be true for the number of element values supplied by BB SLP.

In gimple_build_vector_from_elems, encoded_nelts is likewise the number of element values, but build_vector_from_ctor sets step to 2 rather than 1 for VLA vector types (e.g., encoding {1, 0}, {2, 0}, {3, 0}, {0, 0} if a VLA type with lower bound 4 existed). That differs from the encoding used for the tree_vector_builder in vect_create_constant_vectors (e.g., {1}, {2}, {3}). The latter would be invalid without the zero-padding done in my first patch version, because the number of patterns might not be an integral power of 2.
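
To make the difference between the two encodings concrete, here is a toy standalone C++ model. It is not GCC code: ToyEncoding, toy_lane and the two example builders are names invented for this illustration, and it assumes the interleaved layout in which element k of pattern p is stored at position p + k * npatterns, with the last encoded element of each pattern repeating for step 1 or 2.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for a pattern-encoded vector constant (hypothetical type,
// not GCC's VECTOR_CST).
struct ToyEncoding
{
  std::size_t npatterns;
  std::size_t nelts_per_pattern; // the "step": 1 or 2 in this sketch
  std::vector<int> encoded;      // npatterns * nelts_per_pattern values
};

// Return lane I of the (conceptually unbounded) vector described by ENC.
// Lanes beyond the encoded ones reuse the last encoded element of their
// pattern.
int
toy_lane (const ToyEncoding &enc, std::size_t i)
{
  std::size_t pattern = i % enc.npatterns;
  std::size_t index = std::min (i / enc.npatterns, enc.nelts_per_pattern - 1);
  return enc.encoded[pattern + index * enc.npatterns];
}

// Step-1 encoding {1}, {2}, {3}, {0}: the prefix repeats in higher lanes.
ToyEncoding
toy_step1_example ()
{
  return { 4, 1, { 1, 2, 3, 0 } };
}

// Step-2 encoding {1, 0}, {2, 0}, {3, 0}, {0, 0}: higher lanes are zero.
ToyEncoding
toy_step2_example ()
{
  return { 4, 2, { 1, 2, 3, 0, 0, 0, 0, 0 } };
}
```

In this model, lane 4 of the step-1 example reads back as 1 (the prefix repeats), whereas lane 4 of the step-2 example reads back as 0 (the tail is zero-filled).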

If a step-1 encoding can be valid for a VECTOR_CST of VLA type, then it seems it would be more compact than a step-2 encoding. Use of a step-2 encoding does not appear to ensure that the number of patterns is acceptable to builder->build(), nor does it avoid the need to pad the vector with explicit zeros (on the contrary, it requires it). My current understanding is that the purpose of the step-2 encoding is to make explicit that the scalable tail is zero-filled rather than repeating the minimum prefix, but there might be another reason I am not aware of.

I did not want to change that encoding because it is already the usual encoding for a VECTOR_CST of VLA type, but I wonder how important it is to encode this lack of repetition.

In particular ...

+{
+  unsigned int encoded_nelts = elts.length ();
+  gimple_seq seq = NULL;
+  gcc_assert (TREE_CODE (vectype) == VECTOR_TYPE);
+  unsigned int lower_bound
+    = constant_lower_bound (TYPE_VECTOR_SUBPARTS (vectype));
+  gcc_assert (encoded_nelts <= lower_bound);
+
+  if (encoded_nelts == 0)
+    return build_zero_cst (vectype);
+
+  /* Prepare a vector of constructor elements and find out whether all
+     of the element values are constant.  */
+  vec<constructor_elt, va_gc> *v;
+  vec_alloc (v, encoded_nelts);
+  bool is_constant = true;
+
+  for (unsigned int i = 0; i < encoded_nelts; ++i)
+    {
+      if (!CONSTANT_CLASS_P (elts[i]))
+       is_constant = false;
+
+      CONSTRUCTOR_APPEND_ELT (v, NULL_TREE, elts[i]);
+    }
+
+  /* If all element values are constant then we can return a new VECTOR_CST
+     node.  Any elements for which no value is supplied will be zero.  */
+  if (is_constant)
+    return build_vector_from_ctor (vectype, v);

... this case will exactly use a vector builder again and thus likely ICE for
cases that cannot be handled by can_duplicate_and_interleave_p.

That does make the current wording of my commit message misleading. Thanks for pointing that out. I cannot claim that the "possibility of tree_vector_builder patterns being used is inappropriate" and then use them anyway.

I have not yet managed to find any cases in which build_vector_from_ctor would fail. For example, I tried compiling source files that do vector addition with randomly-chosen addends.

build_vector_from_ctor chooses step = 2, nelts = 16 (for VNx16QI).

It pushes as many non-zero values as vect_create_constant_vectors supplied via gimple_build_vector_from_elems (e.g., 15 values: {13, 2, 156, 212, 1, 234, 112, 21, 1, 24, 241, 32, 81, 92, 21}).

It then pushes enough zero constants to make up the difference to twice the lower bound of the number of elements in the VLA type (e.g. 2 * 16 - 15 = 17).

It then calls tree_vector_builder::build(). The assertion in that function passes for the VLA types used here because the lower bound of the number of elements in a VLA type is always an integral power of two.

make_vector() requires 1 <= nelts_per_pattern <= 3, which holds because nelts_per_pattern is the step (2). I don't see where else build_vector_from_ctor could fail.
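
The arithmetic in the steps above can be checked with a small standalone C++ sketch. It is only a model of what I described, not the code in tree.cc: ToyPadding, toy_is_pow2 and toy_vla_padding are made-up names, and the step of 2 is hard-coded as an assumption for VLA types.

```cpp
#include <cassert>
#include <cstddef>

// Summary of the encoding that results for one VLA constant (hypothetical).
struct ToyPadding
{
  std::size_t npatterns;         // lower bound of the number of lanes
  std::size_t nelts_per_pattern; // the step, assumed to be 2 for VLA types
  std::size_t encoded_nelts;     // npatterns * nelts_per_pattern
  std::size_t zeros_pushed;      // explicit zeros appended after the values
};

// True if X is an integral power of two.
constexpr bool
toy_is_pow2 (std::size_t x)
{
  return x != 0 && (x & (x - 1)) == 0;
}

// Model pushing NVALUES supplied values and then zero-padding up to twice
// LOWER_BOUND.  NVALUES must not exceed LOWER_BOUND.
ToyPadding
toy_vla_padding (std::size_t lower_bound, std::size_t nvalues)
{
  assert (nvalues <= lower_bound);
  const std::size_t step = 2;
  const std::size_t total = lower_bound * step;
  return { lower_bound, step, total, total - nvalues };
}
```

For the VNx16QI example, toy_vla_padding (16, 15) gives 32 encoded elements, 17 of which are pushed zeros, and the pattern count of 16 is a power of two.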

Which raises the question - we agreed on how to handle VLA vector
CONSTRUCTORs, but the VLA VECTOR_CST representation does not
have sth equivalent here?

In this patch, the equivalent text is simply "Omitted elements are implicitly zero" in the description of gimple_build_vector_from_elems.

I previously updated the description of the pattern vec_init in md.texi:
https://forge.sourceware.org/gcc/gcc-TEST/pulls/177/files

"If @var{m} specifies a scalable vector mode, then operand 1 only specifies the minimum number of elements implied by @var{m} and elements beyond are zero initialized."

I also previously added this to the description of the store_constructor function:

"If the constructor EXP has a vector type then elements of TARGET for which there is no corresponding element in EXP are zero'd. For a variable-length vector type, only elements up to the minimum number of subparts of the type are explicitly zero'd; any elements beyond that are implicitly zero."


Maybe something similar should be added to the description of make_vector(), since that seems to be the sole origin of VECTOR_CST.

The description of CONSTRUCTOR in gcc/doc/generic.texi already says "You should not assume that all fields will be represented. Unrepresented fields will be cleared (zeroed), unless the CONSTRUCTOR_NO_CLEARING flag is set, in which case their value becomes undefined."

What I propose is to extend the description of @item VECTOR_CST in
gcc/doc/generic.texi. Something like this:

"Only the minimum number of elements required for a scalable vector constant need be represented.

Unrepresented elements of a scalable vector constant will be cleared (zeroed)."

As an aside, why does GENERIC require vector constants to be fully specified? What value does explicit padding up to the minimum vector length actually have?

As for naming I'd prefer sth like

gimple_build_forced_constant_size_vector ()

How about gimple_build_variable_length_vector, as this function is only for use with VLA types? Or gimple_build_vector_from_partial?

or something similar.  Not sure why we cannot use a tree_vector_builder
here, possibly even can get a special force-constant-size mode in it
we can just switch on?

I think we can use a tree_vector_builder. I don't think this path needs to handle arbitrary vector constants: it just needs to handle the kind of vector constants that the vectoriser can produce (which are already limited to be no longer than the lower bound of the number of lanes in a variable-length vector).

+
+  tree res;
+  if (gimple_in_ssa_p (cfun))
+    res = make_ssa_name (vectype);
+  else
+    res = create_tmp_reg (vectype);
+  gimple *stmt = gimple_build_assign (res, build_constructor (vectype, v));
+  gimple_set_location (stmt, loc);
+  gimple_seq_add_stmt_without_update (&seq, stmt);
+  gimple_build_insert_seq (gsi, before, update, seq);
+  return res;
+}
+
  /* Emit gimple statements into &stmts that take a value given in OLD_SIZE
     and generate a value guaranteed to be rounded upwards to ALIGN.

diff --git a/gcc/gimple-fold.h b/gcc/gimple-fold.h
index f1853560779..8b324be005a 100644
--- a/gcc/gimple-fold.h
+++ b/gcc/gimple-fold.h
@@ -243,6 +243,20 @@ gimple_build_vector (gimple_seq *seq, tree_vector_builder *builder)
                               UNKNOWN_LOCATION, builder);
  }

+extern tree gimple_build_vector_from_elems (gimple_stmt_iterator *, bool,
+                                           enum gsi_iterator_update,
+                                           location_t, tree vectype,
+                                           const vec<tree> &);
+
+inline tree
+gimple_build_vector_from_elems (gimple_seq *seq, tree vectype,
+                               const vec<tree> &elts)
+{
+  gimple_stmt_iterator gsi = gsi_last (*seq);
+  return gimple_build_vector_from_elems (&gsi, false, GSI_CONTINUE_LINKING,
+                                        UNKNOWN_LOCATION, vectype, elts);
+}
+
  extern tree gimple_build_round_up (gimple_stmt_iterator *, bool,
                                    enum gsi_iterator_update,
                                    location_t, tree, tree,
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 4dd7e6e1e21..f91d3e723ec 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -10727,7 +10727,7 @@ vect_create_constant_vectors (vec_info *vinfo, slp_tree op_node)
    unsigned j, number_of_places_left_in_vector;
    tree vector_type;
    tree vop;
-  int group_size = op_node->ops.length ();
+  unsigned int group_size = op_node->ops.length ();
    unsigned int vec_num, i;
    unsigned number_of_copies = 1;
    bool constant_p;
@@ -10757,10 +10757,23 @@ vect_create_constant_vectors (vec_info *vinfo, slp_tree op_node)
       (s1, s2, ..., s8).  We will create two vectors {s1, s2, s3, s4} and
       {s5, s6, s7, s8}.  */

-  /* When using duplicate_and_interleave, we just need one element for
-     each scalar statement.  */
-  if (!TYPE_VECTOR_SUBPARTS (vector_type).is_constant (&nunits))
-    nunits = group_size;
+  if (is_a<bb_vec_info> (vinfo))
+    {
+      /* We don't use duplicate_and_interleave for basic block vectorization.
+        We know that either the group size is exactly divisible by the vector
+        length or it fits within a single vector.  */
+      nunits = constant_lower_bound (TYPE_VECTOR_SUBPARTS (vector_type));
+      gcc_checking_assert (multiple_p (group_size, nunits)
+                          || known_le (group_size, nunits));
+      nunits = MIN (nunits, group_size);
+    }
+  else
+    {
+      /* When using duplicate_and_interleave, we just need one element for
+        each scalar statement.  */
+      if (!TYPE_VECTOR_SUBPARTS (vector_type).is_constant (&nunits))
+       nunits = group_size;
+    }

    number_of_copies = nunits * number_of_vectors / group_size;

@@ -10860,6 +10873,11 @@ vect_create_constant_vectors (vec_info *vinfo, slp_tree op_node)
                        ? multiple_p (type_nunits, nunits)
                        : known_eq (type_nunits, nunits))
                 vec_cst = gimple_build_vector (&ctor_seq, &elts);
+             else if (is_a<bb_vec_info> (vinfo))
+               {
+                 vec_cst = gimple_build_vector_from_elems (&ctor_seq,
+                                                           elts.type (), elts);
+               }
               else
                 {
                   if (permute_results.is_empty ())
@@ -10925,9 +10943,15 @@ vect_create_constant_vectors (vec_info *vinfo, slp_tree op_node)
       NUMBER_OF_SCALARS/NUNITS or NUNITS/NUMBER_OF_SCALARS, and hence we have
       to replicate the vectors.  */
    while (number_of_vectors > SLP_TREE_VEC_DEFS (op_node).length ())
-    for (i = 0; SLP_TREE_VEC_DEFS (op_node).iterate (i, &vop) && i < vec_num;
-        i++)
-      SLP_TREE_VEC_DEFS (op_node).quick_push (vop);
+    {
+      /* Guard against the outer loop never terminating because the
+        inner loop is never entered.  */
+      gcc_checking_assert (vec_num > 0);
+
+      for (i = 0; SLP_TREE_VEC_DEFS (op_node).iterate (i, &vop) && i < vec_num;
+          i++)
+       SLP_TREE_VEC_DEFS (op_node).quick_push (vop);
+    }
  }

  /* Get the scalar definition of the Nth lane from SLP_NODE or NULL_TREE
--
2.43.0



--
Christopher Bazley
Staff Software Engineer, GNU Tools Team.
Arm Ltd, 110 Fulbourn Road, Cambridge, CB1 9NJ, UK.
http://www.arm.com/
