On Wed, 5 Nov 2025, Christopher Bazley wrote:
>
> On 28/10/2025 13:51, Richard Biener wrote:
> > On Tue, 28 Oct 2025, Christopher Bazley wrote:
> >
> >> vect_create_constant_vectors is updated to pad with zeros
> >> between the end of a group and the end of a vector of the type
> >> chosen for the SLP node, when used for BB SLP. This function
> >> calls gimple_build_vector, which also has to be updated for
> >> SVE vector types (by using the lower bound as the number of
> >> elements, e.g., 16 for VNx16QI).
> >> ---
> >> gcc/gimple-fold.cc | 2 +-
> >> gcc/tree-vect-slp.cc | 43 +++++++++++++++++++++++++++++++++++--------
> >> 2 files changed, 36 insertions(+), 9 deletions(-)
> >>
> >> diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
> >> index edcc04adc08..e5fe0ea12a7 100644
> >> --- a/gcc/gimple-fold.cc
> >> +++ b/gcc/gimple-fold.cc
> >> @@ -11275,7 +11275,7 @@ gimple_build_vector (gimple_stmt_iterator *gsi,
> >> {
> >> gimple_seq seq = NULL;
> >> tree type = builder->type ();
> >> - unsigned int nelts = TYPE_VECTOR_SUBPARTS (type).to_constant ();
> >> + unsigned int nelts = constant_lower_bound (TYPE_VECTOR_SUBPARTS
> >> (type));
> > I don't think this is desirable in a generic helper? How does
> > the 'builder' class deal with the all-constant case? It seems
> > handling for constant vs. non-constant will now differ semantically
> > (instead of ICEing in one case previously).
>
> This was the most minimal change I could make to get the feature working
> (whilst debugging many other issues) and it seemed harmless to me, so I didn't
> spend much time thinking about it.
>
> I know very little about the builder, but my understanding is that it would
> behave as though elements beyond the lower bound do not exist. e.g., if the
> vector type is VNx16QI then TREE_CONSTANT would return true for the
> CONSTRUCTOR node created by build_constructor if elements 0..15 are constant.
>
> This is presumably not safe general-purpose behaviour, because it would leave
> any other elements uninitialised (which does not matter for my use-case). I
> have no objection to trying to solve this elsewhere (probably in
> vect_create_constant_vectors) but I'll first need to revert this change and
> remind myself what breaks.
Fixing this upthread would be definitely better. Not sure exactly how.
Alternatively the change could be done in a way to assert that
the tree_vector_builder has less than or exactly the same number
of elements as constant_lower_bound of nelts. I don't exactly
remember what the builder tracks here and what constraints for
initialization of VLA vectors are.
> >> vec<constructor_elt, va_gc> *v;
> >> vec_alloc (v, nelts);
> >> for (i = 0; i < nelts; ++i)
> >> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> >> index 8b66751a8a9..b0c4c05a447 100644
> >> --- a/gcc/tree-vect-slp.cc
> >> +++ b/gcc/tree-vect-slp.cc
> >> @@ -10628,7 +10628,7 @@ vect_create_constant_vectors (vec_info *vinfo,
> >> slp_tree op_node)
> >> unsigned j, number_of_places_left_in_vector;
> >> tree vector_type;
> >> tree vop;
> >> - int group_size = op_node->ops.length ();
> >> + unsigned int group_size = op_node->ops.length ();
> >> unsigned int vec_num, i;
> >> unsigned number_of_copies = 1;
> >> bool constant_p;
> >> @@ -10639,6 +10639,11 @@ vect_create_constant_vectors (vec_info *vinfo,
> >> slp_tree op_node)
> >> vector_type = SLP_TREE_VECTYPE (op_node);
> >>
> >> unsigned int number_of_vectors = vect_get_num_copies (vinfo, op_node);
> >> + if (dump_enabled_p ())
> >> + dump_printf_loc (
> >> + MSG_NOTE, vect_location,
> >> + "Allocating %u constant and loop invariant defs in node %p\n",
> >> + number_of_vectors, (void *) op_node);
> > Please avoid sprinkling new debug stuff everywhere (you can keep
> > that separate in your tree).
> I mostly did. This must have escaped. I did spend many hours splitting my
> change though.
> >> SLP_TREE_VEC_DEFS (op_node).create (number_of_vectors);
> >> auto_vec<tree> voprnds (number_of_vectors);
> >> @@ -10658,12 +10663,24 @@ vect_create_constant_vectors (vec_info *vinfo,
> >> slp_tree op_node)
> >> (s1, s2, ..., s8). We will create two vectors {s1, s2, s3, s4} and
> >> {s5, s6, s7, s8}. */
> >> - /* When using duplicate_and_interleave, we just need one element for
> >> - each scalar statement. */
> >> - if (!TYPE_VECTOR_SUBPARTS (vector_type).is_constant (&nunits))
> >> - nunits = group_size;
> >> + unsigned int elt_count = group_size;
> >> + if (is_a<bb_vec_info> (vinfo))
> >> + {
> >> + /* We don't use duplicate_and_interleave for basic block
> >> vectorization.
> >> + We know that the group size fits within a single vector, so all we
> >> need
> >> + to do for VLA is to pad the constant to the minimum vector length. */
> > It would be much easier if the backend would let us use the fixed
> > (minimum) size SVE vectors, no? Because that's what we are actually
> > enforcing anyway and those modes exist? (they do for RVV for sure)
>
> The backend does not provide fixed minimum size SVE vectors. When it has been
> configured to use a specific vector length (e.g., using
> -msve-vector-bits=128), the vectoriser cannot use the fixed-length type V16QI
> interchangeably with the equivalent scalable type VNx16QI -- even though they
> can encode the same amount of data. Predicate masks aren't supported by the
> backend for V16QI because Advanced SIMD does not support masks, therefore code
> can be vectorised using VNx16QI that cannot be vectorised using V16QI.
>
> Tamar might be able to explain this part of the architecture better than me.
Yes, I think we discussed this on IRC and I understand that limitation
now.
Richard.
> >> + nunits = constant_lower_bound (TYPE_VECTOR_SUBPARTS (vector_type));
> >> + elt_count = MAX (nunits, group_size);
> >> + }
> >> + else
> >> + {
> >> + /* When using duplicate_and_interleave, we just need one element for
> >> + each scalar statement. */
> >> + if (!TYPE_VECTOR_SUBPARTS (vector_type).is_constant (&nunits))
> >> + nunits = group_size;
> >> + }
> >> - number_of_copies = nunits * number_of_vectors / group_size;
> >> + number_of_copies = nunits * number_of_vectors / elt_count;
> >>
> >> number_of_places_left_in_vector = nunits;
> >> constant_p = true;
> >> @@ -10673,9 +10690,15 @@ vect_create_constant_vectors (vec_info *vinfo,
> >> slp_tree op_node)
> >> stmt_vec_info insert_after = NULL;
> >> for (j = 0; j < number_of_copies; j++)
> >> {
> >> - tree op;
> >> - for (i = group_size - 1; op_node->ops.iterate (i, &op); i--)
> >> + for (i = elt_count; i-- > 0;)
> >> {
> >> + tree op;
> >> + if (i < group_size)
> >> + op = op_node->ops[i];
> >> + else
> >> + /* Pad with zeros. */
> >> + op = build_zero_cst (TREE_TYPE (vector_type));
> >> +
> >> /* Create 'vect_ = {op0,op1,...,opn}'. */
> >> tree orig_op = op;
> >> if (number_of_places_left_in_vector == nunits)
> >> @@ -10761,6 +10784,10 @@ vect_create_constant_vectors (vec_info *vinfo,
> >> slp_tree op_node)
> >> ? multiple_p (type_nunits, nunits)
> >> : known_eq (type_nunits, nunits))
> >> vec_cst = gimple_build_vector (&ctor_seq, &elts);
> >> + else if (is_a<bb_vec_info> (vinfo))
> >> + {
> >> + vec_cst = gimple_build_vector (&ctor_seq, &elts);
> >> + }
> >> else
> >> {
> >> if (permute_results.is_empty ())
> >>
>
--
Richard Biener <[email protected]>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Jochen Jaser, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)