On Fri, 7 Nov 2025, Christopher Bazley wrote:
>
> On 07/11/2025 13:35, Richard Biener wrote:
> > On Wed, 5 Nov 2025, Christopher Bazley wrote:
> >
> >> On 28/10/2025 13:51, Richard Biener wrote:
> >>> On Tue, 28 Oct 2025, Christopher Bazley wrote:
> >>>
> >>>> vect_create_constant_vectors is updated to pad with zeros
> >>>> between the end of a group and the end of a vector of the type
> >>>> chosen for the SLP node, when used for BB SLP. This function
> >>>> calls gimple_build_vector, which also has to be updated for
> >>>> SVE vector types (by using the lower bound as the number of
> >>>> elements, e.g., 16 for VNx16QI).
> >>>> ---
> >>>> gcc/gimple-fold.cc | 2 +-
> >>>> gcc/tree-vect-slp.cc | 43 +++++++++++++++++++++++++++++++++++--------
> >>>> 2 files changed, 36 insertions(+), 9 deletions(-)
> >>>>
> >>>> diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
> >>>> index edcc04adc08..e5fe0ea12a7 100644
> >>>> --- a/gcc/gimple-fold.cc
> >>>> +++ b/gcc/gimple-fold.cc
> >>>> @@ -11275,7 +11275,7 @@ gimple_build_vector (gimple_stmt_iterator *gsi,
> >>>> {
> >>>> gimple_seq seq = NULL;
> >>>> tree type = builder->type ();
> >>>> - unsigned int nelts = TYPE_VECTOR_SUBPARTS (type).to_constant ();
> >>>> + unsigned int nelts = constant_lower_bound (TYPE_VECTOR_SUBPARTS
> >>>> (type));
> >>> I don't think this is desirable in a generic helper? How does
> >>> the 'builder' class deal with the all-constant case? It seems
> >>> handling for constant vs. non-constant will now differ semantically
> >>> (instead of ICEing in one case previously).
> >> This was the most minimal change I could make to get the feature working
> >> (whilst debugging many other issues) and it seemed harmless to me, so I
> >> didn't
> >> spend much time thinking about it.
> >>
> >> I know very little about the builder, but my understanding is that it would
> >> behave as though elements beyond the lower bound do not exist. e.g., if the
> >> vector type is VNx16QI then TREE_CONSTANT would return true for the
> >> CONSTRUCTOR node created by build_constructor if elements 0..15 are
> >> constant.
> >>
> >> This is presumably not safe general-purpose behaviour, because it would
> >> leave
> >> any other elements uninitialised (which does not matter for my use-case). I
> >> have no objection to trying to solve this elsewhere (probably in
> >> vect_create_constant_vectors) but I'll first need to revert this change and
> >> remind myself what breaks.
> > Fixing this upthread would be definitely better. Not sure exactly how.
> > Alternatively the change could be done in a way to assert that
> > the tree_vector_builder has less than or exactly the same number
> > of elements as constant_lower_bound of nelts. I don't exactly
> > remember what the builder tracks here and what constraints for
> > initialization of VLA vectors are.
>
> I've done some further investigation.
>
> One of the tests that failed without my change to gimple_build_vector was
> gcc.target/aarch64/sve/slp_6.c. I made that change to enable building of the
> following constant (among others):
>
> _70 = {_85, _21, _55, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0};
>
> That constant has only 16 elements although the type of _70 is vector([16,16])
> unsigned char:
>
> void vec_slp_int8_t (int8_t * restrict a, int8_t * restrict b, int n)
> {
> ...
>
> vector([16,16]) signed char vect_x0_43.58;
> vector([16,16]) signed char vect__90.57;
> vector([16,16]) unsigned char vect__89.56;
> vector([16,16]) unsigned char vect__87.55;
> vector([16,16]) signed char vect_x0_26.54;
> vector([16,16]) signed char vect_x0_34.47;
>
> ...
>
> vector([16,16]) signed char vect_x1_35.41;
>
> ...
>
> vector([16,16]) signed char vect_x2_36.35;
>
> ...
>
> void * _8;
> vector([16,16]) signed char[3] * _9;
>
> ...
>
> unsigned char _21;
> vector([16,16]) unsigned char _22;
> unsigned char _55;
> vector([16,16]) unsigned char _56;
>
> ...
>
> vector([16,16]) unsigned char _70;
> vector([16,16]) unsigned char _84;
> unsigned char _85;
>
> ...
>
> <bb 5> [local count: 105119324]:
> _84 = (vector([16,16]) unsigned char) vect_x0_34.47_82;
> _85 = .REDUC_PLUS (_84);
> _22 = (vector([16,16]) unsigned char) vect_x1_35.41_38;
> _21 = .REDUC_PLUS (_22);
> _56 = (vector([16,16]) unsigned char) vect_x2_36.35_58;
> _55 = .REDUC_PLUS (_56);
> _70 = {_85, _21, _55, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0};
> vect__89.56_3 = vect__87.55_2 + _70;
> vect__90.57_4 = VIEW_CONVERT_EXPR<vector([16,16]) signed
> char>(vect__89.56_3);
>
> <bb 6> [local count: 118111600]:
> # vect_x0_43.58_5 = PHI <vect__90.57_4(5), vect_x0_26.54_1(8)>
> .MASK_STORE (b_25(D), 8B, { -1, -1, -1, 0, 0, 0, 0, 0, ... },
> vect_x0_43.58_5); [tail call]
> return;
>
> The compiled code looks correct, although movi d0,#0 only zeros the first 16
> bytes of the variable-length vector constant:
>
> addvl x0, sp, #2
> movi d0, #0
> st1b z0.b, p6, [sp, #2, mul vl]
> uaddv d27, p6, z27.b
> uaddv d26, p6, z26.b
> uaddv d25, p6, z25.b
> str b27, [x0]
> addvl x0, sp, #1
> add x0, x0, 1
> ptrue p7.b, vl3
> ld1b z0.b, p6/z, [sp, #2, mul vl]
> st1b z0.b, p6, [sp, #1, mul vl]
> str b26, [x0]
> ld1b z0.b, p6/z, [sp, #1, mul vl]
> st1b z0.b, p6, [sp]
> str b25, [sp, 2]
> ld1b z0.b, p6/z, [sp]
> add z28.b, z0.b, z28.b
> st1b z28.b, p7, [x1]
> addvl sp, sp, #3
> .cfi_def_cfa_offset 0
> ret
>
> (This code has already been noted to be inefficient, which I plan to address
> separately.)
>
> The decision about how many bytes to zero is made in the calling function,
> vect_create_constant_vectors (which also uses constant_lower_bound), rather
> than in gimple_build_vector:
>
> unsigned int elt_count = group_size;
> if (is_a<bb_vec_info> (vinfo))
> {
> /* We don't use duplicate_and_interleave for basic block
> vectorization.
> We know that the group size fits within a single vector, so all we need
> to do for VLA is to pad the constant to the minimum vector length. */
> nunits = constant_lower_bound (TYPE_VECTOR_SUBPARTS (vector_type));
> elt_count = MAX (nunits, group_size);
> }
>
> My current understanding is that you don't object to this part of my change.
> Whatever happens in gimple_build_vector won’t alter the fact that only the
> minimum number of bytes are zeroed, and in most cases that’s the desirable
> outcome.
>
> I therefore plan to keep my modification to gimple_build_vector, but add an
> assertion that builder->encoded_nelts () <= constant_lower_bound
> (TYPE_VECTOR_SUBPARTS (builder->type ())) so that the modified function never
> builds fewer elements than expected when one of them is non-constant. Would
> that be OK?
I'm not sure builder->encoded_nelts () is the correct thing to check
here. In particular any stepped encoding should be rejected as well,
so nelts_per_pattern () must be <= 2. And even then the interpretation
is then to fill with the last value IIRC, and as you get zero-filling
with building a "short" CTOR that last element should be a zero. I'm
not sure how to get at the "last" value, but I think that given
we create a "short" CTOR we need to check that all remaining
elements of the VLA vector are actually encoded as zeros?
I hope Richard can give us a hint at what's the correct thing to do
there. In principle creating a fixed-size vector CTOR and then
using a VEC_PERM to fill that with zeros like VEC_PERM <fixed-length,
VLA zero, { 0, 1, 2, 3, 4, 5, 6 ... }> would be that, or simply
a V_C_E of the fixed-length vector to the VLA vector.
Richard.
--
Richard Biener <[email protected]>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Jochen Jaser, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)