On Wed, 26 Nov 2025, Christopher Bazley wrote: > > On 24/11/2025 13:14, Richard Biener wrote: > > On Mon, 17 Nov 2025, Christopher Bazley wrote: > > > >> On 14/11/2025 11:15, Richard Sandiford wrote: > >>> Christopher Bazley <[email protected]> writes: > >>>>>> Agreed. The only valid situations seem to be: > >>>>>> > >>>>>> (1) a duplicate of a single zero, where: > >>>>>> > >>>>>> npatterns == nelts_per_pattern == encoded_nelts == 1 > >>>>>> > >>>>>> and the only encoded value is zero > >>>>>> > >>>>>> (2) the combination of: > >>>>>> > >>>>>> - nelts_per_pattern == 2 > >>>>>> - multiple_p (TYPE_VECTOR_SUBPARTS (type), npatterns) > >>>>>> - the second half of the encoded elements are all zeros > >>>>>> > >>>>>> But these combinations would not come about by chance. The caller > >>>>>> would have to take steps to ensure that they're true. So rather > >>>>>> than check for these relatively complex conditions, it might > >>>>>> be clearer to add a new gimple_build interface that explicitly > >>>>>> fills with zeros, using a normal array (instead of a > >>>>>> tree_vector_builder) for the explicitly-initialised elements. > >>>>> Would a new gimple_build_*_with_zeros function remove the need for > >>>>> vect_create_constant_vectors to pad with zeros at all? > >>>>> > >>>>> The design of vect_create_constant_vectors seems to be heavily built > >>>>> around use of a tree_vector_builder. I'm a bit reluctant to do > >>>>> anything that would require significant refactoring of > >>>>> vect_create_constant_vectors, or that would require this seemingly > >>>>> rather ordinary case to be treated specially. > >>> The current code is built for the normal VLA loop case, where the > >>> sequence of scalar constants needs to be repeated to fill a vector. > >>> For example, in: > >>> > >>> for (int i = 0; i < 100; ++i) > >>> { > >>> x[i*2] += 1; > >>> x[i*2 + 1] += 2; > >>> } > >>> > >>> we need { 1, 2, 1, 2, 1, 2, ... }. > >>> > >>> We can't do that filling explicitly at compile-time because we don't > >>> know how many copies are needed -- that depends on the runtime vector > >>> length. So instead we use a tree_vector_builder that encodes { 1, 2 } > >>> and says that the pattern needs to be repeated to fill a vector. > >>> > >>> This also works for fixed-length loop vectorisation because, in the > >>> general case, filling is needed there too. We could of course do the > >>> filling explicitly at compile time, but it would be somewhat wasted > >>> effort, since the resulting constant would be canonicalised back to > >>> the "{ 1, 2 } repeating" encoding. > >>> > >>> If you want to do something different for BB SLP then I think it makes > >>> sense that there is some difference in the way that the constant is > >>> constructed. It doesn't need to be a big difference. tree_vector_builder > >>> inherits from auto_vec, so it would be possible to create a new > >>> gimple_build_* that takes a vec (or, better, an array_slice) and still > >>> share the current tree_vector_builder code in > >>> vect_create_constant_vectors. > >> I have an alternative to my original solution now, which doesn't require > >> modification of the gimple_build_vector function. Instead, I have added a > >> new > >> gimple_build_vector_with_zero_padding function. It: > >> > >> * Prepares a vector of constructor elements and find out whether all of the > >> element values are constant. > >> > >> * If all element values are constant then it returns a new VECTOR_CST node. > >> Any elements for which no value is supplied will be zero. > >> > >> * Otherwise, it builds a constructor for only those element values that the > >> caller provided, then assigns the result of that constructor to a temporary > >> object. > > In GIMPLE a CONSTRUCTOR node has not mentioned elements zero-filled > > auto-magically. So iff you assume that the target can create a VLA > > vector with a n-element prefix (with n <= lower_bound (nunits)) then > > you shouldn't need to do anything special. > > If it is sufficient to build a constructor comprising only the lower elements > of known non-constant value, then the existing gimple_build_vector already > does that with only the minor modification to use the lower bound that was in > my original patch set. > > Should I therefore delete the gimple_build_vector_with_zero_padding function? > I only created it because I thought that was what you and Richard Sandiford > wanted. He suggested '...add a new gimple_build interface that explicitly > fills with zeros...', so that's what I did.
I think that, for non-constant vectors, you should simply build a CTOR directly and not go via gimple_build_vector. > > Of course this requires the targets vec_init to do the heavy lifting > > as IIRC there's no fallback in RTL expansion for VLA vector CONSTRUCTORs. > > > > But if code quality is awful why would we want to do this at all? > > In general I'd expect N operations to construct a vector with N > > non-zero leading elements. > > > > Richard. > > The generated code quality is much better since I made a change yesterday to > handle variable-length vector types in store_constructor, which I have not yet > pushed to the mailing list. Previously, that function used a fallback path of > calling store_constructor_field upon discovering that the number of subparts > in the vector type was not a constant multiple of the number of subparts in > the element type. > > For example, yesterday's change allows GCC to generate the following AArch64 > assembly language output for the tail of a reduction in the slp_6 test: > > uaddv d31, p6, z31.b > uaddv d27, p6, z27.b > uaddv d26, p6, z26.b > movi d30, #0 > insr z30.b, b26 > insr z30.b, b27 > insr z30.b, b31 > add z25.b, z25.b, z30.b > > instead of the following output (with predicated tails for basic block SLP > vectorization but without this change): > > addvl x0, sp, #2 > movi d0, #0 > st1b z0.b, p6, [sp, #2, mul vl] > uaddv d27, p6, z27.b > uaddv d26, p6, z26.b > uaddv d25, p6, z25.b > str b27, [x0] > addvl x0, sp, #1 > add x0, x0, 1 > ptrue p7.b, vl3 > ld1b z0.b, p6/z, [sp, #2, mul vl] > st1b z0.b, p6, [sp, #1, mul vl] > str b26, [x0] > ld1b z0.b, p6/z, [sp, #1, mul vl] > st1b z0.b, p6, [sp] > str b25, [sp, 2] > ld1b z0.b, p6/z, [sp] > add z28.b, z28.b, z0.b > st1b z28.b, p7, [x1] > addvl sp, sp, #3 > > or the original assembly language output (with neither predicated tails for > basic block SLP vectorization nor this change): > > uaddv d31, p6, z31.b > fmov x0, d31 > uaddv d31, p6, z26.b > add w6, w6, w0 > fmov x0, d31 > uaddv d31, p6, z27.b > add w5, w5, w0 > fmov x0, d31 > add w4, w4, w0 > > >> * If there are no implicitly-zero trailing elements then it returns the > >> value > >> of the temporary object. > >> > >> * Otherwise, it builds a mask that exposes only those lanes of the > >> destination > >> vector type for which the caller provided values... > >> > >> * ... then copies unmasked elements from the temporary object to the > >> destination vector and assigns zero to masked elements. > >> > >> (The temporary object is needed because a non-constant constructor isn’t > >> valid > >> for use in a ternary operation.) > >> > >> Unfortunately, this approach currently produces less efficient output, for > >> example: > >> > >> ptrue p15.b, vl16 ; *** :o( > >> .... > >> ptrue p7.b, vl3 > >> .... > >> movi d25, #0 ; *** :o( > >> .... > >> ld1b z1.b, p6/z, [sp] > >> sel z25.b, p15, z1.b, z25.b ; *** :o( > >> add z28.b, z28.b, z25.b > >> st1b z28.b, p7, [x1] > >> > >> > -- Richard Biener <[email protected]> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Jochen Jaser, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
